João Ataide

Analysis of violence in Rio de Janeiro

Updated: Feb 15, 2023


Rio de Janeiro, what a wonderful city! The beautiful hills that surround it, the huge statue of Christ the Redeemer, and the beaches and museums spread throughout the city. But not everything is rosy, is it?

For a number of delicate historical reasons, which I will take into account, violence in the city has reached alarming levels, as we see in newspapers and on television. However, Agência Brasil, an institution of the federal government, points to a reduction in some crimes, such as intentional homicides, which fell by 21% in 2019. The agency relies on data released by the State of Rio de Janeiro, more specifically by the Public Security Institute (ISP), the body responsible for recording the occurrences registered at Civil Police stations.

So I decided to explore the Institute's databases. Even knowing that these situations are not black and white, I carried out an exploratory analysis of some variables to extract a few insights, as you will see. The complete project, including the notebook used, is in my repository.

The data used here come from the Rio de Janeiro State Public Security Institute (ISP). The dataset has 61 variables and 6,992 records, extracted on June 11, 2020, and shows the categories of crimes recorded in the cities where they occurred, as a time series starting in 2014.

#input data                                                                        
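The loading step is not shown above, so here is a minimal sketch of what it could look like with pandas. The inline sample, the `;` separator, and the placeholder municipality code `1111111` are assumptions for illustration only; the real ISP file has 61 columns and may use a different separator or encoding.

```python
import io
import pandas as pd

# Tiny made-up sample standing in for the ISP file (NOT real figures).
# Assumption: the export is a text file separated by ';'.
sample_csv = io.StringIO(
    "fmun_cod;fmun;ano;mes;roubo_veiculo\n"
    "3304557;Rio de Janeiro;2014;1;10\n"
    "1111111;Other city;2014;1;3\n"
)

# In practice the first argument would be the path to the downloaded file
df = pd.read_csv(sample_csv, sep=";")

print(df.shape)   # (rows, columns)
print(df.head())  # first five entries
```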

As we can see in the first five entries below:

For this work, however, I had to apply some filters. First, I used the column fmun_cod, which holds the identifier of each municipality in the state, to keep only the data for the city of Rio de Janeiro.

#Locate the ID of Rio de Janeiro
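As a rough sketch of this filter, a boolean mask on `fmun_cod` keeps only the rows for one municipality. The code `3304557` is the IBGE code for the city of Rio de Janeiro; whether `fmun_cod` stores exactly that value is an assumption here, and the toy rows are made up.

```python
import pandas as pd

# Assumption: fmun_cod holds the IBGE municipality code
RIO_CODE = 3304557

# Made-up rows for illustration only
df = pd.DataFrame({
    "fmun_cod": [3304557, 1111111, 3304557],
    "ano": [2014, 2014, 2015],
    "roubo_veiculo": [10, 3, 12],
})

# The mask is True only where the municipality code matches Rio's
data_rio = df[df["fmun_cod"] == RIO_CODE]

print(len(data_rio))
```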

Next, I set out to understand how the dataset behaves, starting with a copy of the initial data. I dropped the columns named mes_year, month, region, and fmun, then excluded all 2020 data since, at the time of the project, that year was still in progress. Finally, I grouped the values of each variable by the year column (ano).

#Group by year
data_time = data_time.groupby(['ano']).sum()
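Putting the preparation steps together, the sequence might look like the sketch below. The column names to drop are taken from the text as written and may not match the real file exactly; the toy values are made up.

```python
import pandas as pd

# Made-up monthly rows for illustration only
data_rio = pd.DataFrame({
    "mes_year": ["2018m01", "2018m02", "2019m01", "2020m01"],
    "month": [1, 2, 1, 1],
    "region": ["Capital"] * 4,
    "fmun": ["Rio de Janeiro"] * 4,
    "ano": [2018, 2018, 2019, 2020],
    "roubo_veiculo": [5, 7, 4, 9],
})

# Work on a copy so the filtered source data stays untouched
data_time = data_rio.copy()

# Drop columns that should not be summed
data_time = data_time.drop(columns=["mes_year", "month", "region", "fmun"])

# Exclude the incomplete year
data_time = data_time[data_time["ano"] != 2020]

# One row per year, each variable summed over its months
data_time = data_time.groupby(["ano"]).sum()

print(data_time)
```

Note that any identifier column left in the frame (such as a municipality code) would also be summed by `groupby(...).sum()`, so it is worth dropping it first or ignoring it afterwards.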

As a result, we are left with 6 rows, one per year, and 57 variables.

With the dataset organized, I started asking questions of it, starting with: "What is the average number of vehicle robberies per year?". Vehicle robbery (the roubo_veiculo column) is common in capital cities; national statistics indicate that a car is stolen every minute.

The descriptive statistics show an average of 19,612.50 vehicle robberies per year, with a maximum of 25,894.00 and a minimum of 13,725.00, which are relatively high values.
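Figures like these can be obtained with `describe()` on the yearly column. The sketch below uses made-up totals, not the ISP values, just to show where the mean, minimum, and maximum come from.

```python
import pandas as pd

# Made-up yearly totals for illustration only
yearly = pd.Series(
    [100, 140, 120],
    index=pd.Index([2017, 2018, 2019], name="ano"),
    name="roubo_veiculo",
)

# describe() returns count, mean, std, min, quartiles and max
summary = yearly.describe()

print(summary[["mean", "min", "max"]])
```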

Comparing the values for each year gives the following graph, which indicates very close values:

Next question: what about thefts? Vehicle thefts (the furto_veiculos column) were far fewer than robberies (roubo_veiculo) and showed a roughly linear pattern over the period, with an average of 7,025.66 per year, a maximum of 7,515.00, and a minimum of 6,710.00.

With these values, I was able to calculate the percentage of recovered vehicles using the following expression:

#Percentage of recovered vehicles
rec_vec = (data_time.recuperacao_veiculos)/(data_time.furto_veiculos+data_time.roubo_veiculo)*100

The percentage of recovered vehicles was fairly similar from year to year, averaging 53.71% per year, with a maximum of 59.09% and a minimum of 49.28%. Over the entire 6-year interval, the percentage of vehicles recovered was 54.29%.
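The yearly average and the whole-interval figure differ because they are two different calculations: the first is the mean of the per-year percentages, the second divides total recoveries by total vehicles taken. The sketch below shows the distinction on made-up numbers; it is one plausible reading of how the two figures were obtained, not a reproduction of the notebook.

```python
import pandas as pd

# Made-up yearly totals for illustration only
data_time = pd.DataFrame(
    {
        "recuperacao_veiculos": [50, 90],
        "furto_veiculos": [40, 100],
        "roubo_veiculo": [60, 200],
    },
    index=pd.Index([2018, 2019], name="ano"),
)

# Vehicles taken per year: thefts plus robberies
taken = data_time["furto_veiculos"] + data_time["roubo_veiculo"]

# Recovery percentage for each year, then the mean of those percentages
rec_by_year = data_time["recuperacao_veiculos"] / taken * 100
mean_of_years = rec_by_year.mean()

# Recovery percentage over the whole interval (ratio of the sums)
pooled = data_time["recuperacao_veiculos"].sum() / taken.sum() * 100

print(rec_by_year)
print(mean_of_years, pooled)
```

Years with more vehicles taken weigh more heavily in the pooled figure, which is why the two numbers need not match.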

Soon after, I asked the following question: "What is the temporal distribution of robbery-homicides?". Robbery-homicide (latrocínio) is a robbery followed by death, an important indicator of urban violence. There were 62 cases on average per year, with a maximum of 93 and a minimum of 34.

Cases of latrocínio in the city of Rio peaked in 2017, followed by a drop in 2018 and 2019.
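A peak year and the year-over-year direction can be read straight from a yearly series. This is a small sketch on made-up counts (not the ISP values), using `idxmax()` for the peak and `pct_change()` for the change relative to the previous year.

```python
import pandas as pd

# Made-up yearly counts for illustration only
cases = pd.Series(
    [40, 70, 55, 45],
    index=pd.Index([2016, 2017, 2018, 2019], name="ano"),
)

# Index label (year) of the largest value
peak_year = cases.idxmax()

# Percentage change against the previous year; the first year is NaN
change = cases.pct_change() * 100

print(peak_year)
print(change)
```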

I also analyzed robberies on public transport (buses, vans, trains, subways and others), which were a common practice among criminals a few years ago. The data show that this practice is on the rise, with a very high average of 7,266 cases, a maximum of 9,775 and a minimum of 4,412.

It is worth noting that this is a preliminary study, which shows the potential of using Python, basic descriptive statistics and data to extract insights and support decision-making. The project was an exercise for the Data Science in Practice course.

In summary, vehicle robberies decreased, vehicle thefts remained practically constant, and, comparing both with the number of recovered vehicles, the overall recovery rate across all years is 54%. This means that a little more than 50% of stolen vehicles are returned to their owners.

In addition, I took a quick look at the number of robbery-homicides (latrocínio) over the period, which showed a sharp drop in 2018 and 2019. Finally, I analyzed robberies on public transport, which have grown strongly since 2016.

