At the end of 2019, a little-known viral disease emerged in Asia. Its hallmark is the so-called Severe Acute Respiratory Syndrome (SARS), which directly affects the host's respiratory system. The disease is therefore transmitted mainly through the air, by droplets from the coughs or sneezes of people who are already infected.
Although SARS is the common manifestation, symptoms vary from person to person: cases range from asymptomatic to mild to severe, and severe cases may require intubation or even admission to an Intensive Care Unit (ICU).
The disease, called COVID-19, is caused by a virus from the coronavirus family. Coronaviruses are often compared with the influenza viruses that cause the flu, although the two belong to different families, and some milder coronaviruses are among the causes of the common cold. The virus behind COVID-19, however, produces a much more aggressive and contagious disease than those milder relatives, which led to the current global pandemic. Many studies are under way around the world, as I have already discussed on my blog, but much about COVID-19 is still unknown.
Until a vaccine is available, the most effective way to control the spread is to stay at home, practice social distancing, and follow the health recommendations of the competent authorities as closely as possible.
As an exercise for the Data Science in Practice course, and hoping to contribute to the understanding of the behavior and consequences of the disease, I am using this period of social distancing to carry out a series of Data Science and Statistics studies related to this health problem.
In this post, I use public data on the disease to analyze the current situation of COVID-19 in Brazil, which produced several insights presented throughout the text.
The complete project can be seen in the notebook.
The data were extracted on June 17, 2020, from Our World in Data, a database that gathers information from 207 countries across about 30 variables.
After performing the necessary cleanups and conversions, I selected only the columns of the dataset that I considered able to answer some important questions. They were:
date - Date of observation
total_cases - Total confirmed cases of COVID-19
location - Geographic location
total_deaths - Total deaths attributed to COVID-19
gdp_per_capita - Gross domestic product at purchasing power parity, given in dollars.
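The column selection described above can be sketched roughly as follows. This is only an illustration: `covid_raw` is a hypothetical name for the loaded file, the rows are made-up placeholders rather than real figures, and the cleaning steps shown (date conversion, dropping incomplete rows) are assumptions about what "cleanups and conversions" involved.

```python
import pandas as pd

# Made-up placeholder rows, NOT real figures; they only mimic the layout
# of the Our World in Data file. `covid_raw` is a hypothetical name.
covid_raw = pd.DataFrame({
    "date": ["2020-06-13", "2020-06-14", "2020-06-14"],
    "location": ["Country A", "Country A", "Country B"],
    "total_cases": [100.0, 120.0, 80.0],
    "total_deaths": [5.0, 6.0, None],
    "new_cases": [10.0, 20.0, 8.0],  # example of a column that is not kept
    "gdp_per_capita": [15000.0, 15000.0, 30000.0],
})

# The five columns listed in the text
columns = ["date", "location", "total_cases", "total_deaths", "gdp_per_capita"]

covid_clear = (
    covid_raw[columns]
    .assign(date=lambda df: pd.to_datetime(df["date"]))  # text -> datetime (assumed step)
    .dropna()                                            # drop incomplete rows (assumed step)
    .reset_index(drop=True)
)

print(covid_clear.head())
```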
The first five rows of the resulting data are shown below:
With the data organized, I asked some important questions to understand the current situation in Brazil:
1. Is Brazil among the countries with the highest numbers of cases and deaths?
2. Is there any relationship between GDP and deaths?
3. When was the first case in Brazil registered?
4. When was the first death in Brazil recorded?
5. What does the curve of cases in Brazil look like?
To answer the first question, I filtered the data down to the most recent date available in the database and then selected the five countries with the most cases and the five with the most deaths.
# Keep only the rows for the latest date in the dataset
covid_dts = covid_clear.loc[covid_clear.date == '2020-06-14']
# Five countries with the most cases (cinco_max_cs) and the most deaths (cinco_max_df)
cinco_max_cs = covid_dts.sort_values(by="total_cases", ascending=False).head().reset_index(drop=True)
cinco_max_df = covid_dts.sort_values(by="total_deaths", ascending=False).head().reset_index(drop=True)
With these filters, I could plot the five countries with the most registered cases in the world and the five with the most fatalities, as shown below:
I quickly noticed that, unfortunately, Brazil is in a difficult situation: it ranks second among the countries with the most cases and also among those with the most deaths, behind only the United States in both.
Having answered that question, I asked myself: "Is there any relationship between a country's purchasing power and its number of deaths?". To answer it, I first had to normalize the data, bringing the variables onto a single scale:
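The post does not say which normalization was used, so the following is only a sketch of one common choice, min-max scaling, which maps each column onto the range 0 to 1. The values are illustrative placeholders, and `top_five` and `min_max` are hypothetical names; only `covid_normali` appears in the original code.

```python
import pandas as pd

# Illustrative values only (not the real figures).
top_five = pd.DataFrame({
    "total_deaths": [100.0, 400.0, 250.0, 50.0, 300.0],
    "gdp_per_capita": [50000.0, 14000.0, 35000.0, 40000.0, 24000.0],
})


def min_max(column: pd.Series) -> pd.Series:
    """Rescale a column linearly so its minimum maps to 0 and its maximum to 1."""
    return (column - column.min()) / (column.max() - column.min())


# DataFrame.apply calls min_max once per column
covid_normali = top_five.apply(min_max)
print(covid_normali)
```

A linear rescaling of this kind does not change the Pearson correlation coefficient; it mainly helps when both variables are drawn on comparable axes.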
Thus, it was possible to calculate the correlation:
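A minimal sketch of such a calculation, assuming the Pearson coefficient (the pandas default) was the measure used. The numbers below are placeholders chosen only to show the calls, not the values from the analysis.

```python
import pandas as pd

# Placeholder normalized values (not the real figures).
covid_normali = pd.DataFrame({
    "total_deaths": [0.0, 0.25, 0.5, 0.75, 1.0],
    "gdp_per_capita": [1.0, 0.8, 0.5, 0.3, 0.0],
})

# Full correlation matrix; Pearson is the default method
corr_matrix = covid_normali.corr()
print(corr_matrix)

# Single coefficient between the two variables of interest
corr = covid_normali["total_deaths"].corr(covid_normali["gdp_per_capita"])
print(round(corr, 2))
```

A coefficient close to -1 means that one variable tends to fall as the other rises; it says nothing by itself about cause.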
The answer to question number 2 therefore appears to be yes. There seems to be a strong correlation between purchasing power and the number of COVID-19 deaths in the countries analyzed: countries with higher income per capita have fewer deaths, and vice versa. This cannot, however, be taken as a causal relationship without further study.
As the graph below shows, the correlation between GDP per capita and the number of deaths is strong and negative (around -0.91), and the regression line drawn through the points is a third-order polynomial fit.
import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot with a third-order polynomial fit (order=3)
fig, ax = plt.subplots(figsize=(8, 4))
sns.regplot(x="total_deaths", y="gdp_per_capita", data=covid_normali,
            line_kws={"color": "#ff304f"}, order=3, ax=ax)
ax.set_title("Scatter of number of deaths vs. GDP per capita")
ax.set_xlabel("Number of deaths")
ax.set_ylabel("GDP per capita")
plt.show()
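To make the `order=3` argument more concrete: for orders above 1, seaborn's `regplot` estimates the curve with `numpy.polyfit`. The sketch below fits the same kind of cubic directly, using placeholder values rather than the real data; the names `deaths`, `gdp`, `coeffs` and `cubic` are hypothetical.

```python
import numpy as np

# Placeholder normalized values (not the real figures).
deaths = np.array([0.0, 0.2, 0.45, 0.7, 1.0])
gdp = np.array([1.0, 0.55, 0.4, 0.3, 0.0])

# Least-squares cubic: returns 4 coefficients, highest power first
coeffs = np.polyfit(deaths, gdp, deg=3)
cubic = np.poly1d(coeffs)

# Evaluate the fitted curve on a few evenly spaced points
grid = np.linspace(deaths.min(), deaths.max(), 5)
print(coeffs)
print(cubic(grid))
```

A cubic has four free coefficients, so when only a handful of points are available it will follow them closely almost regardless of the underlying relationship. The curve is best read as a visual aid rather than as evidence of a third-order law.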
At this point I realized that I had not yet looked at the data for Brazil alone, so, to answer questions 3 and 4, I applied some filters:
# First case and first death: earliest rows with a non-zero total
primeira_morte = covid_brasil[covid_brasil.total_deaths > 0].sort_values(by="date").head(1)
primeiro_caso = covid_brasil[covid_brasil.total_cases > 0].sort_values(by="date").head(1)
With that, I found that the first case in Brazil was registered on February 26 and the first death occurred on March 18. In the 21-day window between the first case and the first death alone, about 290 new cases were recorded in the country.
Last but not least, I plotted all the cases registered so far.
Answering the last question, I found that, so far (June 2020), the curve of cases in Brazil is still rising and shows no sign of having reached its peak. Estimating when that peak might occur, however, would require a predictive model, which is outside the scope of this project.
As we have seen, Brazil's situation in the coronavirus crisis is quite adverse: it appears on the international scene as the country with the second-highest number of infections and deaths. Besides the high rate of contagion since the beginning of the pandemic (as noted above, almost 300 people were already sick just 21 days after the first recorded case), the contagion curve is still rising and shows no sign of slowing down.
In an effort to shed light on the reasons behind this alarming picture, I note that the low purchasing power of the Brazilian population may have played a relevant role in the increase in the total number of deaths, although this does not exclude other causes not analyzed in this study.
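The 21-day figure can be checked with simple date arithmetic. The sketch below uses only the two dates given in the text; the variable names are hypothetical.

```python
import pandas as pd

first_case = pd.Timestamp("2020-02-26")   # first registered case in Brazil
first_death = pd.Timestamp("2020-03-18")  # first recorded death in Brazil

# Subtracting two timestamps gives a Timedelta; .days is the whole-day count
window = (first_death - first_case).days
print(window)  # 21 (2020 was a leap year, so February had 29 days)
```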