João Ataide

Predicting animal adoptions with Prophet

Updated: Feb 15, 2023


One thing that has always interested me is environmental preservation, especially of our fauna. There was a time when I was a vegetarian; for reasons that are beside the point, I no longer am, but today I help in other ways, by raising awareness about preservation and by feeding and giving affection to street animals.

Thinking about this, I remember my university, where our pedagogue, Hortência Pessoa, together with other collaborators, ran the Unidos Pelas Patinhas project. The project takes animals, especially dogs and cats, off the streets, vaccinates them, and cares for them until someone adopts them.

Inspired by this, when I saw the practical project in Carlos Melo's School of Data Science, I wasted no time in applying Machine Learning tools, especially Facebook's Prophet framework, to predict animal intakes and outcomes at institutions like this one.

However, the Unidos Pelas Patinhas initiative does not have a large, organized database, so I will use the Austin Animal Center dataset instead. This American institution provides public services for controlling stray animals and animals lost by their owners, offering water, shelter, and veterinary support, work very similar to that of the Brazilian NGO.

The data were made available by the institution on April 30, 2020, and contain 117K entries and 12 variables. They can be downloaded directly from the City of Austin Open Data Portal.

As for the model, Prophet, as I mentioned, is a framework developed by Facebook to handle time series problems. It is well suited to data spanning long periods (months or years) with as much historical detail as possible, strong and clearly defined seasonalities, known holidays or special dates, and a non-linear growth trend that approaches a limit.
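For reference, Prophet's documentation and the paper by its authors (Taylor and Letham) describe the model as a sum of components. A compact, simplified statement of that additive decomposition is shown below, where g(t) is the trend, s(t) the seasonality, h(t) the holiday effects and ε_t the error. The saturating (logistic) trend is what corresponds to the "limit" mentioned above: C is the carrying capacity, k the growth rate and m an offset. This simplified form omits the changepoints that Prophet also fits, and note that Prophet's default trend is piecewise linear; the logistic trend is used only with `growth='logistic'` and a `cap` column. Seasonality with period P (e.g. 7 or 365.25 days) is a truncated Fourier series.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

g(t) = \frac{C}{1 + \exp\bigl(-k\,(t - m)\bigr)}

s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```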

This way, we can bring the two together and analyze how many of the animals that passed through the organization were adopted or returned to their owners. But first, let's get to know the variable dictionary:

  • Animal ID - Animal identification number

  • Name - Animal's name

  • Date Time - Animal entry date and time

  • Month Year - Month and year of entry of the animal

  • Date of Birth - The animal's date of birth

  • Outcome Type - Type of outcome (for example, adoption, return to owner, or euthanasia)

  • Outcome Subtype - More specific detail about the outcome

  • Animal Type - Species of the animal

  • Sex upon Outcome - Sex of the animal at the time of the outcome

  • Age upon Outcome - Age of the animal at the time of the outcome

  • Breed - Breed of the animal

  • Color - Color of the animal

Below are the first five entries.
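The loading step is not shown in the article. A minimal sketch with pandas, assuming the CSV exported from the City of Austin Open Data Portal has a `DateTime` column (adjust `date_col` if your export names it differently); `load_outcomes` is a hypothetical helper, and the two in-memory rows are made-up placeholders standing in for the real file:

```python
import io

import pandas as pd


def load_outcomes(source, date_col="DateTime"):
    """Load the outcomes CSV (a path or file-like object) and parse the date column.

    Assumption: the export has a column named `date_col` holding date/time strings.
    """
    df = pd.read_csv(source)
    df[date_col] = pd.to_datetime(df[date_col])
    return df


# Made-up rows standing in for the real file (not actual records).
sample_csv = io.StringIO(
    "Animal ID,Name,DateTime,Outcome Type,Animal Type\n"
    "A000001,Rex,2019-05-08 18:20:00,Adoption,Dog\n"
    "A000002,,2019-05-09 10:00:00,Transfer,Cat\n"
)
df = load_outcomes(sample_csv)
print(df.head())  # first five entries (this sample has only two)
```

With the real download, the call would be something like `load_outcomes("outcomes.csv")`, where the filename is hypothetical.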

We can see, then, that each entry represents an animal that passed through the institution. For example, animal A794011, named Chunck, is a 2-year-old male cat that entered on May 8, 2019, and was adopted.

Beyond understanding the data, and before starting the Prophet implementation, I performed an exploratory analysis, which revealed a large amount of missing (null) data, as we can see below:
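One way to produce such a missing-data summary is sketched below with pandas, assuming the data is already in a DataFrame. `missing_report` is a hypothetical helper, and the small frame is illustrative, not the real dataset:

```python
import pandas as pd


def missing_report(df):
    """Count and percentage of null values per column, most missing first."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


# Illustrative sample: Name and Outcome Subtype have gaps.
sample = pd.DataFrame({
    "Name": ["Rex", None, "Mia", None],
    "Outcome Subtype": [None, "Partner", None, None],
    "Animal Type": ["Dog", "Cat", "Cat", "Dog"],
})
print(missing_report(sample))
```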

I also counted the possible outcome types (Outcome Type).
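A minimal sketch of that count with pandas `value_counts`, also giving each outcome's share of all entries. `outcome_shares` is a hypothetical helper and the four rows are illustrative values only, not the real distribution:

```python
import pandas as pd


def outcome_shares(df, col="Outcome Type"):
    """Count each outcome type and its share of all entries (missing values included)."""
    counts = df[col].value_counts(dropna=False)
    return pd.DataFrame({
        "count": counts,
        "percent": (counts / len(df) * 100).round(1),
    })


# Illustrative values only.
sample = pd.DataFrame({"Outcome Type": ["Adoption", "Adoption", "Transfer", "Euthanasia"]})
print(outcome_shares(sample))
```

Using `dropna=False` and dividing by `len(df)` keeps rows with a missing outcome in the denominator, so the percentages refer to all entries.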

This showed that, of the 117,416 entries up to that date, 44.5% of the animals were adopted and, unfortunately, 6.7% were euthanized. Since adoption is the most frequent outcome, I built the forecasts for animals that were adopted or returned to their owners.
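The article does not show how the daily series `adoptions_dd` used in the next plot was built. One plausible sketch, assuming the outcome labels are spelled "Adoption" and "Return to Owner" and the timestamp column is `DateTime`; the helper name and sample rows are hypothetical. The last line reshapes the series into the two columns Prophet expects, `ds` and `y`:

```python
import pandas as pd

# Outcome labels assumed to match the "Outcome Type" values in the Austin data.
TARGET_OUTCOMES = ("Adoption", "Return to Owner")


def daily_outcome_counts(df, outcomes=TARGET_OUTCOMES, date_col="DateTime"):
    """Number of animals per day whose outcome is one of `outcomes`.

    Days without a matching outcome get a count of 0, so the result has a
    regular daily index that can be resampled further (e.g. weekly).
    """
    selected = df[df["Outcome Type"].isin(outcomes)]
    return selected.set_index(date_col).resample("D").size()


sample = pd.DataFrame({
    "DateTime": pd.to_datetime(["2019-05-06 09:00", "2019-05-06 15:30",
                                "2019-05-08 18:20", "2019-05-09 11:00"]),
    "Outcome Type": ["Adoption", "Return to Owner", "Adoption", "Transfer"],
})
adoptions_dd = daily_outcome_counts(sample)

# Prophet's input format: a DataFrame with a date column `ds` and a value column `y`.
prophet_df = adoptions_dd.rename_axis("ds").reset_index(name="y")
print(prophet_df)
```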

These representations can be produced for any time aggregation, such as weekly or daily data. However, to keep this article short, I only show the weekly data; if you are interested in the complete work, the commented notebook presents that analysis.

What does the weekly time distribution of the data look like?

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 5))
adoptions_dd.resample('W').sum().plot(ax=ax)

The data show a fairly constant time distribution. Even without decomposing the time series into its components, we can already infer that it has a trend and seasonality, with little noise.
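To check that intuition before fitting Prophet, the weekly totals can be split into trend, seasonal and residual parts. The sketch below is a classical moving-average decomposition in pandas, a different technique from Prophet's (statsmodels' `seasonal_decompose` offers a fuller version). It assumes a gap-free weekly series at least two years long and treats 52 rows as one year, a simplification since a year is slightly longer than 52 weeks. `decompose_weekly` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def decompose_weekly(weekly, period=52):
    """Classical additive decomposition of a weekly series (pandas only).

    trend    : centered rolling mean over one period (52 weeks)
    seasonal : average detrended value at each position of the cycle,
               shifted so the seasonal effects average to zero
    resid    : observed - trend - seasonal
    """
    trend = weekly.rolling(window=period, center=True).mean()
    detrended = weekly - trend
    # Position of each week inside the cycle, by row order (assumes no gaps).
    position = np.arange(len(weekly)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means -= seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=weekly.index)
    resid = weekly - trend - seasonal
    return pd.DataFrame({"observed": weekly, "trend": trend,
                         "seasonal": seasonal, "resid": resid})


# Usage, assuming adoptions_dd is the daily series plotted above:
# parts = decompose_weekly(adoptions_dd.resample("W").sum())
# parts.plot(subplots=True, figsize=(10, 8))
```

If the residual column is small compared with the trend and seasonal columns, that supports the "little noise" reading of the plot.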

First, I added the dates of the US national holidays:

#adding US national holiday dates
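The holiday code itself is not shown in the article. Prophet accepts holidays either through its built-in `add_country_holidays(country_name='US')` method or as a DataFrame with `holiday` and `ds` columns passed to `Prophet(holidays=...)`. The pandas-only sketch below builds a table in that second layout; the helper names and the short list of four US federal holidays are illustrative assumptions, not the full calendar or the author's code:

```python
import pandas as pd


def thanksgiving(year):
    """Fourth Thursday of November."""
    nov1 = pd.Timestamp(year=year, month=11, day=1)
    # weekday(): Monday=0 ... Thursday=3
    first_thursday = nov1 + pd.Timedelta(days=(3 - nov1.weekday()) % 7)
    return first_thursday + pd.Timedelta(weeks=3)


def us_holidays_frame(years):
    """Holiday table in the layout Prophet expects: one row per (holiday, ds)."""
    rows = []
    for year in years:
        rows += [
            ("new_year", pd.Timestamp(year=year, month=1, day=1)),
            ("independence_day", pd.Timestamp(year=year, month=7, day=4)),
            ("thanksgiving", thanksgiving(year)),
            ("christmas", pd.Timestamp(year=year, month=12, day=25)),
        ]
    return pd.DataFrame(rows, columns=["holiday", "ds"])


holidays = us_holidays_frame(range(2013, 2021))
# With Prophet installed, this frame could be passed as Prophet(holidays=holidays),
# or the built-in list used instead via modelo.add_country_holidays(country_name="US").
print(holidays.head())
```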

Then I set the forecast period to 52, the number of weeks in a year:

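# determine the period: 52 weekly steps (one year); with Prophet's
# default freq='D', periods=52 would extend the series by only 52 days
val_fut = modelo.make_future_dataframe(periods=52, freq='W')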
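Why the `freq` argument matters: `make_future_dataframe` extends the history by `periods` steps of `freq`, and the default step is one day. The sketch below approximates that date logic with `pandas.date_range` to compare the two choices; `future_dates` is a hypothetical helper, not Prophet's own code, and the last date used is the dataset's release date, for illustration only:

```python
import pandas as pd


def future_dates(last_date, periods, freq="D"):
    """Dates strictly after `last_date`, spaced `periods` steps of `freq` apart."""
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    dates = dates[dates > pd.Timestamp(last_date)]
    return dates[:periods]


last = "2020-04-30"
daily = future_dates(last, 52)              # default step: one day
weekly = future_dates(last, 52, freq="W")   # weekly step, anchored on Sundays
print(daily[-1], weekly[-1])
```

With daily steps the horizon ends after about seven weeks; with `freq="W"` it covers roughly a year, which matches the intent of using 52.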

With all parameters ready, I made the predictions.

#forecast for the period
forest = modelo.predict(val_fut)

We can then plot the forecast series for the weekly data:

#Viewing the forecasts
modelo.plot(forest, xlabel="Date", ylabel="Adoptions")

Even though I already had an idea of the model's components, I plotted them separately. They are shown below: trend, holidays, weekly seasonality (weekends), and yearly seasonality (vacations), respectively.


The components show a strong upward trend that becomes approximately linear toward the end of 2019. They also show the yearly distribution of holidays, weekends, and vacations: more adoptions occur on holidays, and seasonality is stronger on weekends and during the summer vacation, which in the Northern Hemisphere falls in July.

To check that the model performed well, I used Prophet's cross-validation with a 365-day horizon. It uses the entire dataset for both training and testing: at several cutoff dates, the model is trained only on the data before the cutoff, and its forecasts for the following 365 days are compared with the observed values, as in the figure below:
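Prophet's documentation describes this as simulated historical forecasts: by default the initial training window is three times the horizon, and cutoffs are spaced half a horizon apart. The sketch below approximates only that cutoff schedule in pandas; `simulated_cutoffs` is a hypothetical helper rather than Prophet's implementation (which adds some extra checks against the data), and the dates in the usage line are illustrative:

```python
import pandas as pd


def simulated_cutoffs(start, end, horizon, period=None, initial=None):
    """Cutoff dates for simulated historical forecasts.

    Each cutoff leaves a full `horizon` of data after it for evaluation and at
    least `initial` of data before it for training. Defaults follow Prophet's
    documented ones: initial = 3 * horizon, period = horizon / 2.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    horizon = pd.Timedelta(horizon)
    period = pd.Timedelta(period) if period is not None else horizon / 2
    initial = pd.Timedelta(initial) if initial is not None else 3 * horizon
    cutoff = end - horizon
    cutoffs = []
    # Walk backwards from the latest possible cutoff.
    while cutoff >= start + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


# Illustrative date range, not the dataset's exact span.
for c in simulated_cutoffs("2014-01-01", "2020-04-30", "365 days"):
    print(c)
```

Each cutoff corresponds to one model fit, which is why cross-validation with a long horizon can take a while to run.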

This produces the following table, shown for the first five rows; for each date it lists the forecast (yhat, with yhat_lower and yhat_upper), the observed value (y), and the cutoff used.

from fbprophet.diagnostics import cross_validation
df_cross = cross_validation(modelo, horizon = '365 days')

Next, I calculated performance metrics from these results; the first five rows are shown below.

from fbprophet.diagnostics import performance_metrics
df_per = performance_metrics(df_cross)
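`performance_metrics` summarizes the cross-validation errors by forecast horizon, reporting measures such as mse, rmse, mae, mape, mdape and coverage. As a simplified illustration of the two percentage metrics, the sketch below computes MAPE (mean of |y - yhat| / |y|) and MDAPE (median of the same ratio) per horizon from a table with the same `ds`, `cutoff`, `y` and `yhat` columns. Unlike Prophet, which aggregates over a rolling window of horizons, it groups each exact horizon separately; `horizon_errors` and the small table are hypothetical:

```python
import pandas as pd


def horizon_errors(df_cross):
    """MAPE and MDAPE for each forecast horizon (ds - cutoff).

    Simplification: Prophet's performance_metrics averages over a rolling
    window of horizons; here each exact horizon is grouped on its own.
    """
    df = df_cross.assign(
        horizon=df_cross["ds"] - df_cross["cutoff"],
        ape=(df_cross["y"] - df_cross["yhat"]).abs() / df_cross["y"].abs(),
    )
    return df.groupby("horizon")["ape"].agg(mape="mean", mdape="median").reset_index()


# Made-up cross-validation output with three cutoffs and two horizons each.
example = pd.DataFrame({
    "ds": pd.to_datetime(["2019-01-02", "2019-01-03", "2019-02-02",
                          "2019-02-03", "2019-03-02", "2019-03-03"]),
    "cutoff": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-02-01",
                              "2019-02-01", "2019-03-01", "2019-03-01"]),
    "y": [10.0, 20.0, 10.0, 40.0, 10.0, 10.0],
    "yhat": [12.0, 18.0, 9.0, 40.0, 15.0, 10.0],
})
print(horizon_errors(example))
```

Because MDAPE uses the median, a single badly missed week affects it less than MAPE, which is one reason it is a common choice for count data like adoptions.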

The model fit the data well, as shown in the chart of the MDAPE (Median Absolute Percentage Error) metric below:

from fbprophet.plot import plot_cross_validation_metric
figura = plot_cross_validation_metric(df_cross,metric = 'mdape')

The graph above shows how closely the model fits the data. Although this is only an initial test, it suggests that predictive models like this one can be applied to other categories in the dataset.

I emphasize that this project has a didactic purpose, since it was a course exercise. Even so, it shows how predictive techniques can be applied to data such as animal shelter records. See the complete project in the notebook.

Thank you for reading this far! If you would like more information about this project, contact me. Did you enjoy it? If so, share it with your friends. :)

