João Ataide

Free data sources

Updated: Feb 15, 2023


One of the biggest questions for people starting out in data analysis is where to get the data they need for their work. Most people simply use Google as their search source, unaware that there are search engines built specifically for this purpose, such as Google's Dataset Search.

Even with such a tool making the job easier, it is always worth knowing some reference sources for the format and type of data you are looking for. To that end, here is a list of the sites I use most when prospecting for data, whether for remote sensing, geoprocessing, or building machine learning and deep learning models.

An important platform is the USGS Earth Explorer, well known to those who work with geotechnologies. It offers a variety of data, from digital terrain models to satellite images.

The INDE (National Spatial Data Infrastructure) was created by the Brazilian federal government and is coordinated by the IBGE. Its purpose is to catalog, integrate and harmonize the geospatial data held by national institutions.

Another important platform is the site where we find images from our famous Sino-Brazilian satellite CBERS, among others. It is designed and managed by the National Institute for Space Research (INPE).

OpenStreetMap (OSM) is a collaborative, freely available mapping project that operates worldwide. It is a good source of geospatial street data.
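As a rough illustration of pulling street data from OSM, the sketch below builds an Overpass QL query for all `highway` ways inside a bounding box and extracts the street names from the JSON response. The article does not say which access route to use, so the choice of the public Overpass endpoint at overpass-api.de, the helper names and the `User-Agent` string are all assumptions, and the sample response is made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public Overpass API instance; other mirrors exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_highway_query(south, west, north, east, timeout=25):
    """Build an Overpass QL query for every way tagged 'highway' in a bbox.

    Overpass expects the bounding box as (south, west, north, east).
    """
    bbox = f"{south},{west},{north},{east}"
    return f'[out:json][timeout:{timeout}];way["highway"]({bbox});out geom;'


def street_names(overpass_json):
    """Return the sorted, de-duplicated street names found in a response."""
    names = {
        element["tags"]["name"]
        for element in overpass_json.get("elements", [])
        if element.get("type") == "way" and "name" in element.get("tags", {})
    }
    return sorted(names)


def fetch_streets(south, west, north, east):
    """Send the query to Overpass (network call, not executed here)."""
    body = urlencode({"data": build_highway_query(south, west, north, east)})
    request = Request(
        OVERPASS_URL,
        data=body.encode("utf-8"),
        headers={"User-Agent": "free-data-sources-example"},  # hypothetical
    )
    with urlopen(request, timeout=60) as response:
        return json.load(response)


# Made-up response in the general shape Overpass returns.
sample_response = {
    "elements": [
        {"type": "way", "id": 1, "tags": {"highway": "residential", "name": "Rua A"}},
        {"type": "way", "id": 2, "tags": {"highway": "service"}},
        {"type": "way", "id": 3, "tags": {"highway": "primary", "name": "Avenida B"}},
    ]
}

query = build_highway_query(-23.56, -46.66, -23.55, -46.65)
names = street_names(sample_response)
print(query)
print(names)
```

For real work, keep the bounding box small: public Overpass instances are shared and apply rate limits.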

Reddit is one of those discussion-forum platforms reminiscent of the early internet. It hosts several datasets that can be used for initial studies without much commercial or academic ambition.

The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms.

Kaggle is a long-time companion of the data community. The competition platform hosts many real and fictitious datasets and serves as a good source for study or even for proof-of-concept data.

FiveThirtyEight is an American site that focuses on opinion poll analysis, politics, economics and sports blogging, and it publishes a lot of interesting data.

Yahoo! Finance is a very interesting source of financial data. It can be queried from a programming language and is widely used, including commercially, in financial analysis.

Like the previous one, Google Finance offers the same functionality and purpose, and can be used as an alternative for obtaining this information.

The Central Bank of Brazil is the country's main monetary authority. It holds information from several national institutions, such as the Superintendência da Moeda e do Crédito (SUMOC), the Banco do Brasil (BB) and the National Treasury, along with several indices that are important for financial analysis.

The World Bank (TWB) is an international financial institution that provides loans and grants to the governments of low- and middle-income countries to pursue capital projects. It publishes financial data for many countries.

Quandl is an API that provides extensive financial information on stocks, dividends and splits for 3000 publicly traded companies in the US and around the world.

CoinMarketCap also offers an API, which makes it possible to obtain information about the cryptocurrency market for all types of coins, from the oldest ones such as Bitcoin and Ether to meme coins such as Doge and Shiba Inu.

Binance is the largest cryptocurrency trading platform and has an API with several tools that help us work with cryptocurrencies.
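To give an idea of what working with such an API looks like, here is a minimal sketch that asks Binance's public REST API for the latest price of a trading pair, using only the standard library. The `/api/v3/ticker/price` endpoint needs no API key. The helper names are hypothetical and the sample payload uses made-up values, so treat this as an illustration and not as the official client.

```python
import json
from typing import Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.binance.com"  # public REST base URL


def ticker_price_url(symbol: str) -> str:
    """Build the URL for the latest price of a trading pair, e.g. BTCUSDT."""
    query = urlencode({"symbol": symbol.upper()})
    return f"{BASE_URL}/api/v3/ticker/price?{query}"


def parse_ticker_price(payload: str) -> Tuple[str, float]:
    """Parse the JSON body; the API sends the price as a string."""
    data = json.loads(payload)
    return data["symbol"], float(data["price"])


def fetch_ticker_price(symbol: str, timeout: float = 10.0) -> Tuple[str, float]:
    """Call the endpoint over the network (not executed in this sketch)."""
    with urlopen(ticker_price_url(symbol), timeout=timeout) as response:
        return parse_ticker_price(response.read().decode("utf-8"))


# Offline sample in the shape the endpoint returns; the price is made up.
sample_payload = '{"symbol": "BTCUSDT", "price": "25000.50"}'

url = ticker_price_url("btcusdt")
symbol, price = parse_ticker_price(sample_payload)
print(url)
print(symbol, price)
```

Calling `fetch_ticker_price("BTCUSDT")` performs the real request. Endpoints that place orders or read account data require signed requests with an API key, which this sketch does not cover.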

Data.Gov is a platform created by the Federal Government that aims to integrate data and public information from various institutions linked to the Federal Government.

DATASUS is the transparency platform of the Unified Health System (SUS). It holds a lot of information about the national health system and was used heavily during the pandemic.

The National Institute of Educational Studies and Research Anísio Teixeira (INEP) is responsible for carrying out the school census and administering the Enem, which makes it extremely important for studies related to education.

The IBGE Automatic Recovery System (SIDRA) is an IBGE platform for querying the institution's database of statistical tables. It provides indices such as the Extended National Consumer Price Index (IPCA), the Monthly Survey of Services (PMS), the National Consumer Price Index (INPC) and others.

The IBGE also has its own platform, which sometimes makes it easier to search for and download data, especially shapefiles with the boundaries of states, cities and census tracts.

The Superior Electoral Court (TSE) also has its own platform, which provides data on elections since 1933, including candidates, number of votes per electoral section and results.

Embrapa's Strategic Intelligence System is a platform with a lot of information about the country's agricultural sector, covering agricultural production, livestock, animal slaughter and more.

Mendeley is a platform used by researchers from numerous institutions around the world to share the data from their research, reaching a total of 29 million datasets.

The last data source is one I found on Twitter: Base dos Dados, an API that is unifying several of the country's databases.

It is important to point out that these are just some of the data sources I use most in my work, and I know many of my colleagues use them too. In addition, it is always worth knowing how to do web scraping, with a tool such as Scrapy, in case you need to mine information from other sites. Contacting whoever manages the data is another option: sometimes an email is all it takes.
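For the scraping case, a small sketch of the idea follows: it parses a page's HTML and collects the links that point to downloadable data files. It uses Python's built-in `html.parser` instead of Scrapy to stay self-contained. The class name, the list of file extensions and the example.org page are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DatasetLinkParser(HTMLParser):
    """Collect absolute URLs of <a> links that look like data files."""

    def __init__(self, base_url, extensions=(".csv", ".zip", ".json")):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions  # must be a tuple for str.endswith
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.extensions):
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))


def extract_dataset_links(html, base_url):
    """Return the data-file links found in an HTML document, in page order."""
    parser = DatasetLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up page content standing in for a downloaded HTML document.
sample_html = """
<ul>
  <li><a href="/files/census_2020.csv">Census</a></li>
  <li><a href="about.html">About</a></li>
  <li><a href="https://example.org/data/prices.ZIP">Prices</a></li>
</ul>
"""

links = extract_dataset_links(sample_html, "https://example.org/portal/")
print(links)
```

Before scraping a real site, check its terms of use and its robots.txt, and keep the request rate low.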

