Datasets are an integral part of developing, testing, and running machine learning models. When domain-specific data is required, creating it is a time-consuming process. Public datasets can improve productivity by reducing the need to build data collections from scratch.
In recent years, numerous organizations have published thousands of public datasets to help the industry move forward. Two of the most popular are ImageNet and MNIST. Public datasets are now available for areas such as image classification, facial recognition, weather, object detection, and much more.
These datasets can help in developing machine learning models that address problems like heart disease, droughts, diabetes, and poverty. However, it’s necessary to understand the challenges they raise, including ethical ones. Take facial recognition, for instance: cataloging the faces of individuals in public settings is an invasion of privacy.
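As a small illustration of what reusing a public dataset involves in practice, the sketch below reads image data in the IDX layout that the MNIST files are distributed in: a 16-byte big-endian header (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The function names `parse_idx_images` and `build_idx_images` are hypothetical, chosen for this example only, and the sample data is synthetic. The real MNIST files are gzip-compressed, so they would need to be decompressed (for example with `gzip.decompress`) before being passed in.

```python
import struct

IDX_IMAGE_MAGIC = 0x00000803  # 2051: unsigned-byte data with 3 dimensions


def parse_idx_images(raw: bytes):
    """Parse uncompressed IDX3 image bytes into a list of images.

    Each image is a list of rows, and each row is a list of pixel values (0-255).
    """
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number: {magic:#010x}")
    expected = count * rows * cols
    pixels = raw[16:16 + expected]
    if len(pixels) != expected:
        raise ValueError("pixel data is truncated")
    images = []
    for i in range(count):
        start = i * rows * cols
        image = [
            list(pixels[start + r * cols:start + (r + 1) * cols])
            for r in range(rows)
        ]
        images.append(image)
    return images


def build_idx_images(images):
    """Hypothetical helper: pack images into IDX3 bytes to create synthetic test data."""
    count, rows, cols = len(images), len(images[0]), len(images[0][0])
    header = struct.pack(">IIII", IDX_IMAGE_MAGIC, count, rows, cols)
    body = bytes(p for image in images for row in image for p in row)
    return header + body


if __name__ == "__main__":
    # One synthetic 2x2 "image", round-tripped through the IDX layout.
    sample = build_idx_images([[[0, 255], [128, 64]]])
    print(parse_idx_images(sample))
```

Only the standard library is used here to keep the sketch self-contained; in a real project, a numerical library would usually be a more practical way to hold the pixel arrays.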
In the section below, you can find a list of twenty-five public datasets.
|Univ. of Notre Dame
|Detailed US demographics data
|Deloitte & others
|Visualize US issues like jobs, skills…
|Diabetes patient data
|El Nino Dataset
|Oceanographic and meteorological readings
|Security and law enforcement
|Human activity recognition – sitting, biking, standing…
|Individual data – age, sex, …
|Overhead Imagery Research Data Set
|SAT-4 Airborne Dataset
|Human actions like a smile, laugh, talk, smoke…
|SIFT10M Data Set
|Nearest neighbor search methods
|Precision-labeled, high-resolution satellite images