Datasets are an integral part of developing, testing, and running machine learning models. When domain-specific data is required, creating it can be a time-consuming process. Public datasets can improve productivity by reducing the need to build data from scratch.
In recent years, numerous organizations have published thousands of public datasets to help the industry move forward. Two of the most popular are ImageNet and MNIST. Public datasets are now available for domains such as image classification, facial recognition, weather, object detection, and much more.
These datasets can help in developing machine learning models that address problems such as heart disease, droughts, diabetes, and poverty. However, it's necessary to understand their challenges, including ethical ones. Facial recognition is one example: cataloging the faces of individuals and releasing them publicly is an invasion of privacy.
The table below lists twenty-five public datasets.
Name | Creator | Description
---|---|---
AWS | Many | Publicly hosted
Google | Many | Publicly hosted
Kaggle | Kaggle | Publicly hosted
Microsoft | Many | Publicly hosted
Notre Dame | University of Notre Dame | 3D face data
VisualData.io | VisualData.io | Computer vision
ACS (American Community Survey) | US Census Bureau | Detailed US demographic data
ApolloScape | Baidu | Autonomous driving
Berkeley DeepDrive | UC Berkeley | Driving video dataset
Data USA | Deloitte & others | Visualizations of US issues such as jobs and skills
Diabetes | UCI | Diabetes patient data
El Niño Dataset | UCI | Oceanographic and meteorological readings
FERET | DoD/NIST | Facial images for security and law enforcement applications
HAR Dataset | UCI | Human activity recognition (sitting, biking, standing, …)
Heart Disease | UCI | Individual patient data (age, sex, …)
ImageNet | Stanford University | Image database
MovieLens | GroupLens | Movie ratings
Million Song | Kaggle | Music
Netflix Prize | Netflix | Movie ratings
Open Images | Google | Images
Overhead Imagery Research Data Set | OIRDS | Overhead imagery
SAT-4 Airborne Dataset | ASU | Landscape pictures
Serre Lab | Brown University | Human actions such as smiling, laughing, talking, smoking, …
SIFT10M Data Set | UCI | SIFT features for evaluating nearest-neighbor search methods
SpaceNet | SpaceNet | Precision-labeled, high-resolution satellite images