MACHINE LEARNING DATASETS

High-quality training datasets, made available in one place.
A place to easily discover and discuss open datasets.

We are adding new datasets daily! To contribute click here.

POPULAR DATASETS

Open Source Biometric Recogni…

A communal biometrics framework supporting the development of open algorithms and reproducible evaluations. OpenBR is a framework for investigating new m…

age estimation, biometric, face detection, gender estimation

Google Audioset

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube…

google, music, speech, vehicle

Uber 2B trip data

Uber Movement provides anonymized data from over two billion trips to help urban planning around the world. You need to sign up to download this data.

trips, uber, urban planning

MNIST handwritten digits

MNIST: handwritten digits: The most commonly used sanity check. Dataset of 25x25, centered, B&W handwritten digits. It is an easy taskjust because somethi…

natural-image

WikiText

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikiped…

language modeling, wiki

Netflix Prize

Netflix released an anonymized version of their movie rating dataset; it consists of 100 million ratings, done by 480,000 users who have rated between 1 a…

movie, ranking

Searchable Machine Learning Datasets

It is hard to find the relevant datasets for a machine learning problem you are working on. We believe researchers should focus on improving the models and innovating in AI. We are here to help you with the time consuming peice of finding datasets.

image

1000+ Datasets

Launching with 1000+ datasets across multiple fields with a goal to continously increase this number.

Collaborate

A community for AI researchers & developers to learn and share​ ​their ​knowledge of datasets and models.

Better Search

We are focussed on increasing the searchability of datasets, both by better underlying metadata and better UI.

Marketplace

Coming Soon! We are exploring an option to create a dataset marketplace to further simply the process of acquiring datasets.

If you have any feedback or any features you will like us build on, please send an email to mldta@hcode.tech

NEWEST DATASETS

Total Text Dataset

In order to facilitate a new text detection research, we introduce the Total-Text dataset (ICDAR-17 paper), which is more comprehensive than the existing …

Dark Image Dataset

In order to facilitate a new object detection and image enhancement research, we introduce the Exclusively Dark (ExDark) dataset (CVIU2019). The Exclusive…

UNIMIB2016 Food Database

This database can be used for food recognition and segmentation. The database is composed of 1,027 tray images with multiple foods and containing 73 food …

Food recognition, food segmentation

RawFooT DB: Raw Food Texture …

The Raw Food Texture database (RawFooT) has been specially designed to investigate the robustness of descriptors and classification methods with respect t…

food, texture

Oxford Audiovisual Segmentati…

This dataset consists of RGB-D videos in indoor scenes, and has dense, per-frame segmentation labels for both object and material categories. Moreover, au…

Audio, Audio-visual, Depth, Materials, Objects, Places, RGB-D, Scene Understanding, Scenes, Semantic Segmentation

Mapillary Vistas

Mapillary Vistas is the currently largest, publicly available street view image dataset. Stats: 25,000 Images | 100 Categories | 60 Instance-wise Cat…

GET STARTED!

Signup to start to collaborating, contributing and participating in the dataset discussions! You can always browse the datasets without signing up.