Google Audioset

Others Dataset

Homepage

https://research.google.com/audioset/

Description

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds.

By releasing AudioSet, we hope to provide a common, realistic-scale evaluation task for audio event detection, as well as a starting point for a comprehensive vocabulary of sound events.
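The hierarchical ontology described above can be pictured as a graph in which each category lists its child categories. The following is a minimal sketch under stated assumptions, not the official loader: it assumes each node carries `id`, `name`, and `child_ids` fields, and the toy ids (`n0` to `n4`) and the small category subset are hypothetical, chosen only to illustrate the traversal.

```python
from collections import deque

# Toy ontology for illustration only; the ids are hypothetical and the
# categories are a tiny hand-picked subset, not the real 632 classes.
TOY_ONTOLOGY = [
    {"id": "n0", "name": "Human sounds", "child_ids": ["n1"]},
    {"id": "n1", "name": "Speech", "child_ids": []},
    {"id": "n2", "name": "Animal", "child_ids": ["n3"]},
    {"id": "n3", "name": "Dog", "child_ids": ["n4"]},
    {"id": "n4", "name": "Bark", "child_ids": []},
]


def index_by_id(nodes):
    """Build a lookup table from node id to node record."""
    return {node["id"]: node for node in nodes}


def descendants(nodes_by_id, root_id):
    """Return the names of all categories below root_id, breadth-first."""
    seen = set()
    queue = deque(nodes_by_id[root_id]["child_ids"])
    names = []
    while queue:
        node_id = queue.popleft()
        if node_id in seen:
            # A graph (unlike a strict tree) may reach a node twice.
            continue
        seen.add(node_id)
        node = nodes_by_id[node_id]
        names.append(node["name"])
        queue.extend(node["child_ids"])
    return names


if __name__ == "__main__":
    lookup = index_by_id(TOY_ONTOLOGY)
    print(descendants(lookup, "n2"))
```

The `seen` set is there because the ontology is described as a graph rather than a strict tree, so a category could in principle be reachable through more than one parent.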

Related datasets

TRANCOS Overlapping Car Crowds

The TRaffic ANd COngestionS (TRANCOS) dataset is a novel benchmark for (extremely overlapping) vehicle counting in traffic congestion situations. It consist…

car, detection, highway, object, spain, traffic, transportation, urban, vehicle

Vision

VoxForge

Clean speech dataset of accented English. Useful when a model needs to be robust to different accents or intonations.

speech

Audio

2000 HUB5 English

English-only speech data used most recently in the Deep Speech paper from Baidu.

speech

Audio

CHIME

Noisy speech recognition challenge dataset. The dataset contains real, simulated, and clean voice recordings. Real being actual recordings of 4 speakers in near…

speech

Audio

TED-LIUM

Audio transcription of TED talks: 1,495 TED talk audio recordings along with full-text transcriptions of those recordings.

speech

Audio

LibriSpeech

Audiobook dataset of text and speech. Nearly 500 hours of clean speech from various audiobooks read by multiple speakers, organized by chapters of the b…

speech

Audio

TIMIT

English-only speech recognition dataset.

speech

Audio

MOCAT (TUB Multi-Object and M…

The TU Berlin Multi-Object and Multi-Camera Tracking Dataset (MOCAT) is a synthetic dataset to train and test tracking and detection systems in a virtual …

animal, detection, evaluation, multi-class, multi-view, pedestrian, synthetic, tracking, vehicle

Vision

Comprehensive Cars (CompCars)

The Comprehensive Cars (CompCars) dataset contains data from two scenarios, including images from web-nature and surveillance-nature. The web-nature data …

attribute, car, classification, fine-grained, object, recognition, urban, vehicle

Vision

Google Street View Localizati…

The Google Street View dataset contains 62,058 high-quality Google Street View images. The images cover the downtown and neighboring areas of Pittsburgh, …

address, google, gps, localization, manhattan, panorama, pittsburgh, retrieval, sphere, streetview, urban

Vision
