Text

151 Datasets

Maluuba News QA Dataset

120K Q&A pairs on CNN news articles.

question answering

Google Books Ngrams

N-grams (runs of successive words) from Google Books. Offers a simple way to explore when a word first entered wide usage.
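
As a rough illustration of the idea, the sketch below extracts runs of successive words from a token list and finds the first year in which a count reaches a threshold. The function names, the toy sentence, and the yearly counts are all made up for illustration; none of this reflects the actual file format or contents of the Google Books Ngrams release.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n successive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def first_year_above(counts_by_year, threshold):
    """Return the earliest year whose count reaches threshold, or None."""
    for year in sorted(counts_by_year):
        if counts_by_year[year] >= threshold:
            return year
    return None


# Toy input (hypothetical): count bigrams in one short sentence.
tokens = "the quick brown fox jumps over the quick dog".split()
bigram_counts = Counter(ngrams(tokens, 2))

# Toy yearly counts for a single word (hypothetical values, not real data).
toy_counts = {1950: 0, 1960: 3, 1970: 40, 1980: 120}
entered_usage = first_year_above(toy_counts, threshold=40)
```

What counts as "wide usage" depends on the threshold chosen; a relative frequency (count divided by the total words published that year) would usually be a fairer measure than a raw count.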

Quora Question Pairs

First dataset release from Quora, containing duplicate / semantic similarity labels.

question answering

CMU Q/A Dataset

Manually generated factoid question/answer pairs from Wikipedia articles, with difficulty ratings.

question answering

Maluuba goal-oriented dialogue

Procedural conversational dataset in which the dialogue aims at accomplishing a task or making a decision. Often used to work on chatbots.

question answering

bAbI

Synthetic reading comprehension and question answering datasets from Facebook AI Research (FAIR).

question answering

The Children's Book Test

Baseline of (Question + context, Answer) pairs extracted from children's books available through Project Gutenberg. Useful for question-answering (readin...

question answering

Multidomain sentiment analy...

Multidomain sentiment analysis dataset. An older, academic dataset.

sentiment

IMDB

An older, relatively small dataset for binary sentiment classification. It has fallen out of favor as a benchmark in the literature, displaced by larger datasets.

sentiment

Stanford Sentiment Treebank

Standard sentiment dataset with fine-grained sentiment annotations at every node of each sentence's parse tree.

sentiment
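
To make "annotations at every node" concrete, here is a minimal sketch that reads a bracketed tree in which each node starts with an integer sentiment label, then lists every phrase with its label. The bracketed layout, the example string, and the function names are assumptions chosen for illustration; check the dataset's own documentation for the actual file format and label scale.

```python
def parse_tree(text):
    """Parse a bracketed tree such as "(3 (2 It) (4 good))" into nested dicts."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = int(tokens[pos])  # sentiment label attached to this node
        pos += 1
        word, children = None, []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                word = tokens[pos]  # leaf node: a single word
                pos += 1
        pos += 1  # consume the closing ")"
        return {"label": label, "word": word, "children": children}

    return node()


def phrase(n):
    """Rebuild the text covered by a node."""
    if n["word"] is not None:
        return n["word"]
    return " ".join(phrase(c) for c in n["children"])


def labeled_phrases(n):
    """List (phrase, label) for a node and all of its descendants."""
    out = [(phrase(n), n["label"])]
    for child in n["children"]:
        out.extend(labeled_phrases(child))
    return out


# Hypothetical example tree, not taken from the dataset.
example = "(3 (2 It) (4 (2 's) (4 good)))"
pairs = labeled_phrases(parse_tree(example))
```

Each sentence therefore yields one labeled example per node rather than one per sentence, which is what makes the annotations fine-grained.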