Description

A large general-purpose language modeling dataset. Often used to train distributed word representations such as word2vec.