Petabyte-scale crawl of the web, most frequently used for learning word embeddings. Available for free from Amazon S3. Can also be useful as a network dataset, since it is a crawl of the WWW.