Description

This dataset is simple to use and well documented, which makes it suitable for a wide range of text-analysis studies on Kurdish Sorani news and articles. The documents fall into eight categories: Sport, Religion, Art, Economic, Education, Social, Style, and Health. Each category consists of 500 text documents, and the corpus totals 4,007 text files. The dataset and its documents are freely accessible so that experimental results can be reproduced.
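As a quick check of the category structure described above, the following minimal Python sketch counts the text files in each category. It rests on assumptions that the description does not confirm: that the corpus is unpacked into one folder per category, that the folders are named exactly after the categories, and that the documents use a `.txt` extension. The function name `count_documents` and the folder name `KurdishSoraniCorpus` are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Category names as listed in the description; using them as folder
# names is an assumption about how the corpus is laid out on disk.
CATEGORIES = [
    "Sport", "Religion", "Art", "Economic",
    "Education", "Social", "Style", "Health",
]


def count_documents(root):
    """Count the .txt files found under each category folder of `root`.

    Categories whose folder is missing are reported with a count of 0.
    """
    root = Path(root)
    counts = Counter()
    for category in CATEGORIES:
        folder = root / category
        if folder.is_dir():
            counts[category] = sum(
                1 for path in folder.rglob("*.txt") if path.is_file()
            )
        else:
            counts[category] = 0
    return counts


if __name__ == "__main__":
    # "KurdishSoraniCorpus" is a hypothetical path; point it at the
    # directory where the dataset was actually extracted.
    counts = count_documents("KurdishSoraniCorpus")
    for category in CATEGORIES:
        print(f"{category}: {counts[category]}")
    print(f"Total: {sum(counts.values())}")
```

Comparing the per-category counts and the total against the figures given in the description is an easy way to confirm that a download is complete before running experiments.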

Related datasets