Description

The Text and Vision (TVGraz) dataset is an annotated multi-modal dataset that currently contains 10 visual object categories and 4030 images with associated text. The visual appearance of the objects in the dataset is challenging and offers a less biased benchmark. The objective of the dataset is to provide a common means of evaluating object categorization research based on text and vision.

The archive "TVGraz_script.tar.gz" contains a Python script named "download_TVGRAZ_dataset.py". When executed, the script downloads the TVGraz images and text from their respective URLs, according to the "category_list.txt" file. After downloading, the textual data is in raw format, organized per category and per image.

Download: TVGraz dataset capturing tool

TVGraz: Multi-Modal Learning of Object Categories by Combining Textual and Visual Features (bib)
Inayatullah Khan, Amir Saffari, and Horst Bischof
In Proc. Workshop of the Austrian Association for Pattern Recognition, 2009

Related datasets