Description

Character recognition is a classic pattern recognition problem on which researchers have worked since the early days of computer vision. With today's omnipresence of cameras, the applications of automatic character recognition are broader than ever. For Latin script, this is largely considered a solved problem in constrained situations, such as images of scanned documents containing common character fonts and a uniform background. However, images obtained with popular cameras and hand-held devices still pose a formidable challenge for character recognition. The challenging aspects of this problem are evident in this dataset.

The Chars74K dataset consists of:

  • 62 classes (0-9, A-Z, a-z)
  • 7705 characters obtained from natural images
  • 3410 hand-drawn characters produced using a tablet PC
  • 62992 synthesised characters from computer fonts

This gives a total of over 74K images (which explains the name of the dataset). In the English language, Latin script (excluding accents) and Hindu-Arabic numerals are used. For simplicity we call this the English characters set.

T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009.
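The composition figures quoted in the description can be checked with a few lines of Python. This is a minimal sketch: the dictionary keys are illustrative labels chosen here, not official subset names from the dataset's distribution.

```python
import string

# Character classes named in the description: digits, uppercase, lowercase.
classes = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Image counts per source, as quoted in the description.
# The keys are illustrative labels, not official subset names.
image_counts = {
    "natural_images": 7705,
    "hand_drawn": 3410,
    "synthesised_fonts": 62992,
}

total_images = sum(image_counts.values())

print(len(classes))    # 62
print(total_images)    # 74107
```

The character ranges 0-9, A-Z and a-z add up to 10 + 26 + 26 = 62 classes, and the three image sources sum to 74107, which is the "over 74K" in the dataset's name.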

Related Papers

Related datasets