Description

We have created the UJIpenchars2 character database by collecting samples from 60 writers at two different sites in two phases:

- 1st phase: 11 writers, carried out at UJI.
- 2nd phase: 49 writers, carried out at UPV (44 writers) and UJI (5 writers).

Each writer contributed letters, digits, and other characters, and two samples were collected for each writer/character pair. The complete lexicon is as follows:

- 66 letters (33 per case):
  - The 52 ASCII letters.
  - The 14 Spanish non-ASCII letters:
    - Letter n with tilde (2 characters).
    - Vowels with acute accent (10 characters).
    - Letter u with diaeresis (2 characters).
- The 10 digits.
- 21 other characters:
  - The 16 ASCII ones shown in the following line:
    . , ; : ? ! ' ' ( ) % - @ $ < >
  - 5 non-ASCII ones:
    - Inverted question and exclamation marks (2 characters).
    - Masculine and feminine ordinal indicators (2 characters).
    - The euro sign (1 character).

So the total number of samples in this database is 11640: 60 writers x (66+10+21) characters x 2 repetitions.

UJIpenchars is a subset of UJIpenchars2 with only 1364 samples: the ASCII letters and digits collected at UJI during the 1st acquisition phase.

We have not defined a standard task for UJIpenchars2, but have divided the writer set into two disjoint subsets in order to ease the definition of writer-independent tasks:

- 40 'trn' writers: the 11 1st-phase UJI writers and 29 UPV writers.
- 20 'tst' writers: the 5 2nd-phase UJI writers and 15 UPV writers.

The distribution of our database consists of 2 files:

- This file, 'uji2.names'.
- The file 'ujipenchars2.txt', containing all the samples in a format described later.

The handwriting samples were collected on a Toshiba Portégé M400 Tablet PC using its cordless stylus. Each of the 60 writers completed 2 non-consecutive sessions. In each session, the writer was asked to write one exemplar of each character in the lexicon.
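The sample counts quoted above follow directly from the lexicon sizes and the number of writers. As a quick sanity check, the arithmetic can be reproduced in a few lines of Python; the constant names are purely illustrative and are not part of the database distribution.

```python
# Lexicon sizes taken from the description above (names are illustrative).
ASCII_LETTERS = 52
SPANISH_LETTERS = 14       # n with tilde, accented vowels, u with diaeresis
DIGITS = 10
OTHER_CHARACTERS = 16 + 5  # ASCII and non-ASCII punctuation and symbols
WRITERS = 60
REPETITIONS = 2            # one exemplar per session, two sessions per writer

lexicon_size = ASCII_LETTERS + SPANISH_LETTERS + DIGITS + OTHER_CHARACTERS
total_samples = WRITERS * lexicon_size * REPETITIONS

# UJIpenchars: ASCII letters and digits from the 11 first-phase UJI writers.
ujipenchars_samples = 11 * (ASCII_LETTERS + DIGITS) * REPETITIONS

print(lexicon_size, total_samples, ujipenchars_samples)  # 97 11640 1364
```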
The acquisition program shows a set of boxes on the screen, one for each required character, and writers are told to write only inside those boxes. Each acquisition box is approximately 13.6 millimetres wide and 20.4 millimetres tall and contains two horizontal guides at approximate distances of 7.5 and 12.7 millimetres from the top, respectively. Writers were instructed to clear the content of the corresponding box by using an on-screen button and try again whenever they made a mistake or were unhappy with the writing of any character. Subjects were monitored only when writing their first exemplars, and every sample considered OK by its writer was accepted, even if some of its points lay outside the corresponding acquisition box.

Only X and Y coordinate information was recorded along the strokes by the acquisition program; neither pressure level values nor timing information, for instance, were stored. Thus, in multi-stroke samples, no information at all was recorded between strokes. Both coordinates were expressed as integer ink units, with the origin lying at the top left corner of the corresponding acquisition box. X values grow left-to-right and Y values grow downwards.

Although we employed the same acquisition program on identical hardware at UJI and UPV, the acquisition files suggest that UPV samples were collected using acquisition boxes larger than the UJI ones. This is due to a different configuration parameter value that, at UPV, makes the acquisition program translate 1 millimetre into 152 ink units, instead of using the standard UJI ratio of 100 ink units per millimetre. If box homogenisation is needed, it can be easily achieved, for instance, by dividing UPV coordinate values by 1.52.
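The homogenisation step described above can be sketched as follows. This is only a minimal illustration: it assumes that a stroke is already held in memory as a list of (x, y) ink-unit pairs and that the acquisition site of each sample is known to the caller. The function names and the `site` argument are hypothetical, not part of the database distribution.

```python
UJI_UNITS_PER_MM = 100.0  # standard ratio stated in the description
UPV_UNITS_PER_MM = 152.0  # ratio produced by the UPV configuration


def upv_to_uji_units(stroke):
    """Rescale a UPV stroke to UJI ink units, i.e. divide by 1.52."""
    scale = UPV_UNITS_PER_MM / UJI_UNITS_PER_MM
    return [(x / scale, y / scale) for x, y in stroke]


def to_millimetres(stroke, site):
    """Convert a stroke to millimetres; site is assumed to be 'UJI' or 'UPV'."""
    units_per_mm = {"UJI": UJI_UNITS_PER_MM, "UPV": UPV_UNITS_PER_MM}[site]
    return [(x / units_per_mm, y / units_per_mm) for x, y in stroke]


# Hypothetical two-point UPV stroke, rescaled and converted.
upv_stroke = [(152, 304), (760, 1520)]
print(upv_to_uji_units(upv_stroke))
print(to_millimetres(upv_stroke, "UPV"))
```

Note that the rescaled coordinates are floating-point values; round them if integer ink units are required downstream.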
We have also observed that runs of consecutive points with identical coordinates were frequently acquired inside strokes. Such runs were preserved in this database, so it is up to its users to decide whether or not to remove them in a preprocessing step.

Although it is mainly devoted to UJIpenchars, the following paper contains useful information about UJIpenchars2: D. Llorens et al., 'The UJIpenchars Database: A Pen-Based Database of Isolated Handwritten Characters', Proc. of the 6th International Conference on Language Resources and Evaluation, 2008. It can be found in [Web Link].
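For users who choose to remove the runs of repeated points mentioned above, one possible preprocessing step is sketched below. It assumes that a stroke is held in memory as a list of (x, y) pairs. The function name is hypothetical, and this is only one reasonable approach, not a procedure prescribed by the database authors.

```python
def drop_repeated_points(stroke):
    """Collapse each run of consecutive identical points into a single point."""
    cleaned = []
    for point in stroke:
        # Keep a point only if it differs from the last point kept.
        if not cleaned or point != cleaned[-1]:
            cleaned.append(point)
    return cleaned


# Hypothetical stroke containing two runs of repeated points.
stroke = [(10, 20), (10, 20), (12, 25), (12, 25), (12, 25), (10, 20)]
print(drop_repeated_points(stroke))  # [(10, 20), (12, 25), (10, 20)]
```

Only consecutive repetitions are removed: a point that the pen revisits later in the stroke is kept, so the shape of the trajectory is unchanged.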

Related datasets