YouCook is an annotated data set of unconstrained third-person cooking videos, prepared from 88 open-source YouTube cooking videos. The videos show people cooking various recipes and are all filmed from a third-person viewpoint. They pose a significantly more challenging visual problem than existing cooking and kitchen datasets: the kitchen or scene in the background differs across many videos, and most videos have dynamic camera changes.

Frame-by-frame object and action annotations are provided for the training data, along with a number of precomputed low-level features. Each video also has a number of human-provided natural language descriptions (on average, eight different descriptions per video). The dataset was created to serve as a benchmark for describing complex real-world videos with natural language.

Related publications:

P. Das, C. Xu, R. F. Doell, J. J. Corso. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2013.

C. Xu, P. Das, R. F. Doell, P. Rosebrough, J. J. Corso. YouCook Dataset (PDF).