This dataset was originally shot for the ICPR 2012 human activities recognition and localization challenge (ICPR HARL). The target applications are the recognition of complex human activities, focusing on complex human behavior involving several people in the video at the same time, on actions involving several interacting people, and on human-object interactions.

The dataset was shot with two different cameras:
- a moving camera mounted on a mobile robot, delivering grayscale videos in VGA resolution and depth images from a consumer depth camera (Primesense/MS Kinect);
- a consumer camcorder delivering color videos in DVD resolution.

The dataset comes with full annotation and several tools, and consists of the following parts:
- video data (RGB, grayscale and depth frames);
- XML annotations of the activity bounding boxes (see the parsing sketch below);
- calibration data and software for the MS Kinect depth sensor used during acquisition;
- software for browsing videos and annotations (superimposed rectangles) and for creating new annotations;
- software for evaluating new recognition methods: automatic creation of precision/recall curves and integration of localization information into the evaluation process (see the evaluation sketch below).

The dataset is centered on the following action classes:
- discussion between two or several people
- a person gives an item to a second person
- an item is picked up or put down
- a person enters or leaves a room
- a person tries to enter a room unsuccessfully
- a person unlocks a room and then enters it
- a person leaves baggage unattended (drop and leave)
- handshaking between two people
- a person types on a keyboard
- a person talks on a telephone

For further details relating to the dataset, evaluation, etc., please refer to the ICPR HARL 2012 challenge documentation.
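As a concrete illustration of working with the XML annotations, the sketch below shows one way the per-frame activity bounding boxes could be loaded for further processing. The element and attribute names used here (action, nr, bbox, framenr, x, y, width, height) are assumptions for illustration only; consult the annotation files shipped with the dataset for the actual schema.

```python
# Minimal sketch of reading per-frame bounding-box annotations from an XML file.
# The tag and attribute names below are assumed for illustration; the real
# annotation schema is defined by the files distributed with the dataset.
import xml.etree.ElementTree as ET

def load_annotations(xml_path):
    """Return a dict mapping action id -> list of (frame, x, y, w, h) boxes."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    actions = {}
    for action in root.iter("action"):          # one element per annotated activity (assumed)
        action_id = action.get("nr")
        boxes = []
        for bbox in action.iter("bbox"):        # one bounding box per frame (assumed)
            boxes.append((
                int(bbox.get("framenr")),
                int(bbox.get("x")),
                int(bbox.get("y")),
                int(bbox.get("width")),
                int(bbox.get("height")),
            ))
        actions[action_id] = boxes
    return actions
```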
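The evaluation software integrates localization information into the precision/recall computation. The simplified sketch below illustrates the general idea: a detection only counts as a true positive when its bounding box overlaps a ground-truth box of the same class by at least a chosen intersection-over-union threshold. This is a generic illustration of the principle, not the official HARL evaluation metric.

```python
# Simplified sketch of localization-aware precision/recall for a single frame.
# Boxes are (x, y, w, h) tuples; detections and ground truth are lists of
# (class_id, box) pairs. Not the official HARL protocol.

def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedy matching: each ground-truth box is matched to at most one detection."""
    matched = set()
    true_pos = 0
    for cls, box in detections:
        for i, (gt_cls, gt_box) in enumerate(ground_truth):
            if i in matched or gt_cls != cls:
                continue
            if iou(box, gt_box) >= iou_threshold:
                matched.add(i)
                true_pos += 1
                break
    precision = true_pos / len(detections) if detections else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Sweeping the detector's confidence threshold and recomputing these two numbers at each operating point yields the kind of precision/recall curve the evaluation tools produce automatically.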