The Crepe Dataset provides 6 different types of structured activity videos at 1920x1080 resolution. Each activity is represented as a sequence of action components. Notable features of this dataset include:

- Structured activities, each defined as a sequence of component actions.
- Multiple activities running in parallel.
- Distractors that are not relevant to the defined activities.
- Per-frame annotations with bounding boxes, agent types, agent occlusion, and action labels.

We provide the following human-labeled annotations:

- Bounding box of every person
- Person type (action performer or distractor)
- Occlusion against another person
- Action label
- Activity label
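
To make the annotation fields above concrete, here is a minimal Python sketch of how one person's annotation in one frame might be modeled. It is not the dataset's actual file format: the class name `PersonAnnotation`, all field names, the `(x, y, width, height)` box convention, and the placeholder label strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Frame size taken from the dataset description (1920x1080).
FRAME_WIDTH, FRAME_HEIGHT = 1920, 1080


@dataclass
class PersonAnnotation:
    """Hypothetical per-person, per-frame record; names are illustrative only."""
    frame_index: int
    bbox: Tuple[int, int, int, int]       # assumed (x, y, width, height) in pixels
    is_distractor: bool                   # person type: performer vs. distractor
    occluded: bool                        # occluded by another person
    action_label: Optional[str] = None    # assumed empty for distractors
    activity_label: Optional[str] = None  # assumed empty for distractors

    def bbox_in_frame(self) -> bool:
        """Check that the box has positive size and lies inside the frame."""
        x, y, w, h = self.bbox
        return (x >= 0 and y >= 0 and w > 0 and h > 0
                and x + w <= FRAME_WIDTH and y + h <= FRAME_HEIGHT)


def performers(annotations: List[PersonAnnotation]) -> List[PersonAnnotation]:
    """Keep only action performers, dropping distractors."""
    return [a for a in annotations if not a.is_distractor]


# Placeholder records; the label strings are not real dataset labels.
sample = [
    PersonAnnotation(0, (100, 200, 150, 400), False, False, "action_a", "activity_a"),
    PersonAnnotation(0, (900, 250, 140, 380), True, True),
]
```

Calling `performers(sample)` on these placeholder records keeps only the first one. Parallel activities could then be separated by grouping the remaining records on `activity_label`.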