The Daimler Urban Segmentation Dataset consists of video sequences recorded in urban traffic: 5000 rectified stereo image pairs with a resolution of 1024x440 pixels. Of these, 500 frames (every 10th frame of the sequence) come with pixel-level semantic annotations for 5 classes: ground, building, vehicle, pedestrian, and sky. Dense disparity maps are provided as a reference; however, these are not manually annotated but computed using semi-global matching (SGM).

Related publications:

T. Scharwächter, M. Enzweiler, S. Roth, and U. Franke. "Efficient Multi-Cue Scene Segmentation." In Proc. of the German Conference on Pattern Recognition (GCPR), 2013.

T. Scharwächter, M. Enzweiler, S. Roth, and U. Franke. "Stixmantics: A Medium-Level Model for Real-Time Semantic Scene Understanding." In Proc. of the European Conference on Computer Vision (ECCV), 2014.
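For orientation, the dataset layout described at the top of this page can be summarized in a short Python sketch: 5000 frames, one labeled frame out of every 10 (which yields the 500 annotated frames), and the five semantic classes. The frame indexing (starting at 0), the integer class IDs, and the helper names are hypothetical assumptions, not part of the dataset's documentation. The `disparity_to_depth` helper uses the standard stereo relation Z = f * B / d to turn an SGM disparity value into metric depth; the focal length and baseline are placeholders, because the camera calibration is not stated here.

```python
import math

# Assumption: frames are indexed 0..4999 and every 10th frame
# (indices 0, 10, 20, ...) carries pixel-level labels.
NUM_FRAMES = 5000
ANNOTATION_STRIDE = 10
IMAGE_WIDTH, IMAGE_HEIGHT = 1024, 440

# The five semantic classes; these integer IDs are hypothetical.
CLASSES = ["ground", "building", "vehicle", "pedestrian", "sky"]
CLASS_TO_ID = {name: idx for idx, name in enumerate(CLASSES)}


def annotated_frame_indices(num_frames=NUM_FRAMES, stride=ANNOTATION_STRIDE):
    """Return the indices of frames assumed to have semantic labels."""
    return list(range(0, num_frames, stride))


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity (pixels) to depth (meters) via Z = f * B / d.

    focal_px and baseline_m are hypothetical calibration inputs; invalid or
    zero disparities are mapped to infinity.
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    annotated = annotated_frame_indices()
    print(len(annotated))  # 5000 frames / stride 10
    print(CLASS_TO_ID)
```

Keeping the stride as a parameter makes it easy to adjust if the labeled frames turn out to be offset within each group of 10.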