This dataset consists of RGB-D videos of indoor scenes, with dense, per-frame segmentation labels for both object and material categories. It also includes audio recordings of various objects in the scenes being struck, along with labels for the approximate positions at which those objects were struck.
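
As a rough illustration of how the annotations described above relate to one another, the sketch below models one scene in memory. Every class and field name here (`Strike`, `Frame`, `Scene`, `audio_path`, `labels_at`, and so on) is hypothetical, and the choice of 2D pixel coordinates for strike positions is an assumption; the dataset's actual file layout and coordinate convention may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# NOTE: all names and layouts below are hypothetical, for illustration only.


@dataclass
class Strike:
    """One recorded impact on an object in the scene."""
    audio_path: str                # recording of the object being struck
    position: Tuple[float, float]  # assumed: approximate (x, y) pixel location


@dataclass
class Frame:
    """One RGB-D video frame with its dense per-pixel labels."""
    rgb: List[List[Tuple[int, int, int]]]  # H x W colour values
    depth: List[List[float]]               # H x W depth values
    object_labels: List[List[int]]         # H x W object category ids
    material_labels: List[List[int]]       # H x W material category ids

    def labels_at(self, x: int, y: int) -> Tuple[int, int]:
        """Return (object id, material id) at pixel column x, row y."""
        return self.object_labels[y][x], self.material_labels[y][x]


@dataclass
class Scene:
    """A video plus the strike recordings collected in that scene."""
    frames: List[Frame] = field(default_factory=list)
    strikes: List[Strike] = field(default_factory=list)
```

Under these assumptions, the object and material struck in a given recording could be looked up by passing the strike's rounded position to `labels_at` on the corresponding frame.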