Description
We present a fully data-driven method to compute depth from diverse monocular video sequences that contain many non-rigid objects, e.g., people. To learn reconstruction cues for non-rigid scenes, we introduce a new dataset (WSVD) consisting of stereo videos scraped from YouTube. This dataset covers a wide variety of scene types and features many non-rigid objects.