TBAD: The Breakfast Actions Dataset (10 actions related to breakfast preparation)
Data Type: 2D Box, Image Caption
Data Size: 3.6 GB
A common problem in computer vision is whether algorithms developed on meticulously controlled datasets transfer to real-world problems, such as unscripted, uncontrolled videos with natural lighting, viewpoints, and environments. With the advancements in feature descriptors and generative methods for action recognition, a need has emerged for comprehensive datasets that reflect the variability of real-world recognition scenarios.
This dataset comprises 10 actions related to breakfast preparation, performed by 52 different individuals in 18 different kitchens. It is to date one of the largest fully annotated datasets available. One of the main motivations for the recording setup "in the wild", as opposed to a single controlled lab environment, is for the dataset to more closely reflect real-world conditions as they pertain to the monitoring and analysis of daily activities.
The number of cameras used varied from location to location (n = 3-5). The cameras were uncalibrated, and their positions changed depending on the location. Overall, we recorded ∼77 hours of video (> 4 million frames). The cameras used were webcams, standard industry cameras (Prosilica GE680C), and a stereo camera (Bumblebee, Point Grey, Inc.). To balance out viewpoints, we also mirrored videos recorded from laterally positioned cameras. To reduce the overall amount of data, all videos were down-sampled to a resolution of 320×240 pixels with a frame rate of 15 fps.
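The preprocessing described above (down-sampling to 320×240 at 15 fps, plus horizontal mirroring of laterally recorded views) can be sketched as an ffmpeg invocation. This is an illustrative reconstruction, not the authors' actual pipeline; the helper name and file names are hypothetical.

```python
def preprocess_cmd(src, dst, mirror=False):
    """Build an ffmpeg command that down-samples a video to 320x240
    at 15 fps and optionally mirrors it horizontally.
    Illustrative only; file names are hypothetical."""
    filters = ["scale=320:240"]          # down-sample to 320x240 pixels
    if mirror:
        filters.append("hflip")          # mirror laterally recorded views
    return ["ffmpeg", "-i", src,
            "-vf", ",".join(filters),
            "-r", "15",                  # re-sample to 15 fps
            dst]

# Example: mirror and down-sample one hypothetical recording.
print(preprocess_cmd("P03_cam01.avi", "P03_cam01_proc.avi", mirror=True))
```

Building the command as a list (rather than a shell string) makes it safe to pass to `subprocess.run` without shell quoting issues.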
Cooking activities included the preparation of:
- coffee (n=200)
- orange juice (n=187)
- chocolate milk (n=224)
- tea (n=223)
- bowl of cereals (n=214)
- fried eggs (n=198)
- pancakes (n=173)
- fruit salad (n=185)
- sandwich (n=197)
- scrambled eggs (n=188)
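Tallying the per-activity counts listed above gives the total number of recordings in the dataset:

```python
# Per-activity recording counts, as listed above.
counts = {
    "coffee": 200, "orange juice": 187, "chocolate milk": 224,
    "tea": 223, "bowl of cereals": 214, "fried eggs": 198,
    "pancakes": 173, "fruit salad": 185, "sandwich": 197,
    "scrambled eggs": 188,
}

total = sum(counts.values())
print(total)  # 1989 recordings across the 10 activities
```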
The benchmark and database are described in the following articles. We request that authors cite these papers in publications describing work carried out with this system and/or the video database.
H. Kuehne, A. B. Arslan and T. Serre. The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities. CVPR, 2014.
H. Kuehne, J. Gall and T. Serre. An end-to-end generative framework for video segmentation and recognition. WACV, 2016.