Tags: Data Cleaning, 2D Box
Dataset size: 57.9G

    The challenging and realistic setup of the ‘WILDTRACK’ dataset brings multi-camera detection and tracking methods into the wild.

    It meets the need of deep learning methods for a large-scale multi-camera dataset of walking pedestrians, in which the cameras’ fields of view largely overlap. Because it was acquired with current high-tech hardware, it provides HD-resolution data. Furthermore, its high-precision joint calibration and synchronization should allow the development of new algorithms that go beyond what is possible with currently available datasets.

    The data acquisition took place in front of the main building of ETH Zurich, Switzerland, in good weather conditions. The sequences have a resolution of 1920×1080 pixels and were shot at 60 frames per second.

    Description of available files

    Synchronized frames extracted at a frame rate of 10 fps and a resolution of 1920×1080, post-processed to remove lens distortion;

    Calibration files that use the pinhole camera model and are compatible with the projection functions provided in the OpenCV library. Both the extrinsic and the intrinsic calibrations are available;

    The ground-truth annotations in ‘json’ file format (see the separate section below);

    For ease of use by methods focusing on classification, we also provide a file we refer to as the ‘positions’ file, in ‘json’ file format. For details, see the section below.

    Please check this site for updates, which will extend the download list with:

    Full videos;

    Corresponding-point annotations, which may be used for camera calibration algorithms;

    A second part of this dataset which, although not annotated, can be used for unsupervised methods.
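
    Since the frames are already undistorted and the calibration follows the pinhole model, projecting a 3D world point into a view reduces to x ~ K (R X + t). The following is a minimal pure-Python sketch of that projection. It assumes the extrinsics are stored OpenCV-style as a Rodrigues rotation vector `rvec` and a translation vector `tvec`, and the intrinsics as a 3×3 camera matrix `K`; the function names are hypothetical, and the exact layout of the WILDTRACK calibration files is not described on this page. With zero distortion coefficients, OpenCV's `cv2.projectPoints` computes the same mapping.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rodrigues_to_matrix(rvec: Sequence[float]) -> Mat3:
    """Convert an axis-angle (Rodrigues) vector, as used by OpenCV, to a 3x3 rotation matrix."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:  # no rotation
        return ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    kx, ky, kz = rx / theta, ry / theta, rz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    # R = cos(theta) I + (1 - cos(theta)) k k^T + sin(theta) [k]_x
    return (
        (c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s),
        (ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s),
        (kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v),
    )


def project_point(point: Vec3, rvec: Sequence[float], tvec: Sequence[float],
                  camera_matrix: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Project a world point into an undistorted view: x ~ K (R X + t).

    `point` and `tvec` must be expressed in the same length unit.
    """
    R = rodrigues_to_matrix(rvec)
    # World coordinates -> camera coordinates.
    xc = [sum(R[i][j] * point[j] for j in range(3)) + tvec[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    # Normalized image coordinates, then pixels via the intrinsic matrix K.
    x, y = xc[0] / xc[2], xc[1] / xc[2]
    K = camera_matrix
    return (K[0][0] * x + K[0][1] * y + K[0][2], K[1][1] * y + K[1][2])


# Usage with a made-up camera: focal length 1000 px, principal point at the image centre.
K = ((1000.0, 0.0, 960.0), (0.0, 1000.0, 540.0), (0.0, 0.0, 1.0))
print(project_point((1.0, 0.0, 10.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), K))  # (1060.0, 540.0)
```

    Implementing the rotation explicitly keeps the sketch dependency-free; in a real pipeline, `cv2.Rodrigues` and `cv2.projectPoints` are the usual choice.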

    Positions file

    The ‘positions file’ makes it possible to skip working with the calibration files and to focus, for instance, on classification, while exploiting the fact that the cameras are static. It specifies exactly where each volume in a given set of volumes of space projects in every view. The height of each volume corresponds to the height of an average person.

    We discretize the ground surface into a regular grid. The 3D space occupied by a person standing at a particular position is modelled by a cylinder centred on the grid point. Each cylinder projects into each of the 2D views as a rectangle whose position in the view is given in pixel coordinates.

    Using a 480×1440 grid (691,200 positions in total) and the provided camera calibration files, we generated this file, which is available for download. Each position is assigned an ID using 0-based enumeration ([0, 691199]). The views in this file are numbered the same way, i.e. they range from 0 to 6 inclusive. Positions that are not visible in a given view are assigned coordinates of -1.
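
    Working with position IDs mostly means converting between an ID and its grid cell and checking whether a position is visible in a view. The sketch below assumes, without confirmation from this page, that IDs enumerate the 480×1440 grid row by row with the 480-cell axis varying fastest (ID = ix + 480 · iy), and that the positions file has already been loaded into a hypothetical dictionary keyed by (view, position ID) whose values are (xmin, ymin, xmax, ymax) rectangles.

```python
from typing import Dict, Optional, Tuple

GRID_W, GRID_H = 480, 1440       # grid size stated for the positions file
NUM_POSITIONS = GRID_W * GRID_H  # 691200 positions, IDs 0..691199
NUM_VIEWS = 7                    # views are numbered 0..6

Rect = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def position_to_grid(pos_id: int) -> Tuple[int, int]:
    """Map a position ID to grid indices (ix, iy). ASSUMPTION: ID = ix + GRID_W * iy."""
    if not 0 <= pos_id < NUM_POSITIONS:
        raise ValueError(f"position ID out of range: {pos_id}")
    return pos_id % GRID_W, pos_id // GRID_W


def grid_to_position(ix: int, iy: int) -> int:
    """Inverse of position_to_grid under the same ordering assumption."""
    if not (0 <= ix < GRID_W and 0 <= iy < GRID_H):
        raise ValueError(f"grid cell out of range: ({ix}, {iy})")
    return ix + GRID_W * iy


def visible_rect(rects: Dict[Tuple[int, int], Rect], view: int, pos_id: int) -> Optional[Rect]:
    """Return the rectangle of pos_id in view, or None when it is not visible.

    `rects` is a hypothetical in-memory form of the positions file; a rectangle
    whose coordinates are all -1 is treated as "not visible".
    """
    if not 0 <= view < NUM_VIEWS:
        raise ValueError(f"view must be in [0, {NUM_VIEWS - 1}]: {view}")
    rect = rects.get((view, pos_id))
    if rect is None or all(c == -1 for c in rect):
        return None
    return rect
```

    If the released file turns out to use the other axis ordering, only `position_to_grid` and `grid_to_position` need to change.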

    Annotations

    Full ground-truth annotations are provided for 400 frames, sampled at 2 fps. On average, there are 20 persons in each frame. Thus, with 7 views, our dataset provides approximately 400 × 20 × 7 = 56,000 single-view bounding boxes. The number of annotations can be increased further by interpolation. These annotations were produced by workers hired on Amazon Mechanical Turk.
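
    To check figures such as the 56,000-box estimate on the actual files, one can count the visible single-view boxes per frame. The sketch below assumes a per-frame JSON schema that this page does not document: a list of persons, each with `personID`, `positionID` and a `views` list whose entries carry `viewNum`, `xmin`, `ymin`, `xmax` and `ymax`, with -1 marking a view in which the person is not visible. Adjust the key names if the released files differ.

```python
import json
from typing import Dict


def count_visible_boxes(frame_json: str) -> Dict[int, int]:
    """Count visible single-view bounding boxes per view in one annotated frame.

    ASSUMED schema (not confirmed by the dataset page): a JSON list of persons,
    each with "personID", "positionID" and a "views" list whose entries hold
    "viewNum", "xmin", "ymin", "xmax", "ymax"; -1 coordinates mark an invisible box.
    """
    counts: Dict[int, int] = {}
    for person in json.loads(frame_json):
        for box in person["views"]:
            coords = (box["xmin"], box["ymin"], box["xmax"], box["ymax"])
            if all(c == -1 for c in coords):
                continue  # the person is not visible in this view
            counts[box["viewNum"]] = counts.get(box["viewNum"], 0) + 1
    return counts


# Usage on a tiny hand-written frame with two persons (made-up values).
example = json.dumps([
    {"personID": 1, "positionID": 1000, "views": [
        {"viewNum": 0, "xmin": 10, "ymin": 20, "xmax": 50, "ymax": 120},
        {"viewNum": 1, "xmin": -1, "ymin": -1, "xmax": -1, "ymax": -1},
    ]},
    {"personID": 2, "positionID": 2000, "views": [
        {"viewNum": 0, "xmin": 300, "ymin": 40, "xmax": 350, "ymax": 180},
        {"viewNum": 1, "xmin": 500, "ymin": 60, "xmax": 560, "ymax": 230},
    ]},
])
print(count_visible_boxes(example))  # {0: 2, 1: 1}
```

    Summing these counts over all 400 annotated frames gives the exact number of single-view boxes that the 400 × 20 × 7 figure estimates.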

    Note that the annotations roughly correspond to the coordinates of the positions file described above, and therefore include the ID of the position estimated to be occupied by each target. These position IDs are consistent with the provided positions file.
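
    Because each annotation carries a position ID, its bounding box in a given view can be compared with the rectangle that the positions file lists for the same position and view. One common way to quantify such rough correspondence is intersection over union (IoU). The sketch below computes it, treating coordinates as continuous (no +1 pixel convention), and wraps it in a hypothetical `agrees_with_position` check whose 0.5 threshold is an arbitrary illustrative value.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(b: Box) -> float:
    """Area of a box with continuous coordinates; degenerate boxes give 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def agrees_with_position(annotated: Box, position_rect: Box, threshold: float = 0.5) -> bool:
    """Hypothetical check: does an annotated box roughly match its position's rectangle?"""
    return iou(annotated, position_rect) >= threshold
```

    Applied across all views and frames, low IoU values would flag annotations or position estimates worth inspecting; the threshold should be tuned on the data rather than taken from this sketch.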

    Acknowledgment

    This work was supported by the Swiss National Science Foundation under grant CRSII2-147693 “WILDTRACK”.

    Publication

    WILDTRACK: A Multi-camera HD Dataset for Dense Unscripted Pedestrian Detection
    T. Chavdarova; P. Baqué; A. Maksai; S. Bouquet; C. Jose et al.
    Computer Vision and Pattern Recognition, 2018, 10.1109/CVPR.2018.00528.

    License: No license specified; the work may be protected by copyright.


    Dataset title: The WILDTRACK Seven-Camera HD Dataset
