


IMDB-WIKI – 500k+ face images with age and gender labels


Person,Image Search,Deep Learning Classification


Data structure: 280G

    To the best of our knowledge this is the largest publicly available dataset of face images with gender and age labels for training. We provide pretrained models for both age and gender prediction.


    Since publicly available face image datasets are often of small to medium size, rarely exceeding tens of thousands of images, and often lack age information, we decided to collect a large dataset of celebrities. For this purpose, we took the list of the 100,000 most popular actors as listed on the IMDb website and (automatically) crawled from their profiles the date of birth, name, gender and all images related to that person. Additionally, we crawled all profile images from pages of people on Wikipedia with the same meta information. We removed the images without a timestamp (the date when the photo was taken). Assuming that images with a single face are likely to show the actor and that the timestamp and date of birth are correct, we were able to assign to each such image the biological (real) age. Of course, we cannot vouch for the accuracy of the assigned age information. Besides wrong timestamps, many images are stills from movies, and movies can have extended production times. In total we obtained 460,723 face images from 20,284 celebrities from IMDb and 62,328 from Wikipedia, thus 523,051 in total.

    As some of the images (especially from IMDb) contain several people, we only use the photos where the second strongest face detection is below a threshold. For the network to be equally discriminative for all ages, we equalize the age distribution for training. For more details, please see the paper.
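The exact equalization procedure is left to the paper. As a rough illustration only, one simple way to flatten an age distribution is to cap the number of training samples per age. The function name `equalize_by_age`, the `(image_path, age)` tuple layout and the `max_per_age` parameter are assumptions made for this sketch, not part of the dataset or the authors' code.

```python
import random
from collections import defaultdict


def equalize_by_age(samples, max_per_age, seed=0):
    """Return a subset of samples with at most max_per_age entries per age.

    samples is a list of (image_path, age) tuples (hypothetical layout).
    Ages with fewer than max_per_age samples are kept unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Group samples by their integer age.
    by_age = defaultdict(list)
    for path, age in samples:
        by_age[age].append((path, age))

    # Randomly subsample the over-represented ages.
    balanced = []
    for age in sorted(by_age):
        group = by_age[age]
        if len(group) > max_per_age:
            group = rng.sample(group, max_per_age)
        balanced.extend(group)
    return balanced
```

Capping discards data from common ages; oversampling rare ages or weighting the loss would be alternative choices with different trade-offs.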


    For both the IMDb and Wikipedia images we provide a separate .mat file, loadable in Matlab, that contains all the meta information. The format is as follows:

    • dob: date of birth (Matlab serial date number)

    • photo_taken: year when the photo was taken

    • full_path: path to file

    • gender: 0 for female and 1 for male, NaN if unknown

    • name: name of the celebrity

    • face_location: location of the face. To crop the face in Matlab run

    • face_score: detector score (the higher the better). Inf implies that no face was found in the image and the face_location then just returns the entire image

    • second_face_score: detector score of the face with the second highest score. This is useful to ignore images with more than one face. second_face_score is NaN if no second face was detected.

    • celeb_names (IMDB only): list of all celebrity names

    • celeb_id (IMDB only): index of celebrity name
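The score fields above can be combined to keep only images that show a single, clearly detected face. The following is a minimal sketch, assuming each image's metadata has already been read into a plain Python dict keyed by the field names listed above. The function name `is_usable` and both threshold values are hypothetical placeholders, not values taken from the dataset or the paper.

```python
import math


def is_usable(record, min_face_score=1.0, max_second_face_score=0.5,
              require_gender=True):
    """Decide whether one metadata record is suitable for training.

    record is a dict with the keys face_score, second_face_score and
    gender. The default thresholds are illustrative assumptions.
    """
    face = record["face_score"]
    second = record["second_face_score"]

    # An infinite (or missing) score means no face was found.
    if math.isnan(face) or math.isinf(face) or face < min_face_score:
        return False

    # NaN means no second face was detected, which is what we want.
    if not math.isnan(second) and second >= max_second_face_score:
        return False

    # Gender is NaN when unknown.
    if require_gender and math.isnan(record["gender"]):
        return False

    return True
```

A list of records can then be filtered with `[r for r in records if is_usable(r)]`; the thresholds would need tuning against the actual score distribution.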

    The age of a person can be calculated based on the date of birth and the time when the photo was taken (note that we assume that the photo was taken in the middle of the year):
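As a hedged illustration in Python rather than Matlab, the sketch below converts a Matlab serial date number to a calendar date and computes the age in whole years, placing the photo on July 1 of `photo_taken` to reflect the mid-year assumption. The function names are chosen for this example. The offset of 366 days accounts for Matlab counting days from year 0 while Python's `date.fromordinal` counts from year 1.

```python
from datetime import date

# Matlab's datenum 1 is 1 January of year 0; Python's ordinal 1 is
# 1 January of year 1. Year 0 has 366 days in Matlab's calendar.
MATLAB_ORDINAL_OFFSET = 366


def matlab_datenum_to_date(datenum):
    """Convert a Matlab serial date number to a datetime.date."""
    return date.fromordinal(int(datenum) - MATLAB_ORDINAL_OFFSET)


def age_at_photo(dob_datenum, photo_taken_year):
    """Age in whole years, assuming the photo was taken on 1 July."""
    birth = matlab_datenum_to_date(dob_datenum)
    photo = date(photo_taken_year, 7, 1)
    age = photo.year - birth.year
    # Subtract one year if the birthday falls after 1 July.
    if (photo.month, photo.day) < (birth.month, birth.day):
        age -= 1
    return age


# 719529 is the Matlab serial date number of 1 January 1970.
print(age_at_photo(719529, 2000))  # 30
```

A `dob` value that maps to an ordinal below 1 makes `date.fromordinal` raise `ValueError`, so invalid entries would need to be skipped or handled before calling these functions.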

