Description
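The dataset totals 472 GB and contains 119,197 videos. Every video is 10 seconds long, but frame rates range from 15 to 30 fps and resolutions from 320x240 to 3840x2160. Of the training videos, 19,197 are real footage filmed with roughly 430 actors; the remaining 100,000 are fake-face videos generated from the real ones. The fakes were produced with DeepFakes, GAN-based, and some non-learned methods, so that the dataset covers as many kinds of fake-face video as possible. The videos include audio, which most current datasets lack, but there are no annotations for the audio. According to the official site and the Kaggle competition leaderboard, the current SOTA loss is around 0.42, which leaves considerable room for improvement. The compute requirements are high, however: according to our survey, some participants used more than eight V100 GPUs, which is why few papers adopt this dataset.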
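To put the leaderboard loss of around 0.42 in context, here is a minimal sketch that assumes the score is binary log loss (cross-entropy) over a predicted probability that each video is FAKE. The function name `binary_log_loss`, the label encoding (1 = FAKE, 0 = REAL), and the toy predictions are illustrative assumptions, not taken from the competition code.

```python
import math

def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy. Assumed encoding: 1 = FAKE, 0 = REAL."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip probabilities so log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Toy labels and predictions (hypothetical, for illustration only).
labels = [1, 1, 0, 1]

# Predicting 0.5 for every video gives ln(2), the uninformed baseline.
baseline = binary_log_loss(labels, [0.5] * len(labels))

# Predictions that lean the right way score lower (better).
confident = binary_log_loss(labels, [0.9, 0.8, 0.2, 0.7])

print(f"baseline={baseline:.4f} confident={confident:.4f}")
```

Under this assumption, always predicting 0.5 scores ln 2 ≈ 0.693, so a loss near 0.42 is better than an uninformed guess but still far from a confident, correct classifier.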
This competition is closed for submissions. Participants' selected code submissions were re-run by the host on a privately-held test set and the private leaderboard results have been finalized. Late submissions will not be opened, due to an inability to replicate the unique design of this competition.
Training Set
This code competition's training set is not available directly on Kaggle, as it is prohibitively large to train on within Kaggle. Instead, it's strongly recommended that you train offline and load the externally trained model as an external dataset into Kaggle Notebooks to perform inference on the Test Set. Review Getting Started for more detailed information.
The full training set is just over 470 GB. We've made it available as one giant file, as well as 50 smaller files, each ~10 GB in size. You must accept the competition's rules to gain access to any of the links below.
Files
train_sample_videos.zip - a ZIP file containing a sample set of training videos and a metadata.json with labels. The full set of training videos is available through the links provided above.
sample_submission.csv - a sample submission file in the correct format.
test_videos.zip - a zip file containing a small set of videos to be used as a public validation set.
To understand the datasets available for this competition, review the Getting Started information.
Columns
filename - the filename of the video
label - whether the video is REAL or FAKE
original - in the case that a train set video is FAKE, the original video is listed here
split - this is always equal to "train".
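The columns above can be read into flat records along the following lines. This is a sketch under stated assumptions: it assumes metadata.json is a JSON object keyed by filename whose values hold the remaining columns, and that `original` is absent for REAL videos. The sample entries and the helper name `load_rows` are hypothetical.

```python
import json
from collections import Counter

# Hypothetical excerpt; assumes metadata.json maps each filename
# to an object holding the other columns.
sample_metadata = """
{
  "fake_0001.mp4": {"label": "FAKE", "split": "train", "original": "real_0001.mp4"},
  "fake_0002.mp4": {"label": "FAKE", "split": "train", "original": "real_0001.mp4"},
  "real_0001.mp4": {"label": "REAL", "split": "train"}
}
"""

def load_rows(text):
    """Flatten the mapping into one dict per video with all four columns."""
    rows = []
    for filename, info in json.loads(text).items():
        rows.append({
            "filename": filename,
            "label": info["label"],
            "original": info.get("original"),  # None when no original is listed
            "split": info["split"],
        })
    return rows

rows = load_rows(sample_metadata)

# Class balance: how many REAL and FAKE videos are present.
label_counts = Counter(row["label"] for row in rows)

# Group each fake under the real video it was generated from.
fakes_by_original = {}
for row in rows:
    if row["label"] == "FAKE":
        fakes_by_original.setdefault(row["original"], []).append(row["filename"])

print(label_counts)
print(fakes_by_original)
```

Grouping fakes by their original is one way to keep a real video and the fakes derived from it in the same split when building a local validation set, which may help avoid leakage between training and validation data.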