Description
DuReader is a large-scale, real-world Chinese dataset for Machine Reading Comprehension (MRC) and Question Answering (QA). All questions in the dataset are sampled from real, anonymized user queries. The evidence documents, from which the answers are derived, were collected from the web and from Baidu Zhidao using the Baidu search engine. The answers are written by humans. DuReader version 2.0 contains more than 300K questions, 1.4M evidence documents, and 660K human-generated answers. It can be used to train or evaluate MRC models and systems.
Data Statistics
Statistic | Question | Document | Answer
---|---|---|---
Count | 301,574 | 1,431,429 | 665,723
Avg. length (characters) | 26 | 1,793 | 299
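The counts in the table can be combined to give rough per-question averages. The short sketch below divides the document and answer counts by the question count. It is only arithmetic on the published totals. The names `STATS` and `per_question_ratios` are illustrative, and the code assumes nothing about how documents or answers are actually distributed across individual questions.

```python
# Counts copied from the DuReader 2.0 statistics table above.
STATS = {
    "question": 301_574,
    "document": 1_431_429,
    "answer": 665_723,
}


def per_question_ratios(stats: dict[str, int]) -> dict[str, float]:
    """Return dataset-wide averages of documents and answers per question.

    These are simple totals divided by the number of questions; they say
    nothing about the per-question distribution.
    """
    n_questions = stats["question"]
    return {
        "documents_per_question": stats["document"] / n_questions,
        "answers_per_question": stats["answer"] / n_questions,
    }


if __name__ == "__main__":
    for name, value in per_question_ratios(STATS).items():
        print(f"{name}: {value:.2f}")
```

Running it prints roughly 4.75 documents and 2.21 answers per question on average.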