Public Datasets

FakeNewsNet | Data collection for fake news research: fake news, disinformation, data mining. This is a repository for an ongoing data collection project for fake news research at ASU. We describe and compare FakeN... | NLP, News, Social Science, Social Networks | Classification
72.61M 1619
Strongbad Emails | Business, NLP, Text Data | Classification
0.11M 373
Science Popular Comment Removal | Business, NLP, Text Data, Binary Classification, Bigquery | Classification
74.17M 352
Medium Articles | Posts tagged AI, machine learning, data science, or artificial intelligence, along with user information. Medium taps into the brains of the world’s most insightful writers, thinkers, and storytellers to bring you the smartes... | NLP, Text Data, Literature | Classification
1.8G 542
Entity Extraction from Pitchfork Reviews | Business, Arts and Entertainment, Music, Retail and Shopping, NLP, Popular Culture | Classification
14.49M 962
Christmas Recipes | Religion and Belief Systems, NLP, Cooking and Recipes, Holidays and Cultural Events | Classification
2.51M 875
Thousands of Questions about Love | Questions and answers in the love category from a Q&A service. Context: Russian language. This dataset was collected from real answers to questions on the mail.ru service: https://otvet.mail.... | NLP, Education, Text Data, Languages | Classification
176.23M 340
Government Plans, 2018 Presidential Election | NLP, Brazil | Classification
16.5M 835
ACL Anthology Papers | Data on accepted papers from the ACL Anthology. An abstract of a paper is extracted from arXiv if it exists. The data i... | NLP, Education, Literature | Classification
1.14M 359
Email Spam | Context: some emails from Spam Assassin, used to create models that can differentiate between spam and ham (non-spam) ema... | NLP, Classification, Software, Email and Messaging | Classification
12.08M 420
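curationCorpus (Curation Corpus) | Curation Corpus is a collection of 40,000 professionally written summaries of news articles, with links to the articles themselves. This repository provides a scraper for accessing them. If you... | NLP | Text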
123.13M 616
MJSynth Synthetic Word Dataset | This is a synthetically generated dataset which we found sufficient for training text recognition on real-world images. This... | NLP | Classification
9.95G 1955
ICDAR 2013 Dataset | 1. 150 images written in Greek and English as well as 50 images written in Indian Bangla. 2. BlackWhite ha... | NLP | Text
172.61M 1855