Public Datasets

Turkish-language data for training word2vec or n-gram models. Each document in this data is written in Turkish and carries a wiki document ID. You can train word2vec or n-gram mode... NLP,Text Data,Text Mining Classification
463.02M 904
ATIS from MS CNTK Business,Earth and Nature,NLP Classification
2.35M 988
Lyrics of "Demons" by Imagine Dragons Arts and Entertainment,NLP Classification
0M 495
Amazon Alexa reviews Business,NLP,Deep Learning,Beginner,Naive Bayes Classification
0.49M 359
Department of Justice press releases, 2009-2018 Earth and Nature,Politics,NLP,Crime,Text Data Classification
52.47M 406
231.77M 1018
Birth control product reviews from webmd.com NLP,Healthcare Classification
7.11M 362
Conversion to Catholicism in Thailand Earth and Nature,NLP Classification
37.78M 457
Media articles Earth and Nature,NLP,Literature,Text Data,Data Visualization,Beginner,Text Mining Classification
3.7M 870
National-language entities in news, drawn from local news. PRN - person, group of people, believes, etc; LOC - location; NORP - Military, police, government, Parties, etc; ORG - Organiz... NLP,News,Text Data,Text Mining Classification
411K 1077
Medium articles tagged ML/DL/AI, with article descriptions, titles, authors, and other metadata. Medium articles tagged under ML/DL/AI, scraped using BeautifulSoup and Selenium. Content: 1. Tag: Tagged under AI/ML or DL 2. N... NLP,Education,Online Communities,Artificial Intelligence Classification
55.49K 1029
Reddit bot that uses NLP to counter negative comments Computer Science,Programming,NLP Classification
0M 409
GloVe Reddit Comments. Global Vectors for Word Representation based on Reddit comments... NLP Classification
19.1G 632
300-dimensional pretrained vectors released by Facebook: 2 million word vectors trained on Common Crawl. 300-dimensional pretrained FastText English word vectors released by Facebook. The first line of the file contains the nu... NLP,Arts and Entertainment Classification
650M 698
Text mining and analysis on Cancer UK. Natural language processing on Cancer UK... NLP,Biology,Text Data,Health Conditions Classification
4.33M 391
6.32G 361
Trump Tweet.csv NLP,Text Data Classification
0.07M 361
BERT small first-order Arts and Entertainment,NLP Classification
837.78M 969
Taptap reviews Games,Video Games,NLP,Deep Learning Classification
3.6M 396
Text datasets for NLP. This is a bundle of three text data sets to be used for NLP research. Dialog System Technology Challenge 7 (DSTC7), UbuntuA... NLP,Earth and Nature,Education Classification
6.49G 1146