AI Community

Public Datasets

Google Word2Vec model, with word vectors for a vocabulary of 3 million words and phrases. It's 1.5GB! It includes word vectors for a vocabulary of 3 million words and phrases that they trained on roughly 100 b... Computer Science, Programming Classification
3.64G 214
Wikipedia articles dataset. wikipedia fr 2008: dump of wikipedia... NLP Classification
2.12G 173
Emotion-related text dataset... Movies and TV Shows Classification
11.3M 209
MNIST-like alphabet dataset (A-Z). Consists of 28x28 handwritten alphabet images. Content: There are 785 columns in total; each row holds one alphabet image. Th... NLP, MNIST, CNN Classification
665.89M 259
Hubber model, text data from various industries... NLP, MNIST Classification
473.41M 192
134.5M 263
Cleaned version of the toxicity with bias dataset. cleaned tox bias: cleaned up version of toxicity with bias data set... NLP, Data Cleaning, Health Classification
535.39M 317
Russian Telegram chat history, data parsed from public Russian Telegram chats. Russian Telegram chats history: Data parsed from most popular public Russian Telegram chats... NLP, Text Data, Russia Classification
11.08G 212
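Sentiment analysis in natural language processing. # Dataset This dataset was created by NowYSM under Database: Open Database, Contents: Database Contents. # Contents It contains the following files:... NLP, Arts and Entertainment Classification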
2.52M 202
Handwritten math symbols dataset, over 100,000 image samples. Dataset consists of jpg files (45x45). DISCLAIMER: dataset does not contain Hebrew alphabet at all. It includes basic Greek... NLP, Computer Science, Law, Email and Messaging Classification
410.19M 363
NLP data. # Dataset This dataset was created by AbiyuG. Released under CC BY-NC-SA 4.0. # Contents It contains the following files:... NLP, Psychology Classification
3.14M 193
Word occurrences in Mr. Robot: learn F-Society's favorite jargon. Mr. Robot is all about data, whether it's corrupting it, encrypting it, or deleting it. I wanted to dig up some data... Arts and Entertainment, Games Classification
0.31M 183
Star Trek scripts: raw text scripts and processed lines for all Star Trek series. Star Trek Scripts Text. Data scraped from http://www.chakoteya.net/StarTrek/index.html Code here: https://github.... NLP, Movies and TV Shows, Text Data, Text Mining Classification
42.63M 174
China Workshop on Machine Translation, corpus dataset. # Dataset This dataset was created by Liling Tan. Released under Other (specified in description). # Contents It contains the... Deep Learning, Computer Science Classification
6.6G 215
DBpedia, an example Semantic Web application, providing taxonomic, hierarchical categories for 42,782 Wikipedia articles. DBpedia (from DB for database) is a project aiming to extract structured content from the information created in Wikiped... Education, Text Data, Multiclass Classification, Text Mining Classification
443.28M 172
Vietnamese handwritten address dataset... MNIST Classification
423.81M 154
Mac Morpho, Brazilian Portuguese news text with part-of-speech tags. The canonical metadata on NLTK: package id=mac_morpho name=MAC-MORPHO: Brazilian Portuguese news text with part-of-speech tags webp... Earth and Nature Classification
10.43M 352
NPS Chat, the NPS Chat Corpus. Context: The canonical metadata on NLTK:... Computer Science, Online Communities Classification
2.46M 233
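Stanford Natural Language Inference (SNLI) Corpus in JSONL format. This is version 1.0 of the Stanford Natural Language Inference (SNLI) Corpus. If you use this corpus, please cite this paper: http://nlp.Stanford.edu/pubs/snli... Languages Classification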
483.45M 232
English word frequency: the ⅓ million most common English words on the web. This dataset contains the counts of the 333,333 most commonly-used single words on the English language web, as derived... Languages Classification
4.73M 215