Public Datasets

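Word2vec trained on Wikipedia data (unigrams + bigrams) to capture both unigrams and bigrams. This is a word embedding model built from Wikipedia plus reviews from various sources. Unlike creating bigrams with phrase-based approaches (which do not consider the phrase/bigram context of neighboring words), this...NLP,Computer Science,Software,Programming,Neural Networks Classification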
8.62G 573
Birth control product reviews from webmd.com NLP,Healthcare Classification
7.11M 339
Flickr image dataset: Flickr image captioning dataset. The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30...NLP,Image Data,Computer Vision Classification
8.2G 606
300-dimensional pretrained vectors released by Facebook: 2 million word vectors trained on Common Crawl. 300-dimensional pretrained FastText English word vectors released by Facebook. The first line of the file contains the nu...NLP,Arts and Entertainment Classification
650M 594
Trump Tweet.csv NLP,Text Data Classification
0.07M 332
SMS Spam Ham Prediction Business,Earth and Nature,Internet,Economics,NLP Classification
0.48M 352
Stanford GloVe 200d dataset, converted to word2vec format. This is the Stanford GloVe 200d dataset converted to word2vec format...NLP,Computer Science Classification
661.31M 902
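
The GloVe-to-word2vec conversion in the entry above amounts to adding a header: a GloVe text file lists one word per line followed by its vector components, while the word2vec text format begins with a line giving the vocabulary size and the vector dimension. A minimal Python sketch, assuming plain space-separated text; the helper names and the toy 3-dimensional vectors are hypothetical, not taken from the dataset.

```python
import io


def glove_to_word2vec(glove_text: str) -> str:
    """Prepend the word2vec header "<vocab_size> <dim>" to GloVe-format text."""
    lines = [line for line in glove_text.splitlines() if line.strip()]
    dim = len(lines[0].split()) - 1  # first token is the word, the rest are components
    return f"{len(lines)} {dim}\n" + "\n".join(lines) + "\n"


def load_word2vec_text(text: str) -> dict[str, list[float]]:
    """Parse word2vec text format into a word -> vector dictionary."""
    stream = io.StringIO(text)
    vocab_size, dim = map(int, stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        word, *values = stream.readline().split()
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = [float(v) for v in values]
    return vectors


# Toy vectors with made-up values (the real dataset uses 200 dimensions).
sample_glove = "king 0.1 0.2 0.3\nqueen 0.1 0.25 0.3\n"
w2v_text = glove_to_word2vec(sample_glove)
vectors = load_word2vec_text(w2v_text)
print(w2v_text.splitlines()[0])  # 2 3
print(vectors["queen"])          # [0.1, 0.25, 0.3]
```

In practice, gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` can read a converted text file directly; the hand-written parser only makes the header-plus-rows layout explicit.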
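SMILES OCR dataset, containing over 900,000 single-product reactions in SMILES format. SMILES (Simplified Molecular-Input Line-Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions. This dataset contains over...NLP,Chemistry Classification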
175M 1176
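
Reaction SMILES strings follow the layout `reactants>agents>products`, with `.` separating individual molecules within each section, so "single-product" can be checked by counting the molecules after the last `>`. A minimal Python sketch, assuming the dataset stores plain reaction SMILES strings (its actual file layout is not described here); the function names and example reactions are illustrative only, and treating every `.`-separated component as a separate product is a simplification (a salt written as `cation.anion` would count twice).

```python
def split_reaction_smiles(rxn: str) -> tuple[list[str], list[str], list[str]]:
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"not a reaction SMILES: {rxn!r}")
    # '.' separates molecules; an empty section (e.g. 'A>>B') yields []
    reactants, agents, products = ([m for m in part.split(".") if m] for part in parts)
    return reactants, agents, products


def is_single_product(rxn: str) -> bool:
    # Simplification: counts '.'-separated components in the product section
    return len(split_reaction_smiles(rxn)[2]) == 1


hydrogenation = "C=C.[H][H]>[Pt]>CC"           # ethylene + H2 -> ethane
esterification = "CC(=O)O.OCC>>CC(=O)OCC.O"   # acetic acid + ethanol -> ester + water
print(split_reaction_smiles(hydrogenation))  # (['C=C', '[H][H]'], ['[Pt]'], ['CC'])
print(is_single_product(hydrogenation))      # True
print(is_single_product(esterification))     # False
```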
Wikipedia Word2Vec: Apache Spark word2vec trained on 200K Wikipedia pages. I used Apache Spark to extract more than 6 million phrases from 200,000 English Wikipedia pages. Here is the process of...NLP,Business,Earth and Nature,Text Mining Classification
132.74M 531
ConceptNet Numberbatch vectors: word vectors from ConceptNet. These are the word vectors released by the ConceptNet project. In essence, ConceptNet is made up of triples:...NLP Classification
899.91M 411
Vegetables (Google Word2Vec News)...NLP,News Classification
3.73M 952
SComedy Earth and Nature,NLP,Text Data,Text Mining Classification
2.99M 518
Reddit vectors dataset for training sense2vec models. The sense2vec word embedding model works better than word2vec, since it uses contextual information from words. This re...NLP,Computer Science,Text Data,spaCy Classification
635.76M 828
Medium Articles: posts tagged AI, Machine Learning, Data Science, or Artificial Intelligence, along with user information. Medium taps into the brains of the world’s most insightful writers, thinkers, and storytellers to bring you the smartes...NLP,Text Data,Literature Classification
1.8G 512
Entity extraction from Pitchfork reviews Business,Arts and Entertainment,Music,Retail and Shopping,NLP,Popular Culture Classification
14.49M 927
Stack Overflow 2018 Questions Dataset. In this dataset, we explore StackOverflow questions and try to use unsupervised algorithms to extract tags, then train c...NLP,Earth and Nature,Computer Science,Multiclass Classification Classification
230.27M 579
ACL Anthology papers: paper data from the ACL Anthology. Data on accepted papers from the ACL Anthology. A paper's abstract is extracted from arXiv if it exists there. The data i...NLP,Education,Literature Classification
1.14M 353
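curationCorpus (Curation Corpus). The Curation Corpus is a collection of 40,000 professionally written summaries of news articles, with links to the articles themselves. This repository provides a scraper to access them. If you are interested in...NLP Text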
123.13M 602
JHU-CROWD++. A large-scale unconstrained crowd counting dataset. A comprehensive dataset with 4,372 images and 1.51 million annotation...Person 2D Box
2.87G 866
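PCR data for corticotropin-releasing hormone (CRH) and glucocorticoid receptor (GR) in the hippocampus of male and female rats. For corticotropin-releasing hormone (CRH) and glucocorticoid receptor (GR) measured by quantitative real-time PCR in the hippocampus and hypothalamus of male and female rats, these files contain all the...Others Classification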
0.04M 824