Training_Public Datasets, 帕依提提 - High-Quality AI Dataset Service Platform

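Predicting Reddit Community Engagement dataset, with GDELT post classification and Sirocco text analysis (opinion and entity extraction). This dataset contains 3 months (June to August 2017) of Reddit news posts, together with the results of GDELT post classification and Sirocco text analysis (opinion and entity extraction). It is used... NLP, Computer Science, Online Communities Classification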

174.09M 798

Sergei Sokolenko

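Word2vec trained on Wikipedia (unigrams + bigrams), to capture both unigrams and bigrams. This is a word embedding model created from Wikipedia plus comments from various sources. Unlike creating bigrams with a phrase-based approach (which does not consider the phrase/bigram context of neighboring words), this... NLP, Computer Science, Software, Programming, Neural Networks Classification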

8.62G 702

aintnosunshine

300-dimensional pretrained vectors released by Facebook: 2 million word vectors trained on Common Crawl. 300-dimensional pretrained FastText English word vectors released by Facebook. The first line of the file contains the nu... NLP, Arts and Entertainment Classification

650M 722

Manish Maharjan

Wikipedia Word2Vec: Apache Spark word2vec trained on 200K Wikipedia pages. I used Apache Spark to extract more than 6 million phrases from 200,000 English Wikipedia pages. Here is the process of... NLP, Business, Earth and Nature, Text Mining Classification

132.74M 631

Maziyar

Vegetables (Google Word2Vec News)... NLP, News Classification

3.73M 1116

Liling Tan

Reddit vectors dataset, for training sense2vec models. The sense2vec word embedding model works better than word2vec, since it utilises contextual information from words. This re... NLP, Computer Science, Text Data, spaCy Classification

635.76M 954

Poonam Ligade

Dataset Category

Public Datasets