Public Datasets

Septuagint Earth and Nature,Religion and Belief Systems,NLP,Text Data,Languages Classification
7.39M 356
Stopwords in 28 Languages, for text preprocessing in natural language processing Stopwords are the words in any language that do not add much meaning to a sentence. They can safely be ignored withou...NLP,Computer Science,Text Data,Languages Classification
0.09M 1005
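The stopwords entry describes ignoring words that add little meaning to a sentence. A minimal sketch of that filtering step follows. The listing does not state the dataset's file names or format, so the small English stopword set below is a hypothetical stand-in, and the regex tokenizer is only illustrative.

```python
import re

# Hypothetical sample stopword set; the real dataset's files/format are not given
# in the listing, so this handful of English words is for illustration only.
SAMPLE_STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}


def remove_stopwords(text: str, stopwords: set[str]) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop any token in `stopwords`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tok for tok in tokens if tok not in stopwords]


if __name__ == "__main__":
    sentence = "The cat is sitting in the corner of the room"
    print(remove_stopwords(sentence, SAMPLE_STOPWORDS))
    # → ['cat', 'sitting', 'corner', 'room']
```

For one of the other 27 languages, you would swap in that language's word list and a tokenizer suited to its script.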
Jigsaw competition datasets, with text translated into English These datasets refer to [jigsaw competition](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification)T...NLP Classification
664.76M 400
Interviews NLP,Exploratory Data Analysis,Data Cleaning,Feature Engineering,Employment Classification
4.37M 374
List of English Abbreviations NLP,Text Mining Classification
0M 360
Hate Speech Roman Urdu (HS RU 20) NLP,Artificial Intelligence Classification
0.49M 811
Consumer Complaints: Financial Products, a dataset of consumer complaints about financial products and their text This data is a collection of complaints about consumer financial products and services that we sent to companies for res...NLP,Beginner,Text Data,Banking,Text Mining,Lending Classification
243.79M 540
Beat Bobby Flay: Results from 300 Episodes Movies and TV Shows,Food,NLP,Classification,Cooking and Recipes Classification
0.06M 364
Tamil Lyrics Dataset Arts and Entertainment,Computer Science,Music,NLP Classification
26.23M 345
Eminem Lyrics from All Albums Arts and Entertainment,Music,NLP,Text Data,Text Mining,RNN Classification
1.77M 459
India Subreddit Data Social Networks,NLP Classification
4.41M 357
Media Articles Collection, 2020 Edition Arts and Entertainment,Computer Science,Education,NLP Classification
1.63M 443
Subreddit data from r/wallstreetbets and others, for sentiment analysis used to backtest quantitative trading algorithms All of the submissions to each of the r/wallstreetbets, r/investing, r/options, and r/SecurityAnalysis subreddits since...NLP,Online Communities,Investing Classification
1.49G 415
IMDB Summaries Arts and Entertainment,Movies and TV Shows,NLP,Text Data Classification
93.03M 358
Japanese-English Subtitle Corpus (JESC) [CLEANED], a large corpus of 2.8 million sentences This dataset is a cleaned version of JESC, produced by handling misspelled English words and doing word segmentation using:English=...NLP,Business,Computer Science,Languages Classification
220.08M 427
Gutenberg Education,Software,NLP,Text Data Classification
14.25M 324
ELI5 Scorer Training Data Prototype, 816,000 examples for building a scoring model ELI5 means "Explain Like I'm 5". It was originally a long, free-form question-answering dataset scraped from reddit eli5 s...NLP,Earth and Nature,Arts and Entertainment,Education,Social Science,Sports,Regression,Transformers Classification
672.61M 403
NERu Dataset NLP,Text Data,LSTM Classification
14.5M 287
Hyderabad Zomato Restaurants NLP,Ratings and Reviews,Cooking and Recipes,spaCy Classification
3.44M 916
Tamil Binary Classification 1K Labeled Tweets V1 NLP,Classification Classification
0.38M 337