AI Community

Public Datasets

Interviews NLP,Exploratory Data Analysis,Data Cleaning,Feature Engineering,Employment Classification
4.37M 143
Jigsaw competition dataset, containing text translated into English These datasets refer to [jigsaw competition](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification)T... NLP Classification
664.76M 148
Stopwords in 28 languages, for text preprocessing in natural language processing Stopwords are the words in any language that do not add much meaning to a sentence. They can safely be ignored withou... NLP,Computer Science,Text Data,Languages Classification
0.09M 301
Septuagint Earth and Nature,Religion and Belief Systems,NLP,Text Data,Languages Classification
7.39M 141
Russian idioms Education,NLP,Russia Classification
0.06M 144
Presidential debate video comments Politics,NLP,Exploratory Data Analysis Classification
6.52M 186
covid19 Spanish es py tweets, early 2020 to end of April Earth and Nature,Health,Social Networks,Coronavirus,NLP,Text Data Classification
805.29M 287
Email text classification If you are working, then you are bound to face the problem of reading all the emails that are cluttered in your inbox. S... NLP,Business,Classification,Arts and Entertainment,News,Text Data Classification
18.22M 182
Persian NLP,Text Data,Text Mining Classification
0M 153
Seven names Religion and Belief Systems,NLP Classification
0.15M 142
Vietnamese health news Health,News,NLP Classification
16.89M 151
COVID-19 Indonesian Twitter, Indonesian tweets related to "Corona" and "government" Content: This dataset contains Indonesian tweets of users who have applied the following keywords: Corona and Pemerintah o... NLP,Deep Learning,Coronavirus,Social Networks,Email and Messaging,Government Classification
31.14M 157
YouTube dataset containing 43471 channels, 325292 videos, and 1264035 comments Context: A portion of data grabbed from YouTube. Content: Dataset contains YouTube channels-videos-comments. Acknowledgements: D... NLP,Online Communities,Social Networks Classification
629.07M 227
SpongeBob SquarePants transcripts Arts and Entertainment,NLP Classification
4.85M 149
Named entity recognition dataset Label annotation mistakes by human annotators bring up two challenges for NER: mistakes in the test set can interfere... NLP Classification
5.64M 163
Robert Frost collection Arts and Entertainment,Education,NLP,Literature,Text Data,Transformers Classification
0.22M 295
BERT English uncased bigrams, bigram frequencies of the BERT English uncased training data Is BERT the right model to fine tune your data on? Or do you need to pretrain from scratch? Know your model's trainin... NLP,Music Classification
1.99G 163
Arabic Hadith, nine books NLP,Multiclass Classification,Clustering Classification
94.48M 187
Relational strategies in customer service, a travel-related customer service dataset from four sources Relational Strategies in Customer Service (RSiCS) Dataset. Human-computer data from three live customer service Intelligen... NLP,Business,Text Data Classification
57.78M 167
Virgool dataset, a set of Persian articles collected from virgool.io This could be a nice tool for Persian writers or bloggers to automatically pick the suggested hashtag or even subject fo... NLP,Education,Software,Literature Classification
58.89M 210