New York Times Comments: comments on articles published in The New York Times (over 2 million comments)

Size: 1.55 GB

Tags: NLP, Computer Science, Programming, News Classification

README.md

The New York Times has a wide audience. It plays a prominent role in shaping people's opinions and outlook on current affairs and in setting the tone of public discourse, especially in the USA. The comment sections of its articles are very active and give a glimpse of readers' views on the matters the articles cover.

Content

The data contains information about the comments made on articles published in The New York Times from January to May 2017 and from January to April 2018. For each month, the data is split into two CSV files: one for the articles that received comments and one for the comments themselves. In total, the comment files contain over 2 million comments with 34 features, and the article files contain 16 features describing more than 9,000 articles.
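Because the data is split month by month, a natural first step is to stack the monthly files of one kind into a single table. The sketch below does this with Python's standard csv module and checks that every month shares the same header. The function name load_rows and the three-column sample are hypothetical stand-ins. The real files (34 columns for comments, 16 for articles) should be inspected after download.

```python
import csv
import io
from typing import Iterable, TextIO


def load_rows(files: Iterable[TextIO]) -> tuple[list[str], list[dict[str, str]]]:
    """Concatenate month-wise CSV files that are expected to share one header.

    Raises ValueError if a later file's columns differ from the first file's.
    """
    header: list[str] = []
    rows: list[dict[str, str]] = []
    for f in files:
        reader = csv.DictReader(f)
        cols = list(reader.fieldnames or [])
        if not header:
            header = cols
        elif cols != header:
            raise ValueError(f"column mismatch: {cols} != {header}")
        rows.extend(reader)  # each row is a dict keyed by column name
    return header, rows


# Hypothetical two-month sample standing in for real comment files.
jan = io.StringIO("commentID,articleID,recommendations\n1,a1,5\n2,a1,0\n")
feb = io.StringIO("commentID,articleID,recommendations\n3,a2,12\n")
header, rows = load_rows([jan, feb])
print(len(header), len(rows))  # → 3 3
```

In practice, the same function would be called once with the monthly comment files and once with the monthly article files. The header check catches a month whose schema differs before any analysis starts.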

Inspiration

The dataset is rich in information. It contains the text of the comments, which are largely very well written, along with contextual information such as the section/topic of the article. It also contains features indicating how well a comment was received by readers, such as editorsSelection and recommendations. This data can help in understanding and analyzing the public mood.
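As a rough illustration of pairing the contextual features with the reception features, the sketch below summarizes three things per article section: how many comments were posted, their mean recommendations count, and the share chosen as editor's picks. It assumes each row is a dict of strings, as csv.DictReader produces, with the columns sectionName, recommendations and editorsSelection. The accepted encodings of editorsSelection ("1" or "true", case-insensitive) are an assumption to check against the actual files.

```python
from collections import defaultdict


def reception_by_section(rows):
    """Summarize reader reception per article section.

    Each row is a dict of strings. Assumes the columns sectionName,
    recommendations and editorsSelection exist.
    """
    # Per section: [number of comments, total recommendations, editor's picks]
    stats = defaultdict(lambda: [0, 0, 0])
    for row in rows:
        s = stats[row["sectionName"]]
        s[0] += 1
        # int(float(...)) tolerates counts stored as "5" or "5.0".
        s[1] += int(float(row["recommendations"]))
        # Assumed boolean encodings; adjust to what the files actually use.
        s[2] += row["editorsSelection"].strip().lower() in ("1", "true")
    return {
        section: {"comments": n, "mean_recs": recs / n, "pick_rate": picks / n}
        for section, (n, recs, picks) in stats.items()
    }


sample = [
    {"sectionName": "Politics", "recommendations": "10", "editorsSelection": "True"},
    {"sectionName": "Politics", "recommendations": "2", "editorsSelection": "False"},
    {"sectionName": "Sports", "recommendations": "3", "editorsSelection": "False"},
]
print(reception_by_section(sample)["Politics"])
# → {'comments': 2, 'mean_recs': 6.0, 'pick_rate': 0.5}
```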
The exploratory kernel can be used to review the dataset's features. The NB-Logistic model kernel for predicting NYT's pick can serve as a starter for building models on a range of ideas, some of which are:

Predicting the number of upvotes a comment will receive, using the feature recommendations as the target variable. With a large enough training set, such a model could estimate how a hypothetical comment on a given topic would be received by the community of NYT readers, which could serve as a tool to gauge public opinion. It would be designed much like the models used to rank reviews by predicting how many upvotes each review will receive.
Predicting whether a comment will be an editor's pick, using the feature editorsSelection as the target variable. This gives a clue to what NYT considers worth promoting.
Guessing the topic of an article from a comment on it, using sectionName and/or newDesk as the target variable.
Predicting how likely a comment is to get replies, using the replyCount feature as the target variable.
Predicting how likely an article is to initiate discussion and attract comments and upvotes, and analyzing the sentiment of the comments' text.
Making the same predictions for topics, indicated by the features sectionName and/or newDesk.
Analyzing the behavior of top commenters, such as which topics they most often comment on and the sentiment of their comments.
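The NB-Logistic starter kernel mentioned above is not reproduced here. What follows is a hedged sketch of the general technique in the style of NB-SVM, which could be applied to the editorsSelection target. Each binarized bag-of-words feature is scaled by its naive Bayes log-count ratio r, and a logistic regression is then fit on the scaled features. The whitespace tokenizer, smoothing alpha = 1, learning rate, epoch count and toy labels are illustrative assumptions. At the dataset's scale, a sparse vectorizer and an optimized solver would be needed instead of this pure-Python loop.

```python
import math
from collections import Counter


def binarize(text):
    """Binarized bag of words: the set of lowercase whitespace tokens."""
    return set(text.lower().split())


def log_count_ratio(docs, labels, alpha=1.0):
    """NB log-count ratio r_w = log(p_w / |p|_1) - log(q_w / |q|_1), smoothed by alpha."""
    vocab = set().union(*docs)
    pos, neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (pos if y else neg).update(doc)
    p_total = sum(pos.values()) + alpha * len(vocab)
    q_total = sum(neg.values()) + alpha * len(vocab)
    return {
        w: math.log((pos[w] + alpha) / p_total) - math.log((neg[w] + alpha) / q_total)
        for w in vocab
    }


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(doc, weights, bias, r):
    # Features are x_w * r_w with x_w = 1 for present tokens; unseen tokens are skipped.
    return bias + sum(weights[t] * r[t] for t in doc if t in r)


def train_nb_logistic(docs, labels, r, epochs=200, lr=0.5):
    """Logistic regression on NB-scaled features, fit by batch gradient descent."""
    weights = dict.fromkeys(r, 0.0)
    bias = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w = dict.fromkeys(r, 0.0)
        grad_b = 0.0
        for doc, y in zip(docs, labels):
            err = sigmoid(score(doc, weights, bias, r)) - y
            grad_b += err
            for t in doc:
                if t in r:
                    grad_w[t] += err * r[t]
        for t in weights:
            weights[t] -= lr * grad_w[t] / n
        bias -= lr * grad_b / n
    return weights, bias


# Toy stand-in for comment texts and their editorsSelection labels.
texts = ["thoughtful nuanced argument", "nuanced thoughtful reply", "lol whatever", "whatever dude"]
picks = [1, 1, 0, 0]
docs = [binarize(t) for t in texts]
r = log_count_ratio(docs, picks)
weights, bias = train_nb_logistic(docs, picks, r)
p_good = sigmoid(score(binarize("a thoughtful point"), weights, bias, r))
p_bad = sigmoid(score(binarize("whatever"), weights, bias, r))
print(p_good > p_bad)  # → True
```

Scaling by r gives the logistic regression features that already lean toward the class they favor. This is why the combination is often used as a strong linear baseline for text classification.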

Data collection

The Python package written to supplement this dataset can retrieve comments from a customized search of NYT articles on a specific topic (for example, the Iraq war or ObamaCare) within a given timeline. The tutorial gives detailed information on using the package, with examples.
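The package is not described here in enough detail to reproduce. The sketch below is only an offline analogue of its topic-plus-timeline search. From already downloaded article rows, it selects those whose headline or keywords mention a term and whose publication date falls within a window. The column names headline, keywords and pubDate, and the "YYYY-MM-DD ..." date format, are assumptions made for this sketch.

```python
from datetime import date


def select_articles(articles, keyword, start, end):
    """Return article rows that mention keyword and were published in [start, end].

    Assumes columns headline, keywords and pubDate, with pubDate
    beginning with an ISO date such as "2017-03-24 10:00:00".
    """
    kw = keyword.lower()
    selected = []
    for article in articles:
        published = date.fromisoformat(article["pubDate"][:10])
        text = f'{article["headline"]} {article["keywords"]}'.lower()
        if start <= published <= end and kw in text:
            selected.append(article)
    return selected


articles = [
    {"headline": "ObamaCare repeal stalls", "keywords": "health insurance", "pubDate": "2017-03-24 10:00:00"},
    {"headline": "Senate vote looms", "keywords": "ObamaCare", "pubDate": "2018-01-05 09:00:00"},
    {"headline": "Iraq war anniversary", "keywords": "Iraq", "pubDate": "2017-03-20 08:00:00"},
]
hits = select_articles(articles, "ObamaCare", date(2017, 1, 1), date(2017, 5, 31))
print([a["headline"] for a in hits])  # → ['ObamaCare repeal stalls']
```

The selected articles could then be used to filter the comment rows belonging to them, assuming both files share an article identifier column.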

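Data usage statement:

I. Data source and display:

1. The data was collected from the internet or provided by a service provider; this platform displays the dataset for users to browse.
2. This platform only displays basic information about the dataset, including but not limited to image, text, video, and audio files.
3. Basic information about the dataset comes from the original data address or from the data provider. If the dataset description differs, the original data address or the service provider's original address prevails.

II. Ownership:

1. The copyright of all datasets on this site belongs to the original data publisher or data provider.

III. Redistribution:

1. If you redistribute data from this site, please retain the original data address and the related copyright notice.

IV. Infringement:

1. If any data on this site is displayed in a way that infringes rights, please contact the site promptly and we will take the data offline.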
