Movie Review Datasets from Cornell University

    README.md

    Sentiment polarity datasets

    • polarity dataset v2.0 (3.0Mb) (includes README v2.0): 1000 positive and 1000 negative processed reviews. Introduced in Pang/Lee ACL 2004. Released June 2004.

    • Pool of 27886 unprocessed html files (81.1Mb) from which the polarity dataset v2.0 was derived. (This file is identical to movie.zip from data release v1.0.)

    • sentence polarity dataset v1.0 (includes sentence polarity dataset README v1.0): 5331 positive and 5331 negative processed sentences / snippets. Introduced in Pang/Lee ACL 2005. Released July 2005.


    • archive:

      • polarity dataset v1.0 (2.8Mb) (includes README): 700 positive and 700 negative processed reviews. Released July 2002.

      • polarity dataset v1.1 (2.2Mb) (includes README.1.1): approximately 700 positive and 700 negative processed reviews. Released November 2002. This alternative version was created by Nathan Treloar, who removed a few non-English/incomplete reviews and changed some of the labels (judging some polarities to be different from the original author's rating). The complete list of changes made to v1.1 can be found in diff.txt.

      • polarity dataset v0.9 (2.8Mb) (includes a README): 700 positive and 700 negative processed reviews. Introduced in Pang/Lee/Vaithyanathan EMNLP 2002. Released July 2002. Please read the "Rating Information - WARNING" section of the README.

      • movie.zip (81.1Mb): all html files we collected from the IMDb archive.
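
As a minimal sketch of how the processed reviews in a polarity dataset might be read into labeled pairs, the Python below assumes the extracted archive holds one directory of positive and one of negative `.txt` files. The directory names `pos` and `neg` and the function name `load_polarity` are assumptions for illustration; check the README bundled with each release for the actual layout.

```python
from pathlib import Path


def load_polarity(root):
    """Return a list of (text, label) pairs; label 1 = positive, 0 = negative.

    Assumes `root` contains `pos/` and `neg/` subdirectories of .txt files
    (a hypothetical layout; adjust the names to match your extracted archive).
    """
    examples = []
    for subdir, label in (("pos", 1), ("neg", 0)):
        for path in sorted(Path(root, subdir).glob("*.txt")):
            # errors="replace" guards against stray non-UTF-8 bytes.
            text = path.read_text(encoding="utf-8", errors="replace")
            examples.append((text, label))
    return examples
```

Calling `load_polarity("path/to/extracted/dataset")` would then give a list that can be shuffled and split for a classifier; the path shown is a placeholder.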

    Sentiment scale datasets

    • scale dataset v1.0 (includes scale data README v1.0): a collection of documents whose labels come from a rating scale. Introduced in Pang/Lee ACL 2005. Released July 2005.

      • Sep 30, 2009: Yanir Seroussi points out that due to some misformatting in the raw html files, six reviews are misattributed to Dennis Schwartz (29411 should be Max Messier, 29412 should be Norm Schrager, 29418 should be Steve Rhodes, 29419 should be Blake French, 29420 should be Pete Croatto, 29422 should be Rachel Gordon) and one (23982) is blank.

    • original reviews for scale dataset v1.0 (includes scale data README v1.0): original reviews from which the subjective extracts in scale dataset v1.0 were extracted.
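
The attribution errors reported on Sep 30, 2009 can be captured as a small lookup table. The sketch below is one possible way to apply them when processing scale dataset v1.0; the names `AUTHOR_CORRECTIONS`, `BLANK_REVIEW_IDS`, and `fix_attribution` are hypothetical, and how review ids and authors are stored in the released files is not specified here.

```python
# Corrections reported by Yanir Seroussi (Sep 30, 2009):
# review id -> actual author (each was misattributed to Dennis Schwartz).
AUTHOR_CORRECTIONS = {
    29411: "Max Messier",
    29412: "Norm Schrager",
    29418: "Steve Rhodes",
    29419: "Blake French",
    29420: "Pete Croatto",
    29422: "Rachel Gordon",
}

# Review reported as blank in the same note.
BLANK_REVIEW_IDS = {23982}


def fix_attribution(review_id, author):
    """Return the corrected author, or None if the review is known to be blank."""
    if review_id in BLANK_REVIEW_IDS:
        return None
    # Fall back to the recorded author when no correction applies.
    return AUTHOR_CORRECTIONS.get(review_id, author)
```

Returning `None` for the blank review lets downstream code skip it explicitly instead of training on an empty document.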

    Subjectivity datasets

    • subjectivity dataset v1.0 (508K) (includes subjectivity README v1.0): 5000 subjective and 5000 objective processed sentences. Introduced in Pang/Lee ACL 2004. Released June 2004.

    • Pool of unprocessed source documents (9.3Mb) from which the sentences in the subjectivity dataset v1.0 were extracted. Note: On April 2, 2012, we replaced the original gzipped tarball with one in which the subjective files are now in the correct directory (so that the subjectivity directory is no longer empty; the subjective files were mistakenly placed in the wrong directory, although distinguishable by their different naming scheme).
