NLP
Star Trek Scripts: raw text scripts and processed lines from all Star Trek series

42.63M

NLP,Movies and TV Shows,Text Data,Text Mining

Classification

Description

Star Trek scripts Text

Data scraped from http://www.chakoteya.net/StarTrek/index.html

Code here: https://github.com/GJBroughton/Star_Trek_scripts

I built this dataset so I could play around with information retrieval techniques, NLP, and basic web scraping. It contains the raw scripts and processed lines from all episodes of:

  • Star Trek The Original Series (TOS)

  • Star Trek The Animated Series (TAS)

  • Star Trek The Next Generation (TNG)

  • Star Trek Deep Space Nine (DS9)

  • Star Trek Voyager (VOY)

  • Star Trek Enterprise (ENT)

Structure:

all_series_lines = {series_name: {episode_number: {character: all_lines}}}

e.g.
all_series_lines['DS9']['episode 0']['SISKO']
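
As a rough illustration of working with this nested structure, here is a minimal Python sketch that counts lines per character in one series. It rests on several assumptions not confirmed above: that the processed lines are stored as JSON, that `all_lines` is a list of strings, and that the file is called `all_series_lines.json` (a hypothetical name). The helper names `load_lines` and `lines_per_character` are made up for this example, and the sample data is placeholder text, not real script content.

```python
import json
from collections import Counter


def load_lines(path):
    """Load the nested dict from a JSON file (file name and format are assumptions)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lines_per_character(all_series_lines, series):
    """Count lines per character across every episode of one series.

    Assumes each character maps to a list of line strings.
    """
    counts = Counter()
    for episode in all_series_lines[series].values():
        for character, lines in episode.items():
            counts[character] += len(lines)
    return counts


# Tiny placeholder sample with the same shape as the dataset (not real data).
sample = {
    "DS9": {
        "episode 0": {
            "SISKO": ["placeholder line a", "placeholder line b"],
            "KIRA": ["placeholder line c"],
        },
        "episode 1": {
            "SISKO": ["placeholder line d"],
        },
    }
}

counts = lines_per_character(sample, "DS9")
print(counts.most_common())

# With the real file, something like this should work if the assumptions hold:
# all_series_lines = load_lines("all_series_lines.json")
# print(lines_per_character(all_series_lines, "DS9").most_common(10))
```

If `all_lines` turns out to be a single string rather than a list, `len(lines)` would count characters instead of lines, so check the type of one entry before relying on the counts.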

I hope this is useful for a bit of fun and practice with text mining. Please let me know of any errors you see, or of ways the dataset's cleaning and structure could be improved.

