Description
Star Trek scripts Text
Data scraped from http://www.chakoteya.net/StarTrek/index.html
Code here: https://github.com/GJBroughton/Star_Trek_scripts
I built this dataset to experiment with information retrieval techniques, NLP, and basic web scraping. It contains the raw scripts and processed lines from all episodes of:
Star Trek The Original Series (TOS)
Star Trek The Animated Series (TAS)
Star Trek The Next Generation (TNG)
Star Trek Deep Space Nine (DS9)
Star Trek Voyager (VOY)
Star Trek Enterprise (ENT)
Structure:
all_series_lines = {series_name: {episode number: {character: all_lines}}}
e.g.
all_series_lines['DS9']['episode 0']['SISKO']
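As a rough sketch of how the nested structure above might be used, the Python below counts lines per character across one series. It assumes the data is stored as JSON and that each character maps to a list of line strings. The file path passed to `load_lines`, the helper names, and the placeholder sample data are all hypothetical and are not taken from the dataset itself.

```python
import json
from collections import Counter


def load_lines(path):
    """Load the nested dict from a JSON file (the path is a hypothetical example)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lines_per_character(all_series_lines, series):
    """Count the lines for each character across every episode of one series."""
    counts = Counter()
    for characters in all_series_lines[series].values():
        for character, lines in characters.items():
            # Assumption: `lines` is a list of strings, one per spoken line.
            counts[character] += len(lines)
    return counts


# Tiny stand-in with the same nesting as the dataset; the lines are placeholders,
# not real script text.
sample = {
    "DS9": {
        "episode 0": {
            "SISKO": ["placeholder line a", "placeholder line b"],
            "KIRA": ["placeholder line c"],
        },
        "episode 1": {
            "SISKO": ["placeholder line d"],
        },
    }
}

# Same access pattern as the example above.
sisko_lines = sample["DS9"]["episode 0"]["SISKO"]

counts = lines_per_character(sample, "DS9")
print(counts.most_common(1))
```

With the real file, `load_lines` would take the place of the `sample` dict, provided the file is JSON with this layout.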
I hope this is useful for a bit of fun and practice with text mining. Please let me know of any errors you see, or how the cleaning and structure of the dataset can be improved.