Practice Makes the Master: Movie Collection Analysis

Dataset size: 56.36M

Business, Arts and Entertainment, Movies and TV Shows, Classification, Data Visualization, Time Series Analysis

README.md

Context

The data set covers movies released from xxx up to 2017. It is kept quite general and has no specific problem or challenge as its background. The whole data set is meant for practicing the different techniques a data analyst or data scientist uses. Note that the data set is not fully cleaned; this is deliberate, so that it reflects the real-life work of an analyst or scientist: get data, prepare data, analyse data, visualize data, predict outcomes for different use cases ;-)

Content

I love watching movies, so I tried to combine this hobby with my current self-study toward becoming a data scientist. I needed a data set with movie information so that I could play around and apply what I had learned. At first glance, the data set can be used for regression, classification, or potentially even deep learning (such as image recognition, since poster URLs are given).

I acquired this data set in several steps. First, I searched the internet for an API that I could use to retrieve movie information, and soon came across omdbapi.com. With this API I could fetch information based on a movie's title. That left another problem: I had no list of movie titles. I couldn't find an API for that, but I saw that Wikipedia is quite well structured with regard to movie titles, so I built a scraper to fetch all movie titles from 1990 to 2017. With the titles in hand, I could finally retrieve the information for each movie using its title plus year (different movies may share the same name). Unfortunately, some movie titles are written differently, so about 10% of the lookups failed. Based on those 10% failed titles, I ran a text analysis and found around 400 000 new movies and series.
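The title-plus-year lookup described above might look roughly like the following. This is only a sketch, not the script actually used to build the data set: the helper names `build_omdb_url` and `fetch_movie` are hypothetical, and it assumes OMDb's `t` (title), `y` (year) and `apikey` query parameters and the `"Response"` field in the JSON reply.

```python
import json
import urllib.parse
import urllib.request

OMDB_ENDPOINT = "https://www.omdbapi.com/"


def build_omdb_url(title, year, api_key):
    """Build an OMDb query URL for a title lookup restricted to one year."""
    params = {"t": title, "y": str(year), "apikey": api_key}
    return OMDB_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_movie(title, year, api_key, timeout=10):
    """Return the movie record as a dict, or None if OMDb reports no match."""
    url = build_omdb_url(title, year, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = json.load(resp)
    # OMDb signals a failed lookup with "Response": "False" in the JSON body.
    return data if data.get("Response") == "True" else None
```

Titles for which `fetch_movie` returns `None` would correspond to the failed lookups mentioned above, and could be collected for a later retry with a differently spelled title.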
The latest version should include nearly 200 000 different movies, identified by imdbID. Additionally, I cleaned some of the fields, such as Genre, Actors and Writer, to make analysis easier. All of the CSV files can be joined on **imdbID**. Be aware that some information is missing and is declared as *_NOT_GIVEN*.

Acknowledgements

Thanks to omdbapi.com for providing such a good API and well-structured data.

Inspiration

The inspiration for this data set came from getting into the practical flow of developing an image recognition application: **Recognize the genre of a movie from its poster.** On request I can also provide the movie images. For the given data set, I have the following questions in mind:

1. Does the genre correlate with the given score?
2. Can we see a hype for specific genres over the past years?
3. Do the actors or writers prefer a genre?
4. Do the actors or writers have an impact on the IMDb score?
5. Do the directors have preferred actors for their movies?
6. Do the directors have preferred writers for their movies?
7. How many movies has each director produced?
8. Is there any relation between the director and the IMDb rating?
9. ... many more questions :-)
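Since the CSV files share the **imdbID** key and mark missing values with a NOT_GIVEN placeholder, a first preparation step could be sketched as below. The file contents, the column names other than `imdbID`, and the helper names are hypothetical; the sketch also assumes that any value ending in `NOT_GIVEN` should be treated as missing, which may not match the exact placeholder spelling in the real files.

```python
import csv
import io


def load_by_imdb_id(csv_text):
    """Index CSV rows by imdbID, mapping NOT_GIVEN placeholders to None."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = {
            key: (None if (value or "").endswith("NOT_GIVEN") else value)
            for key, value in row.items()
        }
        rows[row["imdbID"]] = cleaned
    return rows


def inner_join(left, right):
    """Merge two imdbID-indexed tables, keeping only IDs present in both."""
    return {key: {**left[key], **right[key]} for key in left if key in right}


# Hypothetical sample data standing in for two of the CSV files.
movies_csv = "imdbID,Title,Year\ntt0000001,Example A,1999\ntt0000002,Example B,2005\n"
genres_csv = "imdbID,Genre\ntt0000001,Drama\ntt0000002,_NOT_GIVEN\n"

joined = inner_join(load_by_imdb_id(movies_csv), load_by_imdb_id(genres_csv))
print(joined["tt0000002"]["Genre"])  # None
```

Only the standard library is used here to keep the sketch self-contained; with pandas, the same join is typically a `merge` on the `imdbID` column.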

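Data usage statement:

I. Data source and display

1. The data comes from internet data collection or from service providers; this platform provides users with display and browsing of data sets.
2. This platform only displays the basic information of data sets, including but not limited to file types such as images, text, video, and audio.
3. The basic information of a data set comes from the original data address or from the data provider. If the data set description differs, the original data address or the service provider's original address prevails.

II. Ownership

1. The copyright of all data sets on this site belongs to the original data publisher or data provider.

III. Reposting

1. If you need to repost data from this site, please retain the original data address and the relevant copyright notice.

IV. Infringement and handling

1. If any data on this site involves infringing display, please contact the site promptly and we will arrange for the data to be taken down.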
