DEV Community

Cristopher Delgado

Movie Analysis for Movie Production

I recently finished my first data-science-driven project at Flatiron School. It was a big learning experience with many ups and downs. I am glad to say that I completed the project and produced actionable insights. In this blog post I want to take the time to explain my findings.

Business Problem

The problem at hand involves Microsoft deciding to create a new movie studio. Microsoft does not know anything about creating movies. Overall, Microsoft would like findings on which films are currently doing best at the box office, translated into actionable insights that the head of Microsoft's new movie studio can use to help decide what type of films to create.

Business Understanding
Microsoft wants to enter the movie production business. However, as a rookie in this business, it needs insight into audience opinions about movie characteristics, because audience opinions impact ROI. In addition to learning what an audience enjoys, Microsoft must also consider when to release its movies to get the most ROI.

Data

The data comes from Box Office Mojo, IMDB, Rotten Tomatoes, TheMovieDB (TMDB), and The Numbers. The Box Office Mojo dataset contains each movie's domestic gross and worldwide gross. The IMDB database contains information about each movie's production team, cast, basic information, and ratings. The Rotten Tomatoes dataset contains critic reviews along with their ratings and whether the movie was fresh or rotten. The TMDB dataset contains a movie's genre, vote average, vote count, and popularity score. The Numbers dataset contains a movie's production budget, domestic gross, worldwide gross, and release date.

Results

TMDB has a unique attribute for movies known as the popularity score.

The popularity score accumulates the following over a movie's lifetime:

  • Number of votes for the day
  • Number of views for the day
  • Number of users who marked it as a 'favorite' for the day
  • Release date
  • Number of total votes
  • Previous day's score

It was essential to consider this score because it combines many factors that TMDB already calculates. The score is a good representation of a movie's popularity over its lifetime since the release date. Using this score, we can determine which genres are the most popular with the visualization shown below.

Popularity Score Based on Genre
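A ranking like the one behind this chart could be computed roughly as follows. This is a minimal sketch in plain Python, not the code from the project: the field names `genres` and `popularity`, the function name, and the sample rows are all assumptions made up for illustration. It also assumes that a movie with several genres counts once toward each of them.

```python
from collections import defaultdict
from statistics import mean


def mean_popularity_by_genre(movies):
    """Average popularity per genre; a movie counts once for each of its genres."""
    scores = defaultdict(list)
    for movie in movies:
        for genre in movie["genres"]:
            scores[genre].append(movie["popularity"])
    averages = {genre: mean(values) for genre, values in scores.items()}
    # Sort from most to least popular, ready for a bar chart.
    return dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))


# Made-up rows (hypothetical values), only to show the shape of the input.
sample = [
    {"title": "A", "genres": ["Action", "Adventure"], "popularity": 30.0},
    {"title": "B", "genres": ["Action"], "popularity": 10.0},
    {"title": "C", "genres": ["Romance"], "popularity": 25.0},
]
genre_popularity = mean_popularity_by_genre(sample)
```

The same grouping could be done with a Pandas `groupby` once the genre lists are split into one row per genre.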

Sales are important in movie production because they determine the return on investment (ROI). A movie's characteristics, such as its intended audience and its release date, can impact its box office sales. As we can see, PG-13 and PG movies make the most box office sales, which can produce the most ROI. Releasing a movie at a certain time of the year can also impact box office sales, as we can see below.

Sale Based on Appropriate Movie Audience

Impact of Month on Box Office Revenue

Avg ROI Domestic

Avg ROI Worldwide
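For readers who want to reproduce averages like these, here is a small sketch of one common way to define ROI and average it by release month. The formula (gross minus budget, divided by budget) is an assumption about how ROI was computed, and the field names `release_month`, `production_budget`, and `worldwide_gross` are hypothetical. The sample rows are invented.

```python
from collections import defaultdict
from statistics import mean


def roi(gross, budget):
    """Return on investment as a fraction of the production budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (gross - budget) / budget


def average_roi_by_month(movies):
    """Group movies by release month (1-12) and average their ROI."""
    by_month = defaultdict(list)
    for movie in movies:
        by_month[movie["release_month"]].append(
            roi(movie["worldwide_gross"], movie["production_budget"])
        )
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Made-up rows (hypothetical values), only to show the shape of the input.
sample_movies = [
    {"release_month": 6, "production_budget": 100.0, "worldwide_gross": 300.0},
    {"release_month": 6, "production_budget": 50.0, "worldwide_gross": 100.0},
    {"release_month": 9, "production_budget": 80.0, "worldwide_gross": 40.0},
]
monthly_roi = average_roi_by_month(sample_movies)
```

Swapping `worldwide_gross` for a domestic gross field would give the domestic version of the same average.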

Conclusion

This analysis has led to three recommendations for Microsoft's new movie production studio:

  • Make movies in the following genres: Western, Romance, History, Animation, Family, Mystery, Thriller, Science Fiction, War, Crime, Fantasy, Action, and Adventure. These genres had a popularity score of 40 percent or higher, which according to TMDB is considered good through excellent.
  • Make movies with an intended audience rating of PG-13 or PG. These ratings proved to have higher box office sales than R-rated and G-rated movies.
  • Release movies in the summer and at the end of the year. The time of year matters when releasing movies. We can see this in the months of May, June, July, November, December, and February.

Self Assessment

This project has taught me the fundamentals of data analysis and data science. I learned so much about data cleaning and preparation. I am glad to say that I have learned about Pandas, SQLite3, Matplotlib, Seaborn, JSON, and APIs, all of which I used in this project.

As great as this project is, there is always room for improvement. I wish I had had time to merge the Box Office Mojo data with the data from The Numbers, since these datasets contain similar information relevant to ROI. My code as it stands is very linear and not a plug-and-play file: the same charts cannot be produced if different datasets are used. If possible, I would have liked to create several functions that clean the data so that others can reuse them. Not only would this look more professional, it would also make the code more readable.
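As an illustration of what such reusable cleaning functions might look like, here is a minimal sketch. It assumes that money columns arrive as strings such as "$1,234,567", which is an assumption about the raw data rather than something shown in this post. The function names and the column name in the usage note are hypothetical.

```python
def parse_dollars(value):
    """Convert a string such as "$1,234,567" to an int; pass numbers through."""
    if isinstance(value, (int, float)):
        return int(value)
    cleaned = value.strip().replace("$", "").replace(",", "")
    return int(cleaned)


def clean_money_columns(rows, columns):
    """Return copies of the rows with the named columns parsed as integers."""
    return [
        {key: parse_dollars(val) if key in columns else val for key, val in row.items()}
        for row in rows
    ]
```

For example, `clean_money_columns(rows, {"production_budget"})` would leave every other field untouched, so the same helper could be pointed at a different dataset by changing only the column names.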

Feel free to check out my repository, which contains all the exploratory analysis for this project.

Next Steps

Further analysis can provide additional insights for Microsoft and their new movie production studio. Other exploratory routes include the following:

  • Explore ROI by director. Directors are important to any movie production because they organize and envision movie scenes and guide the production. The main goal is to pick the best director.
  • Create a model that predicts the best years to release movies. Eventually any movie studio will have to create an organized timeline for movie releases, especially if sequels are made. Predicting a year to release movies can be crucial to obtaining the most ROI.
  • Merge the Box Office Mojo data with The Numbers data, since they contain similar information, and look at the average domestic gross throughout the year.
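The merge in the last item could start from something like the sketch below. It assumes both sources can be reduced to rows with a `title` and a release `year`, and it matches on a normalized title plus year. Those field names, the matching rule, and the sample values are assumptions for illustration, and real titles would likely need more careful matching.

```python
def merge_key(title, year):
    """Normalize a title so small formatting differences do not block a match."""
    return (" ".join(title.lower().split()), int(year))


def merge_sources(mojo_rows, numbers_rows):
    """Inner-join two lists of dicts on (normalized title, release year)."""
    numbers_by_key = {merge_key(r["title"], r["year"]): r for r in numbers_rows}
    merged = []
    for row in mojo_rows:
        match = numbers_by_key.get(merge_key(row["title"], row["year"]))
        if match is not None:
            # Fields from the first source win when both define the same name.
            merged.append({**match, **row})
    return merged


# Made-up rows (hypothetical values), only to show the shape of the input.
mojo = [{"title": "Some  Movie", "year": 2015, "domestic_gross": 100}]
numbers = [
    {"title": "some movie", "year": "2015", "production_budget": 40},
    {"title": "Other Movie", "year": 2016, "production_budget": 10},
]
merged_rows = merge_sources(mojo, numbers)
```

With Pandas, the equivalent would be a `merge` on the same two key columns after normalizing the titles.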
