DEV Community

jpetoskey
jpetoskey

Posted on

Pick a genre, many genres?

My first performance task in my data science program with Flatiron School has been to produce recommendations for the theoretical Microsoft Studios about their first movie. After looking through the data, I decided to explore genre first, to see if I noticed any trends.

As I was joining genre to the movie titles in a separate csv, I was surprised to see that some genre pairings were more common than others. For example, adventure, the most popular genre in the top 50 grossing movies, is most often paired with action. I can remember my father talking about which movies he liked when I was a kid, and his descriptions often involved action and adventure. When he dragged me to Disney's Pocahontas, I was pleasantly surprised with action and adventure, despite my preconception of it being a romance.

However, counting movies by genre pairs or combinations didn't give me enough reliable information to make recommendations to theoretical Microsoft Studios, as you can see in this histogram.

Image description

Thinking of a movie as one particular genre can be rather limiting. I think this is why we are currently observing the trend that Jourdan Alderidge describes in The Beat. They write, "The majority of content produced in the last several decades are often genre hybrids, using the rules of genre theory to produce new, unique, and different stories" (The Beat)

However, I was determined to pull individual genres because understanding the pattern of individual genres would be important when theoretical Microsoft Studios is planning their first movie. This resulted in some difficulty, as the genres were saved in the object data type, and to explode them into distinct rows, I would need them in a list.

This resource was invaluable and provided me with the code needed to convert the string into a list, based on the location of each comma. Without the comma indicator, split(), will split a string on whitespace, but we do not have whitespace in our string, so we specify for it to split(',') on the comma.

df3['Genre'] = df3['Genre'].str.split(',')
Enter fullscreen mode Exit fullscreen mode

I was then able to use the explode method and pull each genre into a new row.

df2 = df2.explode('Genre')
Enter fullscreen mode Exit fullscreen mode

Once all the genre's were in separate rows, I was able to produce a histogram of the genres of the top 50 grossing movies.

Image description

This made me happy and seemed to give me more relevant information to make recommendations to Microsoft. And, going back to my Dad's preferences, it seems action/adventure is a good place to start.

Top comments (0)