DEV Community

jpetoskey

Pick a genre, many genres?

My first performance task in my data science program with Flatiron School has been to produce recommendations for the theoretical Microsoft Studios about their first movie. After looking through the data, I decided to explore genre first, to see if I noticed any trends.

As I was joining genres to the movie titles from a separate CSV, I was surprised to see that some genre pairings were more common than others. For example, adventure, the most popular genre in the top 50 grossing movies, is most often paired with action. I can remember my father talking about which movies he liked when I was a kid, and his descriptions often involved action and adventure. When he dragged me to Disney's Pocahontas, I was pleasantly surprised by the action and adventure, despite my preconception that it would be a romance.

However, counting movies by genre pairs or combinations didn't give me enough reliable information to make recommendations to the theoretical Microsoft Studios, as you can see in this histogram.

Image description

Thinking of a movie as one particular genre can be rather limiting. I think this is why we are currently observing the trend that Jourdan Alderidge describes in The Beat. They write, "The majority of content produced in the last several decades are often genre hybrids, using the rules of genre theory to produce new, unique, and different stories" (The Beat).

However, I was determined to pull out individual genres, because understanding the pattern of individual genres will be important when the theoretical Microsoft Studios plans its first movie. This caused some difficulty: each movie's genres were stored as a single comma-separated string (the object data type in pandas), and to explode them into distinct rows, I needed them in a list.

This resource was invaluable and provided the code I needed to convert each string into a list, splitting at each comma. Called without a separator, split() splits a string on whitespace, but our genre strings contain no whitespace, so we pass the comma explicitly with split(',').

df3['Genre'] = df3['Genre'].str.split(',')
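To make the difference between the two calls concrete, here is a minimal sketch on made-up data. The genre strings are invented for illustration and are not from the project's dataset; the only assumption is that the real column looks similar (genres joined by commas, no spaces).

```python
import pandas as pd

# Toy data, invented for illustration only.
genres = pd.Series(["Action,Adventure,Sci-Fi", "Adventure,Animation,Comedy"])

# With no separator, str.split() splits on whitespace. These strings have
# none, so each row comes back as a one-element list.
no_sep = genres.str.split()

# Passing ',' splits at each comma and gives one list item per genre.
by_comma = genres.str.split(",")

print(no_sep.tolist())
print(by_comma.tolist())
```

If the real strings had a space after each comma, the pieces would keep that leading space, so it would be worth stripping them before counting.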

I was then able to use the explode method and pull each genre into a new row.

df3 = df3.explode('Genre')
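Here is a small sketch of what explode does to a list column, assuming a DataFrame shaped roughly like mine. The titles and genres are placeholders, not rows from the actual data, and the names df and exploded are only for this example.

```python
import pandas as pd

# Placeholder rows; titles and genres are made up for illustration.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B"],
    "Genre": ["Action,Adventure", "Adventure,Animation,Comedy"],
})

# Turn each comma-separated string into a list of genres.
df["Genre"] = df["Genre"].str.split(",")

# explode() gives each list item its own row and repeats the other columns.
# The original index is repeated too, so reset it to get a clean 0..n-1 index.
exploded = df.explode("Genre").reset_index(drop=True)

print(exploded)
```

Two movies with two and three genres become five rows, one per movie-genre pair.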

Once all the genres were in separate rows, I was able to produce a histogram of the genres of the top 50 grossing movies.

Image description
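The counts behind a chart like this can be sketched with value_counts. This is not my exact notebook code; the three rows below are invented, and the names top, combo_counts, and genre_counts are only for this example. It also shows why counting whole combinations is less informative than counting individual genres.

```python
import pandas as pd

# Invented rows standing in for the top-grossing movies.
top = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Genre": ["Action,Adventure", "Adventure,Animation,Comedy", "Action,Adventure"],
})

# Counting the raw strings counts each genre combination as its own category.
combo_counts = top["Genre"].value_counts()

# Splitting and exploding first counts each individual genre.
genre_counts = top["Genre"].str.split(",").explode().value_counts()

print(combo_counts)
print(genre_counts)

# With matplotlib installed, genre_counts.plot(kind="bar") would draw the chart.
```

With combinations, Adventure is spread across two different categories. With individual genres, it shows up once with a count of three.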

This made me happy and seemed to give me more relevant information to make recommendations to Microsoft. And, going back to my Dad's preferences, it seems action/adventure is a good place to start.
