As developers, we love digging into datasets to uncover interesting stories—and Netflix’s ever-growing catalog is no exception. In this post, I’ll share how I analyzed Netflix’s content data using Python and Pandas, plus tips for you to get started.
The Dataset
I used the popular Netflix Titles dataset from Kaggle (link here). It contains:
- 6,000+ records of movies and TV shows
- Metadata like title, type, director, cast, country, date added, release year, rating, duration, and genres
What I Did
Data Cleaning & Preparation
Using Pandas, I:
- Filled missing values in director, cast, and country
- Converted date_added to datetime for time-series analysis
- Extracted year and month of addition
- Split genres into lists for better filtering
- Created new columns for movie duration and TV show seasons
Here’s a snippet:
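import pandas as pd

df = pd.read_csv('netflix_titles.csv')

# Parse dates: strip stray whitespace first and turn unparseable values into NaT
df['date_added'] = pd.to_datetime(df['date_added'].str.strip(), errors='coerce')
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month

# Handle missing data (assign back rather than calling fillna with inplace=True
# on a selected column, which recent pandas versions warn about)
df['director'] = df['director'].fillna('Unknown')
df['cast'] = df['cast'].fillna('Various')
df['country'] = df['country'].fillna('Unknown')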
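The snippet above covers the dates and missing values. The genre and duration steps from the list could look roughly like the sketch below. It assumes the genre column is named listed_in and that duration holds strings such as "90 min" or "2 Seasons", as in the Kaggle file; the helper name and the new column names (genres, duration_minutes, seasons) are my own choices for illustration.

```python
import pandas as pd

def add_genre_and_duration_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a genre list column and numeric duration columns."""
    out = df.copy()

    # Assumed: 'listed_in' holds comma-separated genres,
    # e.g. "Dramas, International Movies"
    out['genres'] = out['listed_in'].fillna('').apply(
        lambda s: [g.strip() for g in s.split(',') if g.strip()]
    )

    # Assumed: 'duration' looks like "90 min" for movies, "2 Seasons" for TV shows
    number = pd.to_numeric(
        out['duration'].str.extract(r'(\d+)', expand=False), errors='coerce'
    )
    is_movie = out['type'] == 'Movie'
    out['duration_minutes'] = number.where(is_movie)   # NaN for TV shows
    out['seasons'] = number.where(~is_movie)           # NaN for movies
    return out

# Tiny made-up sample, only to show the shape of the result
sample = pd.DataFrame({
    'type': ['Movie', 'TV Show', 'Movie'],
    'listed_in': ['Dramas, International Movies', 'Crime TV Shows', None],
    'duration': ['90 min', '2 Seasons', None],
})
prepared = add_genre_and_duration_columns(sample)
print(prepared[['type', 'genres', 'duration_minutes', 'seasons']])
```

Keeping minutes and seasons in separate columns avoids mixing two different units in one numeric field, which makes later histograms and averages easier to read.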
Exploratory Data Analysis (EDA)
I focused on:
- Content type trends (movies vs TV shows) over years
- Top producing countries
- Most popular genres
- Movie durations and TV show season counts
Example plot using Seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
# Years on the x-axis, one bar per type within each year, so the trend reads left to right
sns.countplot(data=df, x='year_added', hue='type', palette='muted')
plt.title('Movies vs TV Shows Added per Year')
plt.show()
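For the country and genre counts in the focus list, one wrinkle is that a single title can list several countries or genres in one comma-separated cell. Here is a minimal sketch of one way to count individual entries; top_values is a hypothetical helper, and the column names country and listed_in are assumed to match the Kaggle file.

```python
import pandas as pd

def top_values(df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Count individual entries in a comma-separated text column."""
    items = (
        df[column]
        .dropna()
        .str.split(',')   # "United States, India" -> ["United States", " India"]
        .explode()        # one row per entry
        .str.strip()      # drop the whitespace left over from the split
    )
    items = items[items != '']
    return items.value_counts().head(n)

# Tiny made-up sample, only to show how the helper is called
sample = pd.DataFrame({
    'country': ['United States, India', 'India', None, 'United Kingdom'],
    'listed_in': ['Dramas, Comedies', 'Dramas', 'Documentaries', 'Comedies, Dramas'],
})
print(top_values(sample, 'country'))
print(top_values(sample, 'listed_in', n=3))
```

If missing countries were filled with 'Unknown' during cleaning, that label will show up in the counts too, so it may be worth filtering it out before plotting.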
What I Found
- A clear shift towards more TV shows added since 2016
- The US dominates content production, but countries like India and the UK are growing fast
- Drama and Comedy lead genre counts, but Documentaries are on the rise
- Movies are mostly under 100 minutes, while most TV shows run 1–3 seasons
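One way to check the first finding is to look at the share of TV shows among titles added each year, rather than raw counts. The sketch below assumes the year_added column from the cleaning step and the type values 'Movie' and 'TV Show'; tv_share_by_year is a hypothetical helper name, and the sample numbers are made up.

```python
import pandas as pd

def tv_share_by_year(df: pd.DataFrame) -> pd.Series:
    """Fraction of titles added each year that are TV shows."""
    counts = pd.crosstab(df['year_added'], df['type'])
    # Make sure both columns exist even if one type is absent from the data
    counts = counts.reindex(columns=['Movie', 'TV Show'], fill_value=0)
    return counts['TV Show'] / counts.sum(axis=1)

# Tiny made-up sample, only to show the shape of the result
sample = pd.DataFrame({
    'year_added': [2015, 2015, 2016, 2016, 2016, 2016],
    'type': ['Movie', 'Movie', 'Movie', 'TV Show', 'TV Show', 'TV Show'],
})
share = tv_share_by_year(sample)
print(share)
```

Rows with a missing year_added are left out by crosstab, so titles without a parseable date do not affect the shares.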
What’s Next?
This dataset is perfect for experimenting with:
- Recommendation engines
- Sentiment analysis on descriptions
- Time-series forecasting of content growth
- Interactive dashboards with Plotly or Streamlit
Check It Out!
All code and notebooks are on GitHub:
👉 https://github.com/isacje/Netflix-Data-Analysis
Feel free to fork, run, and extend! And if you want help generating custom plots or automating your analysis pipeline, just ask.
Happy coding!
Isac