Hey fellow developers! Today, I'm excited to share a cool project I've been working on: a movie recommendation system built with Python and Streamlit. This system suggests movies based on a user's favorite film, making it a fun way to discover new movies to watch. Let's dive into how it works!
The Tech Stack
For this project, we're using:
- Python
- Streamlit for the web interface
- pandas for data handling
- scikit-learn for text processing and similarity calculations
- TMDb API for fetching movie posters
How It Works
Data Loading: We start by loading movie data from a CSV file using pandas.
Feature Engineering: We combine several movie features (genres, director, tagline, keywords, cast) into a single string for each movie.
Text Vectorization: Using TfidfVectorizer from scikit-learn, we convert our text data into numerical feature vectors.
Similarity Calculation: We use cosine similarity to calculate how similar movies are to each other based on their feature vectors.
User Input: Through the Streamlit interface, users can input their favorite movie and choose how many recommendations they want.
Recommendation Generation: We find the closest match to the user's input, then use our similarity matrix to find and display the most similar movies.
Movie Posters: To make our app more visually appealing, we fetch movie posters from the TMDb API.
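The post elides the body of the poster fetcher, so here is one possible shape for that step. This is a minimal sketch, not the author's implementation. It assumes TMDb's v3 /search/movie endpoint authenticated with an api_key query parameter, and the https://image.tmdb.org/t/p/w500 image base. Check both against TMDb's documentation. The helper names (fetch_poster_url, poster_url_from_search) and the extra api_key argument are hypothetical.

```python
import requests

# Assumptions: TMDb v3 search endpoint and image CDN base URL (verify in TMDb docs).
TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"


def poster_url_from_search(payload):
    """Build a full poster URL from a TMDb search response, or return None."""
    results = payload.get("results", [])
    # Take the first search hit; skip it if TMDb has no poster for it.
    if not results or not results[0].get("poster_path"):
        return None
    return TMDB_IMAGE_BASE + results[0]["poster_path"]


def fetch_poster_url(movie_title, api_key, timeout=5):
    """Hypothetical helper: search TMDb by title and return the first poster URL."""
    try:
        response = requests.get(
            TMDB_SEARCH_URL,
            params={"api_key": api_key, "query": movie_title},
            timeout=timeout,
        )
        response.raise_for_status()
    except requests.RequestException:
        # Network errors or non-2xx responses: show the movie without a poster.
        return None
    return poster_url_from_search(response.json())
```

Splitting the URL-building logic out of the HTTP call keeps it testable without network access. In the Streamlit app, a non-None URL can be passed directly to st.image.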
The Code
Here's a breakdown of the main components:
```python
import streamlit as st
import pandas as pd
import numpy as np
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import requests

# Function to fetch movie posters
def fetch_movie_poster(movie_title):
    # ... (implementation details)
    ...

# Load and preprocess data
movies_data = pd.read_csv('movies.csv')
selected_features = ['genres', 'director', 'tagline', 'keywords', 'cast']

# Replace missing values with empty strings; otherwise concatenating a NaN
# turns the whole combined string into NaN and TfidfVectorizer rejects it
for feature in selected_features:
    movies_data[feature] = movies_data[feature].fillna('')

# Combine features into one string per movie and vectorize
combined_features = (movies_data['genres'] + ' ' + movies_data['director'] + ' '
                     + movies_data['tagline'] + ' ' + movies_data['cast'] + ' '
                     + movies_data['keywords'])
vectorizer = TfidfVectorizer()
feature_vector = vectorizer.fit_transform(combined_features)

# Calculate pairwise cosine similarity between all movies
similarity = cosine_similarity(feature_vector)

# Streamlit UI
st.title('Movie Recommendation System')
movie_name = st.text_input('Enter the name of your favorite movie:')
num_recommendations = st.slider('How many recommendations would you like?', min_value=1, max_value=30, value=10)

# Generate and display recommendations
if movie_name:
    # ... (recommendation logic)
    ...
```
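The recommendation logic is elided above. The self-contained sketch below shows one way the "closest match, then most similar movies" step could work. It is not the author's code. It assumes movies.csv has a title column, which the snippet above never names. It uses difflib.get_close_matches for the closest-match step. The toy rows and the recommend helper are hypothetical.

```python
import difflib

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy rows standing in for movies.csv; column names follow the post.
movies_data = pd.DataFrame({
    "title": ["Avatar", "Aliens", "Titanic", "Notting Hill"],
    "genres": ["Action Adventure Fantasy", "Action Horror", "Drama Romance", "Comedy Romance"],
    "director": ["James Cameron", "James Cameron", "James Cameron", "Roger Michell"],
    "tagline": ["Enter the world of Pandora", None,
                "Nothing on Earth could come between them", None],
    "keywords": ["alien planet", "alien space marine", "shipwreck iceberg", "bookshop london"],
    "cast": ["Sam Worthington", "Sigourney Weaver", "Kate Winslet", "Julia Roberts"],
})

selected_features = ["genres", "director", "tagline", "keywords", "cast"]
for feature in selected_features:
    # Missing values would otherwise make the joined string fail.
    movies_data[feature] = movies_data[feature].fillna("")

# One space-joined string per movie (word order doesn't matter to TF-IDF).
combined_features = movies_data[selected_features].apply(" ".join, axis=1)
feature_vector = TfidfVectorizer().fit_transform(combined_features)
similarity = cosine_similarity(feature_vector)


def recommend(movie_name, num_recommendations):
    """Return up to num_recommendations titles most similar to the closest title match."""
    titles = movies_data["title"].tolist()
    matches = difflib.get_close_matches(movie_name, titles)
    if not matches:
        return []
    index = titles.index(matches[0])
    # Rank every movie by similarity to the matched one, most similar first.
    ranked = sorted(enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True)
    # Drop the matched movie itself (its similarity with itself is 1).
    return [titles[i] for i, _ in ranked if i != index][:num_recommendations]


print(recommend("Avatr", 2))  # ['Aliens', 'Titanic']
```

The misspelled "Avatr" still resolves to "Avatar" because get_close_matches scores fuzzy string similarity. It is case-sensitive, though, so lowercasing both the input and the titles before matching may be worth considering.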
Running the App
To run this app, make sure you have all the required libraries installed and a movies.csv file with the necessary data. Then, simply run:
```shell
streamlit run your_script_name.py
```
Future Improvements
There are several ways this system could be improved:
- Implement user accounts to track viewing history and improve recommendations over time.
- Add more data sources to get a broader range of movies and more detailed information.
- Incorporate collaborative filtering to consider user ratings and preferences.
- Optimize the similarity calculation for larger datasets, since the full pairwise similarity matrix grows quadratically with the number of movies.
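The last item is approachable without new infrastructure. cosine_similarity(feature_vector) materializes an n × n matrix up front. This minimal sketch instead scores only the selected movie against the catalog on demand. It assumes the sparse TF-IDF matrix is kept in memory. The top_similar helper and the toy documents are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def top_similar(feature_vector, index, k):
    """Return indices of the k rows most similar to row `index`, excluding itself."""
    # Score a single 1 x n row instead of building the full n x n matrix.
    scores = cosine_similarity(feature_vector[index], feature_vector).ravel()
    # Stable sort on negated scores: highest similarity first, ties by index.
    order = np.argsort(-scores, kind="stable")
    order = order[order != index]  # never recommend the query movie itself
    return order[:k].tolist()


# Hypothetical toy documents standing in for the combined feature strings.
docs = ["action alien planet", "action alien space", "action drama ship", "comedy romance london"]
feature_vector = TfidfVectorizer().fit_transform(docs)
print(top_similar(feature_vector, 0, 2))  # [1, 2]
```

Memory per query is now O(n) rather than O(n²). Because TfidfVectorizer L2-normalizes rows by default, sklearn.metrics.pairwise.linear_kernel would produce the same scores with slightly less work.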
Conclusion
Building this movie recommendation system was a fun way to combine data science concepts with web development. It's a great starting point for more complex recommendation systems and showcases the power of Python libraries like scikit-learn and Streamlit.
I hope you found this interesting! Feel free to try it out, modify the code, and let me know if you have any questions or suggestions for improvements.
Happy coding! 🎬🍿