Building a Movie Recommender System: A Journey from Pre-Med to ML 🎬

#python #machinelearning #nlp #futureofai

Transitioning from a pre-medical background to Electrical Engineering at NUST taught me one thing: Math is the universal language of logic. Recently, I decided to dive deep into Machine Learning to build a Content-Based Movie Recommender System.

In this post, I’ll walk you through how I used NLP and Cosine Similarity to suggest movies similar to one a user already likes.

The Tech Stack 🛠️
Language: Python

Libraries: Pandas, NumPy, Scikit-learn, NLTK

Dataset: TMDB 5000 Movies Dataset

The Workflow 🧠

Data Cleaning & Feature Selection
The first step was to merge the dataset's two files (movies and credits) and extract relevant features like genres, keywords, cast, and crew. I then created a "tags" column that combines all of these textual features into a single string per movie.
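To make the "tags" idea concrete, here is a minimal sketch using only the standard library. It assumes the feature columns hold JSON-encoded lists of objects with a "name" field; the helper names (extract_names, build_tags), the sample row, and the choice to keep only the first three cast members are all hypothetical and not taken from the project. Crew could be handled the same way.

```python
import json

def extract_names(raw, limit=None):
    """Return the "name" fields from a JSON-encoded list of objects."""
    names = [item["name"] for item in json.loads(raw)]
    return names[:limit] if limit is not None else names

def build_tags(genres, keywords, cast, top_cast=3):
    """Combine several features into one lowercase, space-separated string."""
    parts = (
        extract_names(genres)
        + extract_names(keywords)
        + extract_names(cast, limit=top_cast)
    )
    # Remove inner spaces so a multi-word name stays a single token.
    return " ".join(name.replace(" ", "").lower() for name in parts)

# Hypothetical row, shaped like JSON-encoded feature columns (assumption).
row = {
    "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
    "keywords": '[{"id": 1, "name": "space war"}]',
    "cast": '[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}]',
}
tags = build_tags(row["genres"], row["keywords"], row["cast"])
print(tags)  # action sciencefiction spacewar samworthington zoesaldana
```

Collapsing "Science Fiction" into "sciencefiction" is a design choice: it keeps a two-word genre or name from being split into unrelated tokens later on.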
Text Preprocessing (Stemming)
To make sure "action" and "actions" are treated the same, I used NLTK's PorterStemmer.

Python
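from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

# Stem every word in a string, so "actions" becomes "action"
def stem(text):
    return " ".join(ps.stem(word) for word in text.split())

# Applied to the tags column, e.g. movies["tags"].apply(stem) for a DataFrame named movies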

Vectorization (Bag of Words)
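I converted the text tags into 5,000-dimensional vectors using CountVectorizer: each dimension counts the occurrences of one word from a vocabulary capped at the 5,000 most frequent terms, with common English stop words removed.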
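To show what a bag-of-words vector actually contains, here is a conceptual sketch in plain Python. It is not scikit-learn's implementation: the stop-word list is a tiny illustrative one, the function names are hypothetical, and max_features is set to 4 instead of 5,000 so the output stays readable.

```python
from collections import Counter

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent words across all documents."""
    counts = Counter(w for doc in docs for w in tokenize(doc))
    # Sort by frequency (descending), then alphabetically for stable ties.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return sorted(ranked[:max_features])

def vectorize(doc, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocabulary]

docs = [
    "action adventure space",
    "action space war in the future",
    "romance drama",
]
vocab = build_vocabulary(docs, max_features=4)
vectors = [vectorize(d, vocab) for d in docs]
print(vocab)    # ['action', 'adventure', 'drama', 'space']
print(vectors)  # [[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 1, 0]]
```

Words that fall outside the capped vocabulary (here "war" and "future") simply contribute nothing to the vector, which is the trade-off of limiting the number of features.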
The Mathematical Engine: Cosine Similarity
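Instead of Euclidean distance, I used Cosine Similarity, which measures the cosine of the angle between two movie vectors rather than the straight-line distance between them. Because it ignores vector length, a movie with many tags is not pushed away from a movie with few. The smaller the angle (and the closer the score is to 1), the more similar the movies!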
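For two vectors A and B, the score is cos(θ) = (A · B) / (|A| |B|). The sketch below implements that formula in plain Python and uses it to rank movies. The titles, vectors, and the recommend helper are made up for illustration; the actual project may compute the whole similarity matrix at once with a library routine instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def recommend(index, vectors, titles, top_n=2):
    """Return the top_n titles most similar to the movie at `index`."""
    scores = [
        (cosine_similarity(vectors[index], v), titles[i])
        for i, v in enumerate(vectors)
        if i != index
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scores[:top_n]]

# Hypothetical data: Movie B repeats Movie A's words twice as often.
titles = ["Movie A", "Movie B", "Movie C"]
vectors = [[1, 1, 0, 1], [2, 2, 0, 2], [0, 0, 1, 0]]

print(cosine_similarity(vectors[0], vectors[1]))  # approximately 1.0
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0
print(recommend(0, vectors, titles, top_n=1))     # ['Movie B']
```

Movie A and Movie B point in the same direction, so their score is about 1 even though their Euclidean distance is not zero. That length-independence is the reason cosine similarity suits word-count vectors.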

Key Challenges 🚧
The biggest hurdle was managing the large similarity matrix (roughly 4,800 x 4,800 entries for this dataset) in a cloud environment. Dealing with memory limits and "truncated files" taught me a lot about efficient data handling and the importance of proper serialization using pickle.
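Two common ways to ease this kind of problem are sketched below, using only the standard library. Both are assumptions on my part rather than what the project does: storing only each movie's top-k neighbours instead of the full matrix, and writing the pickle to a temporary file before renaming it, so an interrupted write cannot leave a half-written file at the final path. The function names and the demo file name are hypothetical.

```python
import os
import pickle
import tempfile

def top_k_neighbors(similarity_row, k, skip_index):
    """Indices of the k highest scores in one row, excluding the movie itself."""
    ranked = sorted(
        (i for i in range(len(similarity_row)) if i != skip_index),
        key=lambda i: similarity_row[i],
        reverse=True,
    )
    return ranked[:k]

def save_atomically(obj, path):
    """Pickle to a temporary file, then rename it over the target path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)  # the final path only ever holds a complete file
    except BaseException:
        os.remove(tmp_path)
        raise

def load(path):
    """Read a pickled object back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical 3 x 3 similarity matrix standing in for the real one.
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
neighbors = [
    top_k_neighbors(row, k=2, skip_index=i) for i, row in enumerate(similarity)
]
path = os.path.join(tempfile.gettempdir(), "neighbors_demo.pkl")
save_atomically(neighbors, path)
print(load(path))  # [[1, 2], [0, 2], [1, 0]]
```

Keeping k neighbours per movie shrinks storage from n x n scores to n x k indices, at the cost of not being able to look up the similarity of arbitrary pairs later.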

Conclusion & Future Scope
This project was a fantastic way to apply linear algebra and NLP concepts. My next step is to deploy this as a full web app and integrate movie posters via an API.

Check out the full source code on my GitHub: 👉 https://github.com/Urooj25/Movie-Recommender-System.git

Let’s Connect!
I'm always open to feedback and collaboration. Drop a comment or connect with me on LinkedIn!