Ravi Kumar Vishwakarma

Posted on Jun 29

CineMatch: How I Built a Movie Recommendation System That Doesn't Need a Single User Rating

#ai #python #machinelearning #architecture

Table of Contents

What CineMatch Actually Does
The Real Challenge: Doing This Without Burning Memory
How It's Put Together
The Model and the Toolkit
A Few Things Worth Noticing
The Part That Surprised Me
Try It and Tell Me What's Off

Most recommendation engines you've used — Netflix, YouTube, Spotify — learn from millions of people clicking, watching, and rating things. Mine doesn't have that luxury. It works on text alone, and that one constraint shaped almost every decision in this project.

What CineMatch Actually Does

CineMatch is a content-based movie recommendation system. You give it a movie you like, and it returns similar titles — not because other users rated them similarly, but because the movies themselves share genres, release era, and identifying metadata. Under the hood, every movie in the dataset gets converted into a "tag" signature, and the system finds the closest matches using a Bag-of-Words model and cosine similarity.
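
To make the "tag signature plus cosine similarity" idea concrete, here is a minimal sketch using scikit-learn. The three tag strings are invented for illustration and are not rows from the real dataset; the variable names are placeholders too.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented "tag" signatures: title, genre, and year flattened into one string.
tags = [
    "space raiders action scifi 1986",
    "star raiders action scifi 1987",
    "love in paris comedy romance 1999",
]

# Bag-of-Words: each movie becomes a row of word counts.
vectorizer = CountVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(tags)  # sparse, one row per movie

# Compare the first movie against every movie, itself included.
scores = cosine_similarity(matrix[0], matrix).ravel()
ranked = scores.argsort()[::-1]  # indices from most to least similar

for i in ranked:
    print(f"{scores[i]:.2f}  {tags[i]}")
```

The first movie matches itself perfectly, the second shares three of its five words, and the third shares none, which is exactly the ordering you would expect from reading the tags.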

It runs on a dataset of roughly 16,250 IMDb movies spanning 1980 to 2026, served through a FastAPI backend with a lightweight frontend on top. You can try it live at cin-match-ai.vercel.app, or read through the full build in the GitHub repo.

Why does a content-based approach matter? Because it sidesteps the "cold start" problem that trips up rating-based systems. A brand-new movie with zero reviews can still get recommended the day it's added, since the system never needed ratings in the first place.
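
A small sketch of why cold start is a non-issue here, assuming a vectorizer that has already been fitted on the catalogue. The catalogue rows and the new title are invented, and this illustrates the principle rather than the app's actual code path.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented catalogue of tag strings.
catalog = [
    "space raiders action scifi 1986",
    "star raiders action scifi 1987",
    "love in paris comedy romance 1999",
]
vectorizer = CountVectorizer(stop_words="english")
catalog_matrix = vectorizer.fit_transform(catalog)

# A title added today: no ratings, no watch history, only metadata.
new_movie = "space pirates action scifi 2026"

# transform() reuses the fitted vocabulary; words it has never seen are ignored.
new_vector = vectorizer.transform([new_movie])
scores = cosine_similarity(new_vector, catalog_matrix).ravel()
best = int(scores.argmax())

print("closest existing title:", catalog[best])
```

Nothing in this path reads a rating or a view count, so the new title is comparable to the rest of the catalogue the moment its metadata exists.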

The Real Challenge: Doing This Without Burning Memory

The obvious way to compute similarity between every pair of movies is to build one giant matrix comparing all of them to each other. With around 16,250 movies, that's a matrix of roughly 264 million cells sitting in memory at all times, which is expensive when any single request only ever reads one row of it.
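
The arithmetic behind that figure is quick to check. The bytes-per-cell values below are assumptions about how a dense NumPy matrix would typically be stored, not measurements from the project.

```python
n_movies = 16_250

# A full pairwise matrix has one cell per (movie, movie) pair.
cells = n_movies * n_movies
print(f"{cells:,} cells")  # 264,062,500 cells

# Assumed dense storage: 8 bytes per cell for float64, 4 for float32.
bytes_float64 = cells * 8
bytes_float32 = cells * 4
print(f"float64: {bytes_float64 / 1e9:.1f} GB")
print(f"float32: {bytes_float32 / 1e9:.1f} GB")
```

Under those assumptions the full matrix lands at roughly 2.1 GB in float64 and 1.1 GB in float32, before the server has answered a single request.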

The fix was to flip the order of operations. Instead of precomputing everything, the app keeps only the sparse vectorized version of the dataset in memory and computes cosine similarity for a single row (the movie you just searched for) at the moment you ask for it. That trades a little per-request compute for a much smaller memory footprint, and it holds up well at this dataset size: the live demo reports under 8 MB of in-memory overhead and recommendation requests answering in around 0.12 seconds.
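
Here is a minimal sketch of that on-demand lookup. The `recommend` function, the `titles` list, and the toy tags are hypothetical stand-ins; the real app's names and data layout may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented titles and tag strings.
titles = ["space raiders", "star raiders", "love in paris", "paris nights"]
tags = [
    "space raiders action scifi 1986",
    "star raiders action scifi 1987",
    "love in paris comedy romance 1999",
    "paris nights drama romance 1999",
]

# Fit once and keep only the sparse matrix; no N x N matrix is ever built.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
tag_matrix = vectorizer.fit_transform(tags)


def recommend(title, top_n=2):
    """Return the top_n titles most similar to `title`, computed on demand."""
    idx = titles.index(title)
    # One row against all rows: a 1 x N result instead of N x N.
    scores = cosine_similarity(tag_matrix[idx], tag_matrix).ravel()
    scores[idx] = -1.0  # push the query movie itself to the bottom
    order = np.argsort(scores)[::-1][:top_n]
    return [(titles[i], float(scores[i])) for i in order]


print(recommend("love in paris", top_n=1))
```

Each call allocates one array of N scores and throws it away afterwards, so the resident memory is just the sparse tag matrix and the vectorizer's vocabulary.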

The second challenge was less about code and more about being honest about limitations. Poster images come from an unofficial, community-run IMDb proxy rather than an official API — fine for a side project, but not something you'd want to depend on without a fallback in production. Rather than hide that, it's called out directly in the project's own documentation, alongside a few other rough edges like an in-memory poster cache that resets on every restart. Writing those down instead of glossing over them is, honestly, the more useful habit to build early.

How It's Put Together

The workflow is straightforward once you see it laid out:

A raw IMDb dataset gets cleaned in a Jupyter notebook — lowercasing text, filling missing values, and combining title, genre, year, and IMDb ID into a single "tags" field per movie.
At server startup, FastAPI loads the cleaned data and fits a CountVectorizer (5,000 features, English stop words removed) to turn every movie's tags into a sparse numeric vector.
When you search for a movie and ask for recommendations, the app computes cosine similarity between your chosen movie's vector and every other vector, sorts by score, and returns the closest matches with their metadata attached.
A separate endpoint fetches movie posters on demand and caches them in memory so repeat lookups don't hit the external API again.
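
The first two steps can be sketched as follows. The column names (`title`, `genre`, `year`, `imdb_id`) and the sample rows are assumptions about the raw file, and `build_tags` is a hypothetical helper rather than the notebook's actual function; only the vectorizer settings come from the description above.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Invented rows; the column names are assumptions about the raw file.
raw = pd.DataFrame({
    "title": ["Space Raiders", "Love in Paris"],
    "genre": ["Action, Sci-Fi", None],
    "year": [1986, 1999],
    "imdb_id": ["tt0000001", "tt0000002"],
})


def build_tags(df):
    """Fill gaps, join the fields into one string per movie, and lowercase it."""
    cols = ["title", "genre", "year", "imdb_id"]
    clean = df.copy()
    clean["genre"] = clean["genre"].fillna("")
    joined = clean[cols].astype(str).agg(" ".join, axis=1)
    # Collapse the double space left behind by an empty genre.
    clean["tags"] = (
        joined.str.lower().str.replace(r"\s+", " ", regex=True).str.strip()
    )
    return clean


movies = build_tags(raw)

# Settings from the post: 5,000 features, English stop words removed.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
tag_matrix = vectorizer.fit_transform(movies["tags"])

print(movies["tags"].tolist())
print(tag_matrix.shape)
```

In a FastAPI app the fit would typically run once in a startup hook, with `tag_matrix` kept around for the per-request similarity lookups.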

The Model and the Toolkit

The recommendation logic runs on scikit-learn's CountVectorizer for turning text into numbers, paired with cosine similarity to measure how close two movies are in that numeric space. Everything sits behind a FastAPI backend running on Uvicorn, with Pandas and NumPy handling the data wrangling, HTTPX managing async calls to the poster API, and Pydantic validating incoming requests. The frontend is plain HTML, CSS, and JavaScript — no framework overhead, since the goal was a fast, simple interface rather than a complex one.

A Few Things Worth Noticing

The search bar ranks autocomplete results by popularity (vote count), so typing "dark knight" surfaces the film people actually mean, not just the first alphabetical match.
Recommendations come back with real metadata attached — genre, release year, IMDb rating, vote count — so you're not just getting a title, you're getting enough context to decide if it's worth watching.
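
The popularity-ranked autocomplete can be sketched in a few lines. The substring matching rule, the `autocomplete` function name, and every vote count below are assumptions made for illustration; the post only states that results are ordered by vote count.

```python
# Invented catalogue rows as (title, vote_count); the counts are made up.
catalog = [
    ("A Dark Knight's Tale", 1_200),
    ("Dark Knight Rising Sun", 300),
    ("The Dark Knight", 900_000),
    ("Darkest Hour", 50_000),
]


def autocomplete(query, movies, limit=5):
    """Return titles containing `query`, most-voted first."""
    q = query.strip().lower()
    if not q:
        return []
    hits = [(title, votes) for title, votes in movies if q in title.lower()]
    # Sort by popularity instead of alphabetically.
    hits.sort(key=lambda movie: movie[1], reverse=True)
    return [title for title, _ in hits[:limit]]


print(autocomplete("dark knight", catalog))
```

An alphabetical sort would put "A Dark Knight's Tale" first; ranking by votes surfaces the title most people are looking for.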

The Part That Surprised Me

Going in, I assumed a "good" recommender needed user behavior data to feel personal. What this project showed me is that text alone — genre, year, a handful of identifiers — captures a surprising amount of what makes two movies feel similar. It's not as nuanced as a system trained on millions of viewing patterns, but it's honest about what it's doing, and it works without needing a single person to have rated anything first.

Try It and Tell Me What's Off

If you want to see how it behaves, search a movie you actually like on the live demo and see if the matches make sense to you. The full code, notebooks, and a documented list of known limitations are in the GitHub repository — feedback, issues, and pull requests are genuinely welcome, especially if you spot a case where the recommendations miss the mark.


