I built a local-first movie recommender with Corrective-RAG (cited explanations, hybrid retrieval, runs entirely on Ollama)

A Aesthetic — Mon, 25 May 2026 22:50:24 +0000

Hey — sharing a project I've been building for the last
few months. It's a movie recommendation system that runs entirely on
your laptop using Ollama, with a Corrective-RAG pipeline.

Why I built it: existing streaming platforms only know what you
watched on them. Netflix can't see my Prime history, and none of them
know what I watched in a cinema. I wanted one system that learns from
all of it.

Stack:

7-stage Corrective-RAG (LangGraph static graph, not autonomous agents)
Hybrid retrieval: Chroma dense vectors + rank-bm25 sparse, fused via RRF
BGE-small-en-v1.5 embeddings + BGE-reranker-base cross-encoder
Grader-based correction loop with retry budget
Cited explanations: every bullet must reference a real source field, and bullets that fail validation are dropped (no hallucinated plot summaries)
Ollama llama3 default, OpenAI/Anthropic pluggable per role
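
For anyone who hasn't used RRF (Reciprocal Rank Fusion), here is a minimal sketch of how the dense and sparse rankings could be fused. It is not copied from the repo: the function name `rrf_fuse`, the movie ids, and the constant `k=60` (the value commonly used for RRF) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a doc missing from a list contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from each retriever for the same query.
dense_hits = ["heat", "ronin", "collateral"]   # e.g. from the vector store
sparse_hits = ["ronin", "thief", "heat"]       # e.g. from BM25

fused = rrf_fuse([dense_hits, sparse_hits])
print(fused)
```

RRF only looks at ranks, not raw scores, so the cosine similarities and BM25 scores never need to be put on a common scale.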

The interesting design choice was query expansion at INGEST time instead
of query time. The enrichment LLM generates 3-5 pseudo-queries per movie
and embeds them alongside the plot. Catalogues are bounded; user queries
aren't, so paying the LLM cost once per movie scales better than once
per query.
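
A rough sketch of the ingest-time idea, under stated assumptions: each pseudo-query becomes its own index record tagged with the movie id (the real pipeline may instead concatenate them with the plot), and `fake_enrichment_llm` is a hypothetical stand-in for the enrichment LLM call. All names and the sample movie are illustrative, not taken from the repo.

```python
from typing import Callable


def build_index_records(
    movie: dict, generate_queries: Callable[[dict], list[str]]
) -> list[dict]:
    """Return one record for the plot plus one per pseudo-query, all tagged with movie_id."""
    texts = [("plot", movie["plot"])]
    texts += [("pseudo_query", q) for q in generate_queries(movie)]
    return [
        {"id": f"{movie['id']}:{i}", "movie_id": movie["id"], "kind": kind, "text": text}
        for i, (kind, text) in enumerate(texts)
    ]


def collapse_to_movies(hits: list[dict]) -> list[str]:
    """Map ranked record hits back to unique movie ids, keeping best-rank order."""
    return list(dict.fromkeys(hit["movie_id"] for hit in hits))


# Hypothetical stand-in for the enrichment LLM: called once per movie at ingest.
def fake_enrichment_llm(movie: dict) -> list[str]:
    return [
        "slow-burn heist movie with a cat-and-mouse detective",
        "90s LA crime drama with a big shootout",
        "something like a serious Ocean's Eleven",
    ]


movie = {
    "id": "heat-1995",
    "plot": "A veteran detective pursues a crew of professional thieves in Los Angeles.",
}
records = build_index_records(movie, fake_enrichment_llm)
movie_ids = collapse_to_movies(records)  # several records, one movie
```

Each record's `text` would then be embedded and indexed. At query time no LLM call is needed for expansion: a user query that resembles a pseudo-query lands on that record, and the hit is collapsed back to its movie.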

Latency on M3 / 36GB / Ollama llama3: ~90s/query (filter_extract +
explain dominate). Switching to llama3.2:1b drops that to ~15-20s, and
hosted models come in at ~5-10s.

Code + setup: github.com/meetgrewal7793-creator/personal-movie-recommender

The 7-stage architecture diagram is in the README. Feedback welcome —
especially on the grader prompt calibration, which I had to relax for
local-LLM defaults because llama3 graders over-flag results as weak.
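
For context on the grader-based correction loop with a retry budget, here is a minimal sketch of the control flow in plain Python. The `retrieve`, `grade`, and `rewrite` callables and the `max_retries=2` default are hypothetical placeholders, not the repo's actual graph nodes or settings.

```python
from typing import Callable


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, list[str]], bool],
    rewrite: Callable[[str, list[str]], str],
    max_retries: int = 2,
) -> tuple[list[str], int]:
    """Retrieve, grade, and rewrite the query until the grader passes or the budget runs out."""
    current, retries = query, 0
    while True:
        results = retrieve(current)
        # Always grade against the ORIGINAL query, not the rewritten one.
        if grade(query, results) or retries >= max_retries:
            return results, retries
        current = rewrite(current, results)
        retries += 1


# Toy stand-ins: retrieval only succeeds once the query mentions "heist".
def toy_retrieve(q: str) -> list[str]:
    return ["Heat"] if "heist" in q else []


def toy_grade(original_query: str, results: list[str]) -> bool:
    return bool(results)


def toy_rewrite(q: str, results: list[str]) -> str:
    return q + " heist"


results, retries_used = corrective_retrieve(
    "crime thriller", toy_retrieve, toy_grade, toy_rewrite
)
```

An over-strict grader fails results that are actually fine, so every query burns its full retry budget, which is why grader calibration matters for latency as well as quality.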

DEV Community: A Aesthetic
