Mayank Parashar

How I Served 80,000+ Recommendations in Under 50ms

Every recommendation tutorial I found was either a Netflix black box or a 1,000-row Jupyter notebook toy. I wanted something in between — real, deployable, and something I actually understood.
That's how Inkpick was born: a hybrid recommendation engine across cinema, music, and courses with sub-50ms inference on 80,000+ items. Just NumPy, FastAPI, and deliberate design choices.

What "Hybrid" Means

Content-Based Filtering — works on day one, no user history needed. But it traps users in a bubble.
Collaborative Filtering — discovers surprising cross-user patterns. Falls apart for new users (cold-start problem).

A hybrid blends both:

score_hybrid(i) = α · score_cb(i) + (1 - α) · score_cf(i)

Inkpick defaults α = 0.65 — content-biased for cold-start users, shifting toward collaborative as history grows.
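A minimal sketch of that blend in NumPy (the toy score arrays here are illustrative, not Inkpick's actual code; both vectors are assumed already normalized to [0, 1]):

```python
import numpy as np

def hybrid_scores(cb: np.ndarray, cf: np.ndarray, alpha: float = 0.65) -> np.ndarray:
    """Blend content-based and collaborative scores per item.
    alpha = 0.65 leans on content, which is all a cold-start user has."""
    return alpha * cb + (1 - alpha) * cf

# toy scores for three candidate items
cb = np.array([0.9, 0.2, 0.5])
cf = np.array([0.1, 0.8, 0.5])
blended = hybrid_scores(cb, cf)
top = np.argsort(-blended)  # rank items by blended score, best first
```

Lowering alpha as a user accumulates history shifts the same one-liner toward the collaborative signal.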

The Architecture
Client (Vanilla JS)
       │
  FastAPI (Async)
  ┌────┴────┬──────────┐
TF-IDF   Latent    Levenshtein
+ CSR    Factor    Fuzzy Search
  └────┬────┘
  Hybrid Layer
       │
 Service Registry
(cinema / audio / edu)

Each domain is fully decoupled. Adding a new domain = one new service file.

Content-Based: TF-IDF + Cosine Similarity
TF-IDF turns item metadata (title, genre, tags) into vectors: words unique to one item get high weight, while words common to everything, like "the", are penalized toward zero.

Similarity between items is then a dot product:

similarity(q, i) = (q · i) / (‖q‖ · ‖i‖)

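As a sketch of the weighting (a bare-bones TF-IDF without smoothing, not Inkpick's actual pipeline):

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Per-document term weights: term frequency scaled by
    inverse document frequency. A term in every document gets weight 0."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights
```

With two items tagged `["the", "godfather"]` and `["the", "matrix"]`, "the" scores exactly 0 while "godfather" gets a positive weight.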

Why not SciPy? Inkpick implements CSR (Compressed Sparse Row) ops directly in NumPy — cutting a ~30MB dependency, reducing memory, and keeping full control over the pipeline. An 80,000-item matrix is ~98% zeros; CSR stores only non-zero values.
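The post doesn't include Inkpick's CSR code, so here is a minimal sketch of the idea: three flat arrays instead of a dense matrix, plus cosine scoring against every row as in the formula above. Names are mine, not Inkpick's.

```python
import numpy as np

class CSR:
    """Minimal CSR store: data holds the non-zero values, indices their
    column ids, and indptr[r]:indptr[r+1] slices out row r."""

    def __init__(self, dense: np.ndarray):
        rows, cols = np.nonzero(dense)
        self.data = dense[rows, cols]
        self.indices = cols
        self.indptr = np.zeros(dense.shape[0] + 1, dtype=np.int64)
        np.add.at(self.indptr, rows + 1, 1)   # per-row non-zero counts...
        self.indptr = np.cumsum(self.indptr)  # ...turned into row offsets
        self.n_rows = dense.shape[0]

    def dot(self, v: np.ndarray) -> np.ndarray:
        """Sparse matrix-vector product: each row touches only its non-zeros."""
        out = np.zeros(self.n_rows)
        for r in range(self.n_rows):
            s, e = self.indptr[r], self.indptr[r + 1]
            out[r] = self.data[s:e] @ v[self.indices[s:e]]
        return out

def cosine_scores(m: CSR, q: np.ndarray, row_norms: np.ndarray) -> np.ndarray:
    """similarity(q, i) for every item row i; epsilon guards zero norms."""
    return m.dot(q) / (row_norms * np.linalg.norm(q) + 1e-12)
```

At ~98% sparsity this stores roughly 2% of the dense matrix's values, at the cost of writing the matvec yourself.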

Collaborative Filtering: Latent Factors
CF decomposes the user–item interaction matrix into lower-dimensional embeddings:

R ≈ U × Vᵀ

These latent dimensions learn hidden patterns — "likes slow-burn thrillers" — without being told. In Inkpick, this module is currently a stub with a production-ready interface, awaiting a trained ALS/BPR model: an honest limitation, and next on the roadmap.
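Since the module is a stub, the following only illustrates the shape of the computation, with random embeddings standing in for trained ALS/BPR factors:

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, k = 1_000, 80_000, 64  # k latent dimensions (toy sizes)

U = rng.standard_normal((n_users, k)) * 0.1  # user embeddings (would come from training)
V = rng.standard_normal((n_items, k)) * 0.1  # item embeddings (would come from training)

def cf_scores(user_id: int) -> np.ndarray:
    """One row of R ≈ U @ Vᵀ: the user's predicted affinity for every item."""
    return U[user_id] @ V.T
```

Scoring a user is then a single (n_items × k) matvec, which is why inference stays cheap even at 80,000 items.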

Fuzzy Search Fallback
Search "Godfater" → no match → system fails. Not ideal.
Inkpick uses Levenshtein edit-distance as a safety net:

"Godfater" → "Godfather" = 1 edit

When exact search fails, fuzzy kicks in and returns the closest matches. Small addition, big UX improvement.
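A sketch of that fallback (the classic dynamic-programming edit distance; the threshold of 2 edits is my assumption, not Inkpick's):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic DP, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete
                            curr[j - 1] + 1,             # insert
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

def fuzzy_search(query: str, titles: list[str], max_dist: int = 2) -> list[str]:
    """Closest titles within max_dist edits, nearest first."""
    scored = sorted((levenshtein(query.lower(), t.lower()), t) for t in titles)
    return [t for d, t in scored if d <= max_dist]
```

`fuzzy_search("Godfater", catalog)` finds "Godfather" at distance 1 instead of returning nothing.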

The API
GET /recommend/cinema?item_id=tt0111161&top_k=5&mode=hybrid
{
  "domain": "cinema",
  "results": [{ "title": "The Godfather", "score": 0.94 }],
  "latency_ms": 38
}

The mode param accepts content, collaborative, or hybrid — handy for debugging.
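The post doesn't show the handler behind that route, but the dispatch through the service registry might look roughly like this (the placeholder service and all names here are hypothetical):

```python
import time

class CinemaService:
    """Hypothetical stand-in for one of Inkpick's per-domain services."""
    def recommend(self, item_id: str, top_k: int, mode: str) -> list[dict]:
        # a real service would score 80,000+ items here
        return [{"title": "The Godfather", "score": 0.94}][:top_k]

SERVICES = {"cinema": CinemaService()}  # registry: domain name -> service

def handle_recommend(domain: str, item_id: str, top_k: int = 5,
                     mode: str = "hybrid") -> dict:
    """Validate the mode, dispatch to the domain's service, time the call."""
    if mode not in {"content", "collaborative", "hybrid"}:
        raise ValueError(f"unknown mode: {mode}")
    start = time.perf_counter()
    results = SERVICES[domain].recommend(item_id, top_k, mode)
    return {"domain": domain,
            "results": results,
            "latency_ms": round((time.perf_counter() - start) * 1000)}
```

Because the registry is just a dict, adding a new domain really is one new service file plus one entry.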

What I'd Fix in v2

Train the CF model — ALS or BPR. The hybrid is only as good as both components.

SBERT over TF-IDF — semantic similarity that keyword matching completely misses.

Add evaluation metrics — Precision@K, NDCG. Fast latency is measurable; recommendation quality currently isn't.

Dynamic α — learn the blend weight per user instead of hardcoding 0.65.

Diversity control — MMR to avoid returning "10 Batman movies."

Try It

Live: inkpick.vercel.app
GitHub: github.com/MayankParashar28/inkpick

Drop a comment if you're building something similar — would love to exchange notes.
