Athreya aka Maneshwar

Posted on Jun 21

How Apps Know What You Want Next?

#machinelearning #ai #programming #datascience

Hello, I'm Maneshwar. I'm building git-lrc, a Micro AI code reviewer that runs on every commit. It is free and source-available on GitHub. Star git-lrc to help devs discover the project. Do give it a try and share your feedback.

Open Netflix and the homepage already knows you're in a "true crime documentary at 11pm" mood.

Open Spotify on Monday and there's a playlist that somehow read your weekend.

Open Amazon to buy one cable and leave with a cart full of things you didn't know you needed.

None of that is magic.

It's a recommendation engine doing math on your behavior, and the numbers behind it are genuinely worth understanding, partly because they're clever, and partly because there's a good chance you'll be asked to build one.

Let's walk through how these systems actually work, and then get into the part that quietly powers most modern recommenders: embeddings.

Why bother?

Recommendation systems aren't a nice-to-have feature anymore.

They're load-bearing infrastructure for a lot of the internet.

A few numbers to set the stage: roughly 80% of what people watch on Netflix comes from its recommendations, and around 35% of Amazon purchases trace back to suggested products.

Netflix has estimated its recommender saves the company over a billion dollars a year, mostly by keeping people from rage-quitting the app when they can't find anything to watch.

The market for this technology was valued near $7 billion in 2024 and is projected to roughly triple within five years.

So when an engine suggests the next episode, a complementary product, or a song that fits your vibe, it's not a side quest.

It's often the main loop of the business.

The five-phase pipeline

Under the hood, most recommenders follow the same general flow:

Gather data: Two flavors here. Explicit data is what users deliberately give you, such as ratings, reviews, likes, and thumbs up. Implicit data is everything they do without thinking about it, such as clicks, watch time, scroll depth, and items left rotting in a cart. Implicit data is messier, but there's a lot more of it.
Store it: Depending on whether your data is structured, unstructured, or both, this lands in a warehouse, a lake, or a lakehouse. Boring but necessary.
Analyze it: ML algorithms hunt for patterns and correlations: who behaves like whom, what gets bought together, and which signals actually predict the next click.
Filter it: Apply the rules and math that turn "here's everything we know" into "here are ten things to show this person right now."
Refine it: Watch the outputs, measure whether they're any good, retrain. Repeat forever.
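A toy version of the gather and filter steps might look like the sketch below. It assumes a hypothetical event log of `(user, item, event)` tuples and made-up weights for each implicit signal; real systems tune these weights, and the candidate scores would come out of the analysis step rather than being typed in by hand.

```python
from collections import defaultdict

# Hypothetical weights for implicit signals (illustrative assumptions, not tuned values).
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}

def build_interactions(events):
    """Gather: collapse a raw event log into per-user, per-item scores."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, event in events:
        scores[user][item] += EVENT_WEIGHTS.get(event, 0.0)
    return scores

def filter_candidates(user, candidate_scores, interactions, top_n=10):
    """Filter: drop items the user already touched, keep the top N by score."""
    seen = interactions.get(user, {})
    fresh = {item: s for item, s in candidate_scores.items() if item not in seen}
    return sorted(fresh, key=fresh.get, reverse=True)[:top_n]

events = [
    ("ana", "cable", "purchase"),
    ("ana", "charger", "click"),
    ("ben", "cable", "click"),
]
interactions = build_interactions(events)
# Made-up candidate scores standing in for the output of the analysis step.
candidates = {"cable": 0.9, "charger": 0.7, "adapter": 0.6, "case": 0.4}
print(filter_candidates("ana", candidates, interactions, top_n=2))  # ['adapter', 'case']
```

Ana already bought the cable and clicked the charger, so the filter hands back the next-best items she hasn't touched.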

Step 4, filtering, is where the personality lives.

The filtering strategy you pick is what makes one recommender feel psychic and another feel like it's just showing you the same hoodie you already bought.

The three classic approaches

Collaborative filtering: "people like you also liked..."

Collaborative filtering ignores what the items actually are and looks purely at behavior.

The core assumption: if you and I have agreed on a hundred things, we'll probably agree on the hundred-and-first.

It comes in two main styles:

Memory-based systems treat everything as one giant user–item matrix and look for nearest neighbors, basically k-NN with a fancier hat. User-based filtering compares rows (you vs. other users); item-based filtering compares columns (this item vs. other items, based on who interacted with them).
Model-based systems train an actual predictive model on that matrix. The most famous trick here is matrix factorization: take a huge, mostly empty user–item matrix and decompose it into two skinny matrices, one describing users and one describing items, across a handful of hidden dimensions. Multiply them back together and you've predicted the blanks. Those predicted blanks, ranked, are your recommendations.
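To make the memory-based, item-based flavor concrete, here's a minimal sketch on a tiny, made-up ratings matrix (rows are users, columns are items, 0 means "no rating"). It compares item columns with cosine similarity; treating 0 as a real value is a simplifying assumption that production systems usually avoid.

```python
import numpy as np

items = ["Die Hard", "Mad Max", "The Notebook", "Pride & Prejudice"]
# Rows = users, columns = items; 0 = unrated (toy data, assumed for illustration).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def item_similarity(matrix):
    """Cosine similarity between every pair of item columns."""
    norms = np.linalg.norm(matrix, axis=0)
    normalized = matrix / norms  # divide each column by its length
    return normalized.T @ normalized

def similar_items(item, matrix, names, k=1):
    """Return the k items whose rating columns look most like `item`'s column."""
    sims = item_similarity(matrix)[names.index(item)]
    order = [j for j in np.argsort(-sims) if names[j] != item]
    return [names[j] for j in order[:k]]

print(similar_items("Die Hard", R, items))  # ['Mad Max']
```

Passing `R.T` (with a list of user names) instead of `R` compares user rows rather than item columns, which is the user-based variant.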

Hold onto that matrix factorization idea.

Those "hidden dimensions" are embeddings wearing a trench coat, and we'll come back to them.
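Here's a minimal matrix factorization sketch using plain stochastic gradient descent on the observed entries only. The rank, learning rate, regularization, and epoch count are arbitrary assumptions picked for a toy matrix; real systems tune these and often use other solvers such as alternating least squares. The rows of `U` and `V` are exactly those hidden-dimension vectors.

```python
import numpy as np

def factorize(R, rank=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn U (users x rank) and V (items x rank) so that U @ V.T fits the observed ratings.
    Zeros in R are treated as missing and skipped during training."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    observed = list(zip(*np.nonzero(R)))
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - U[u] @ V[i]
            # Gradient step on both factors with L2 regularization;
            # copy U[u] so V's update uses the pre-step user vector.
            u_old = U[u].copy()
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V

# Toy ratings: rows = users, columns = items, 0 = unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)
U, V = factorize(R)
predicted = U @ V.T  # every cell, including the former blanks, now has a prediction
print(np.round(predicted, 1))
```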

Collaborative filtering is powerful and doesn't need anyone to describe the items.

Its kryptonite is the cold start problem: a brand-new user or a brand-new item has no history, so the system has nothing to compare.
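The cold start problem is easy to see in code: a brand-new user's row is all zeros, so it has no direction and a cosine similarity against it would divide by zero. A common stopgap, shown here as an illustrative assumption rather than any particular company's method, is to fall back to popular items until some history exists. "Popularity" below is simply total rating mass per item, another assumption.

```python
import numpy as np

items = ["Die Hard", "Mad Max", "The Notebook", "Pride & Prejudice"]
# Toy ratings: rows = existing users, columns = items, 0 = no interaction.
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 4, 5, 0],
], dtype=float)

def recommend_for_user(user_row, R, names, top_n=2):
    """User-based CF with a popularity fallback for users who have no history."""
    if not np.any(user_row):
        # Cold start: an all-zero row has no direction, so cosine similarity is undefined.
        # Fall back to the items with the most total rating mass.
        order = np.argsort(-R.sum(axis=0), kind="stable")
        return [names[j] for j in order[:top_n]]
    # Cosine similarity between this user and every existing user.
    sims = (R @ user_row) / (np.linalg.norm(R, axis=1) * np.linalg.norm(user_row))
    # Score items by similarity-weighted ratings, then hide what the user already rated.
    scores = sims @ R
    scores[user_row > 0] = -np.inf
    order = np.argsort(-scores, kind="stable")
    return [names[j] for j in order[:top_n]]

print(recommend_for_user(np.zeros(4), R, items))                        # brand-new user
print(recommend_for_user(np.array([5.0, 0, 0, 0]), R, items, top_n=1))  # liked Die Hard
```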

Spotify and Amazon both lean heavily on this approach.

Content-based filtering: "this is similar to what you liked"

Content-based filtering flips the logic.

Instead of asking who else is like you, it asks what else is like the things you already enjoyed.

It leans on item features like genre, price, color, category, tags, descriptions.

And here's where embeddings step into the spotlight, because content-based systems represent items and users as vectors in a shared space.

The closer two vectors sit, the more similar the items.

Recommend the neighbors, and you're done.
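A minimal content-based sketch: encode each item's tags as a binary feature vector, build a user profile by averaging the vectors of items they liked, and rank the rest by cosine similarity to that profile. The catalog, tags, and averaging rule are illustrative assumptions; real systems often use richer features or learned embeddings.

```python
import numpy as np

# Hypothetical catalog: each title described only by its tags.
catalog = {
    "Die Hard": {"action", "thriller"},
    "Mad Max": {"action", "sci-fi"},
    "The Notebook": {"romance", "drama"},
    "Pride & Prejudice": {"romance", "period"},
    "Blade Runner": {"sci-fi", "thriller"},
}
vocab = sorted(set().union(*catalog.values()))

def to_vector(tags):
    """One-hot encode a set of tags over the shared tag vocabulary."""
    return np.array([1.0 if t in tags else 0.0 for t in vocab])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_by_content(liked, top_n=2):
    """Average the liked items into a profile, then rank unseen items by similarity to it."""
    profile = np.mean([to_vector(catalog[t]) for t in liked], axis=0)
    scores = {
        title: cosine(profile, to_vector(tags))
        for title, tags in catalog.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend_by_content(["Die Hard", "Mad Max"], top_n=1))  # ['Blade Runner']
```

Nobody has to have watched Blade Runner for it to surface; its tags alone give it a vector.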

The upside: it handles cold start better, since a new item just needs metadata, not a history.

The downside: it can get stuck in a bubble, endlessly recommending variations of what you've already seen.

Liked one legal thriller? Enjoy your seventeen legal thrillers.

Hybrid: "why not both?"

A hybrid system fuses collaborative and content-based filtering to cover each other's weaknesses.

It's often more accurate, but also more demanding: more architecture, more compute, more things to break.

Netflix runs a hybrid system, which is part of why its suggestions feel eerily tuned.
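One simple way to fuse the two, shown as a sketch rather than how Netflix or anyone else actually does it, is a weighted blend: score each candidate with both methods, rescale the scores to a common range, and mix them with a tunable weight. The `alpha` weight, min-max scaling, and all scores below are assumptions.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1] so the two methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_scores(collab, content, alpha=0.7):
    """Blend collaborative and content-based scores; an item missing from one side scores 0 there."""
    collab_n, content_n = min_max(collab), min_max(content)
    items = set(collab) | set(content)
    return {
        item: alpha * collab_n.get(item, 0.0) + (1 - alpha) * content_n.get(item, 0.0)
        for item in items
    }

# Made-up scores. "New Release" has no interaction history yet (cold start),
# so only the content-based side knows about it.
collab = {"Mad Max": 4.2, "Blade Runner": 3.1, "The Notebook": 0.5}
content = {"Mad Max": 0.8, "Blade Runner": 0.9, "The Notebook": 0.1, "New Release": 0.85}
blended = hybrid_scores(collab, content)
print(sorted(blended, key=blended.get, reverse=True))
```

The content side lets "New Release" into the ranking despite zero history, while the collaborative side keeps proven favorites on top. That is the "cover each other's weaknesses" idea in miniature.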

Now, the embeddings part

Here's the unifying idea behind a lot of this: an embedding is a way to turn something complicated like a movie, a product, a user, a sentence into a list of numbers (a vector) that captures its meaning.

Things that are similar end up with similar vectors, sitting close together in space.

Once everything is a vector, "find me similar items" becomes "find me nearby vectors," and that's a problem computers are extremely good at.

A common way to measure "nearby" is cosine similarity, i.e., the cosine of the angle between two vectors.

Point in the same direction, score near 1.
Perpendicular, score near 0.
Opposite, score near -1.
Magnitude doesn't matter, only direction, which is exactly what you want when comparing taste.
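Written out, cosine similarity is the dot product divided by the product of the two vector lengths. A few two-dimensional cases reproduce the three bullet points and show why scaling a vector leaves the score unchanged:

```latex
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}

\begin{aligned}
\cos\big((1, 0), (2, 0)\big) &= \tfrac{2}{1 \cdot 2} = 1 && \text{same direction; } (2, 0) \text{ is just } (1, 0) \text{ doubled} \\
\cos\big((1, 0), (0, 3)\big) &= \tfrac{0}{1 \cdot 3} = 0 && \text{perpendicular} \\
\cos\big((1, 0), (-1, 0)\big) &= \tfrac{-1}{1 \cdot 1} = -1 && \text{opposite}
\end{aligned}
```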

Here's the whole concept in a few lines of Python:

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Each number stands in for a "taste dimension" (maybe
# action-ness, romance-ness, how indie it is). Hand-picked
# here; a real model would learn them from data.
movies = {
    "Die Hard":        np.array([0.9, 0.1, 0.8, 0.2]),
    "Mad Max":         np.array([0.85, 0.05, 0.9, 0.15]),
    "The Notebook":    np.array([0.1, 0.95, 0.2, 0.7]),
    "Pride & Prejudice": np.array([0.05, 0.9, 0.15, 0.8]),
}

def recommend(liked, catalog, top_n=2):
    target = catalog[liked]
    scores = {
        title: cosine_similarity(target, vec)
        for title, vec in catalog.items()
        if title != liked
    }
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_n]

print(recommend("Die Hard", movies))

Like Die Hard? The engine reaches for Mad Max rather than The Notebook, not because anyone hand-coded that rule, but because the vectors landed close together.

In a real system you wouldn't pick those four numbers by hand.

A model learns them from data: matrix factorization derives them from the user–item matrix, neural networks learn richer ones, and for text or images you'd hand things to a pretrained embedding model and get hundreds of dimensions back.

Same principle, just more dimensions than a human can picture.

This is also why embeddings are such a big deal beyond recommendations.

The same trick (represent meaning as vectors, compare by proximity) is what powers semantic search, retrieval-augmented generation, clustering, and a good chunk of modern AI tooling. Learn it once, reuse it everywhere.

The parts nobody puts on the landing page

Building a recommender that works is a different sport from building one that demos well.

A few things that may bite teams in production:

Scale and speed. You're serving real-time suggestions to potentially millions of people at once. Cosine similarity across four movies is trivial; doing it across ten million items with sub-100ms latency is its own engineering discipline (hello, approximate nearest neighbor search).
The wrong metric trap. Optimize for the wrong thing and you'll just keep surfacing whatever's already popular, burying new or niche items in a feedback loop. The most-clicked item isn't always the one the user actually wants.
Bias. Models happily absorb whatever bias lives in the training data. If your history is skewed, your recommendations will be too, and that's a product and ethics problem, not just a math one.
Privacy and compliance. All of this runs on user data, and users increasingly opt out, while regulators increasingly pay attention. "Collect everything" is no longer a free strategy.
Cost. Hybrid systems and deep models are hungry. Sometimes a simpler approach that's 90% as good and a tenth of the cost is the right engineering call.
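To give a feel for what approximate nearest neighbor search does, here's a toy random-hyperplane LSH index, a classic ANN technique for cosine similarity, picked here purely for illustration and not as what any specific company runs. Vectors whose signs agree across a set of random hyperplanes land in the same bucket, and only that bucket gets scored exactly. The plane count, catalog size, and random data are assumptions; real deployments use dedicated ANN libraries and multiple hash tables so true neighbors aren't missed.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Toy single-table LSH index for cosine similarity."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_planes, dim))
        self.buckets = defaultdict(list)
        self.vectors = {}

    def _signature(self, vec):
        # One bit per hyperplane: which side of that plane the vector falls on.
        return tuple((self.planes @ vec > 0).astype(int))

    def add(self, item_id, vec):
        self.vectors[item_id] = vec
        self.buckets[self._signature(vec)].append(item_id)

    def query(self, vec, top_n=5):
        # Score only candidates that share the query's bucket, not the whole catalog.
        candidates = self.buckets.get(self._signature(vec), [])

        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        scored = [(cos(vec, self.vectors[c]), c) for c in candidates]
        return [c for _, c in sorted(scored, reverse=True)[:top_n]]

rng = np.random.default_rng(42)
catalog = rng.normal(size=(10_000, 32))  # 10k random stand-in "item embeddings"
index = HyperplaneLSH(dim=32)
for i, v in enumerate(catalog):
    index.add(i, v)

print(index.query(catalog[123], top_n=3))  # item 123 itself comes back first
```

With 8 planes the catalog splits into up to 256 buckets, so each query scores only a small slice of the 10,000 items. The price is that a true neighbor sitting just across a hyperplane gets missed, which is the "approximate" in ANN.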

Where this shows up

Once you start looking, recommenders are everywhere: e-commerce ("frequently bought together"), media and streaming (the next episode, the next track), travel ("hotels for your budget and dates"), and marketing (which case study to email which lead).

They've even moved into AIOps, where they suggest fixes to IT teams during incidents: a recommendation engine for "your server is on fire, try this."

The takeaway

Strip away the branding and a recommendation engine is doing something pretty intuitive: it turns users and items into vectors, then measures who's close to whom.

Collaborative filtering learns those vectors from behavior, content-based filtering builds them from features, hybrids do both, and embeddings are the common language underneath it all.

If you're going to learn one concept from this, make it embeddings.

The "represent meaning as numbers, compare by distance" idea is the same move behind recommendations, search, and most of the AI stack you'll touch this decade.

Get comfortable with it now, and a surprising amount of modern ML stops looking like magic and starts looking like geometry.

Now go build something that knows what people want before they do.

Disclaimer: This article was written by me; AI was used to fix grammar and improve readability.

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs — without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

⭐ Star it on GitHub:

HexmosTech / git-lrc

Free, Micro AI Code Reviews That Run on Git Commit
