Yurukusa
Re-ranking Isn't Just Sorting Your Search Results (Anthropic Academy Part 3)

Part 3 of the Things I Didn't Know About Claude series.

This one is about RAG. I thought I understood it. The quiz said otherwise.

What I Thought Re-ranking Was

I knew about embeddings and vector search. I knew you retrieve chunks and pass them to Claude as context. I thought "re-ranking" just meant sorting those chunks by their similarity score.

That's not what it is.

Re-ranking is a separate step where an LLM (or specialized re-ranker model) re-evaluates the initial search results against the original query. It's a second opinion from a smarter judge.

The initial search (vector similarity, BM25) optimizes for recall — cast a wide net, don't miss anything relevant. Re-ranking optimizes for precision — look at what we caught and keep only the truly relevant pieces.

The initial retriever returns 50 chunks. The re-ranker scores each one against the actual query, reorders them, and keeps the top 5. Different model, different scoring logic, different purpose.

How It Actually Works

The course walked through a concrete implementation. You take your merged search results, format them with IDs in XML, and ask Claude:

"Here are documents related to the user's question. Return the three most relevant document IDs in order of decreasing relevance."

A key optimization: assign random IDs to each chunk and ask Claude to return just the IDs, not the full text. This way Claude doesn't waste tokens copying text you already have.
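That formatting step might look like this. A minimal sketch — the function name, the XML shape, and the ID range are my own, not from the course:

```python
import random


def format_chunks_for_reranking(chunks: list[str]) -> tuple[str, dict[int, str]]:
    """Wrap each chunk in an XML tag with a random ID so the model can
    answer with IDs alone instead of echoing text we already have."""
    id_to_chunk: dict[int, str] = {}
    parts = []
    for chunk in chunks:
        # Random (rather than sequential) IDs avoid hinting at the
        # original retrieval order.
        doc_id = random.randint(1000, 9999)
        while doc_id in id_to_chunk:
            doc_id = random.randint(1000, 9999)
        id_to_chunk[doc_id] = chunk
        parts.append(f'<document id="{doc_id}">\n{chunk}\n</document>')
    return "\n".join(parts), id_to_chunk
```

The returned mapping lets you translate the IDs Claude picks back into the actual chunk text for the final prompt.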

The implementation uses a prefill + stop sequence to force clean JSON output. You could use tool use for structured output, but for a simple ranked list, prefill is lighter and faster.
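Here's a sketch of the prefill trick, assuming the Anthropic Messages API (the prompt wording and model name are my assumptions): prefill the assistant turn with `[` and set `]` as a stop sequence, so the only thing the model can produce is the inside of a JSON array.

```python
import json


def build_rerank_request(query: str, formatted_docs: str) -> dict:
    """Kwargs for client.messages.create(**...). Prefilling the assistant
    turn with '[' and stopping on ']' forces a bare JSON array of IDs."""
    prompt = (
        f"Here are documents related to the user's question:\n{formatted_docs}\n\n"
        f"Question: {query}\n"
        "Return the three most relevant document IDs in order of "
        "decreasing relevance, as a JSON array of integers."
    )
    return {
        "model": "claude-3-5-haiku-latest",  # assumption: any fast model works here
        "max_tokens": 100,
        "stop_sequences": ["]"],
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": "["},  # the prefill
        ],
    }


def parse_rerank_response(text: str) -> list[int]:
    """Re-attach the prefilled '[' and the ']' consumed by the stop sequence."""
    return json.loads("[" + text + "]")
```

The response text comes back as something like `1234, 5678, 9012`, and the parser restores the brackets before `json.loads`.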

What I Was Missing

I'd been doing: embedding search → pass all results to Claude. No re-ranking step.

This works for simple queries where the top embedding matches are clearly relevant. It falls apart when:

  • The query uses abbreviations the embeddings don't match well
  • Multiple chunks are somewhat relevant but in different ways
  • The embedding model ranks a superficially similar but off-topic chunk higher than a truly relevant one

The course showed a specific example: searching for "what did the ENG team do with incident 2023?" Without re-ranking, a cybersecurity section ranked first (because "incident" matched strongly). With re-ranking, Claude recognized that "ENG" meant "engineering" and promoted the software engineering section.

The Trade-off

Re-ranking adds an API call to your retrieval pipeline. More latency. More cost. But for any non-trivial RAG application, the accuracy improvement is worth it.

The recommended pipeline: vector search + BM25 → merge results → re-rank with Claude → pass top N to final prompt.
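For the merge step, one common choice is reciprocal rank fusion — a sketch under that assumption (the course may use a different merging scheme):

```python
def merge_results_rrf(
    vector_ids: list[str], bm25_ids: list[str], k: int = 60
) -> list[str]:
    """Merge two ranked ID lists with reciprocal rank fusion:
    each document scores 1/(k + rank) per list it appears in,
    so items ranked well by both retrievers rise to the top."""
    scores: dict[str, float] = {}
    for ranking in (vector_ids, bm25_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The merged list then goes to the re-ranker, which does the final precision-focused cut down to the top N.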

Embeddings: One More Thing

While we're on RAG — the quiz also caught me on embeddings. I knew what they were. I didn't know a practical detail: Anthropic doesn't provide an embedding model. The recommended provider is Voyage AI, which requires a separate account and API key.

Each number in an embedding vector encodes some learned feature of the input text. Nobody can say what an individual dimension actually represents — but similar texts produce similar vectors, and that's what makes semantic search work.
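"Similar vectors" is usually measured with cosine similarity. A toy sketch — real embeddings from a provider like Voyage AI have hundreds or thousands of dimensions, not two:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same
    direction (very similar texts), near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```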


Next in the series: MCP Inspector — the debugging tool I didn't know existed.

Anthropic Academy is free: anthropic.skilljar.com

Related:
🛠 Free: claude-code-hooks — 16 production hooks, open source.

Running Claude Code autonomously? Claude Code Ops Kit ($19) — 16 hooks + 5 templates + 3 tools. Production-ready in 15 minutes.

What's your RAG setup? Are you using re-ranking, or going straight from retrieval to generation?
