Santanu Mohanta

My RAG pipeline couldn't find the CEO — here's how I fixed it with hybrid retrieval

In my last post, I built a RAG pipeline from scratch — no LangChain, just FastAPI + FAISS. It scored 17/19 on my test set. But two questions failed:

  • "Who is the CEO?" — couldn't find it
  • "How many employees does Zentara have?" — couldn't find it

Both answers were right there on page 1. So what went wrong, and how did I fix it?

Why pure vector search failed

The problem was a dense "Company snapshot" table on page 1: CEO, CTO, HQ, employee count, and revenue, all packed into one chunk. The embedding for that chunk became a muddy average of 8+ topics, so it never scored high enough against a narrow query like "Who is the CEO?" to make the top results.

This is the classic weakness of pure semantic search. The word "CEO" appears exactly once in the document. A keyword search would find it instantly. But vector search relies on semantic similarity, and a short query doesn't produce a strong enough match against a chunk that's mostly about other things.
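
A toy model makes the dilution concrete. The sketch below assumes, purely for illustration, that each topic in the table has its own orthogonal direction in embedding space and that the chunk embedding is a plain average of them. Real embedding models are not this tidy, and every name here (cosine, one_hot, N_TOPICS) is hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def one_hot(index: int, dim: int) -> list[float]:
    """Toy 'topic' vector: 1.0 in one dimension, 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(dim)]


N_TOPICS = 8  # assumption: the snapshot table covers 8 unrelated topics

topics = [one_hot(i, N_TOPICS) for i in range(N_TOPICS)]

# A chunk about a single topic vs. a chunk that averages all eight.
focused_chunk = topics[0]
dense_chunk = [sum(col) / N_TOPICS for col in zip(*topics)]

query = topics[0]  # toy stand-in for a query about one topic, e.g. the CEO

focused_score = cosine(query, focused_chunk)
dense_score = cosine(query, dense_chunk)
print(f"focused chunk: {focused_score:.3f}")
print(f"dense chunk:   {dense_score:.3f}")
```

In this toy setup the similarity drops from 1.0 for the focused chunk to 1/√8 (about 0.354) for the averaged one. That is the kind of gap that lets other, more focused chunks outrank the one holding the answer.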

The fix: hybrid retrieval

The solution is to run two independent searches over the same chunks and combine the results:

  1. FAISS (dense) — semantic similarity, good at "What's the charging time?" style questions
  2. BM25 (sparse) — keyword matching, good at "Who is the CEO?" style questions

Then merge them using Reciprocal Rank Fusion (RRF) — a standard algorithm that combines ranked lists from different sources.

question ─► embed ─► FAISS search ──┐
                                    ├─► RRF fusion ─► top-k chunks ─► LLM ─► answer
question ─► tokenize ─► BM25 search ┘
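
To see why the sparse branch finds the CEO chunk, here is a minimal pure-Python BM25 sketch. It is not the rank_bm25 implementation used later in this post: the function names, the sample chunks, and the smoothed IDF variant are my own assumptions for illustration, and k1 = 1.5, b = 0.75 are just commonly used defaults.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # query term absent from this document
            # Smoothed IDF that never goes negative (the "+ 1" variant).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical chunks; the values in angle brackets are placeholders.
chunks = [
    "Company snapshot: CEO <name>, CTO <name>, HQ <city>, employees <count>, revenue <amount>",
    "Charging takes two hours with a fast charger",
    "Warranty covers parts and labor for one year",
]

scores = bm25_scores("Who is the CEO?", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Only the snapshot chunk contains the token "ceo", so it is the only one with a non-zero score, no matter how many other facts share the chunk. That exact-match behavior is what the dense branch lacks.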

How RRF works

RRF is simple. For each chunk that appears in either ranked list, compute:

rrf_score = 1/(k + rank_in_faiss) + 1/(k + rank_in_bm25)

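Here k = 60 (the commonly used constant) and ranks start at 1. If a chunk is missing from one list, that list simply contributes nothing to its score. A chunk that ranks well in both searches scores higher than one that ranks #1 in only one.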

Example: chunk 5 is ranked #1 by BM25, #4 by FAISS:

From FAISS:  1/(60 + 4) = 0.0156
From BM25:   1/(60 + 1) = 0.0164
RRF score:                0.0320  ← beats a FAISS-only #1 (0.0164)
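
The arithmetic above can be checked with a few lines of standalone Python. This is a sketch separate from the store.py code below: rrf_fuse is a hypothetical helper, and the chunk ids in the two lists are made up so that chunk 5 lands at #4 in the dense list and #1 in the sparse list.

```python
def rrf_fuse(rankings: list[list[int]], k: int = 60) -> dict[int, float]:
    """Sum 1 / (k + rank) over every ranked list an item appears in (rank is 1-based)."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return scores


# Hypothetical chunk ids: chunk 5 is #4 in the dense list and #1 in the sparse list.
faiss_ranking = [1, 2, 3, 5]
bm25_ranking = [5, 7, 8]

fused = rrf_fuse([faiss_ranking, bm25_ranking])
ordered = sorted(fused, key=fused.get, reverse=True)

print(ordered[0], round(fused[5], 4))  # chunk 5 comes first
print(round(fused[1], 4))              # score of the FAISS-only #1
```

Because only ranks enter the formula, the raw FAISS similarities and BM25 scores never need to be put on a common scale.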

The implementation

Only 3 files changed. Here's the core — the updated store.py:

import re

import faiss
import numpy as np
from rank_bm25 import BM25Okapi

# EMBED_DIM and Retrieval are defined elsewhere in the project (not shown here).

RRF_K = 60  # RRF constant

def _tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())

class VectorStore:
    def __init__(self):
        self.index = faiss.IndexFlatIP(EMBED_DIM)  # exact inner-product search
        self.chunks = []
        self.bm25 = None

    def add(self, vectors, chunks):
        self.index.add(vectors)
        self.chunks.extend(chunks)
        # Rebuild BM25 over all chunks so its positions match the FAISS ids
        tokenized = [_tokenize(c.text) for c in self.chunks]
        self.bm25 = BM25Okapi(tokenized)

    def search(self, query_vector, top_k=3, query_text=""):
        if self.index.ntotal == 0:
            return []  # nothing indexed yet

        # Over-fetch from each retriever so fusion has candidates to promote
        top_k_fetch = min(top_k * 3, self.index.ntotal)

        # Dense search
        _, faiss_indices = self.index.search(query_vector.reshape(1, -1), top_k_fetch)
        faiss_ranking = [int(i) for i in faiss_indices[0] if i != -1]

        # Sparse search (skipped when there is no query text, so an empty
        # query cannot hand out arbitrary BM25 ranks)
        bm25_ranking = []
        query_tokens = _tokenize(query_text)
        if self.bm25 is not None and query_tokens:
            bm25_scores = self.bm25.get_scores(query_tokens)
            bm25_ranking = np.argsort(bm25_scores)[::-1][:top_k_fetch].tolist()

        # Reciprocal Rank Fusion (enumerate is 0-based, so rank + 1 is the 1-based rank)
        rrf_scores = {}
        for rank, idx in enumerate(faiss_ranking):
            rrf_scores[idx] = rrf_scores.get(idx, 0) + 1 / (RRF_K + rank + 1)
        for rank, idx in enumerate(bm25_ranking):
            rrf_scores[idx] = rrf_scores.get(idx, 0) + 1 / (RRF_K + rank + 1)

        sorted_indices = sorted(rrf_scores, key=rrf_scores.get, reverse=True)[:top_k]
        return [Retrieval(chunk=self.chunks[i], score=rrf_scores[i]) for i in sorted_indices]

The only change in main.py — one extra parameter:

# Before (v1)
retrieved = store.search(query_vec, top_k=req.top_k)

# After (v2)
retrieved = store.search(query_vec, top_k=req.top_k, query_text=req.question)

That's it. No changes to chunking, embedding, PDF extraction, or LLM logic.

Results: before and after

| Question | v1 (FAISS only) | v2 (hybrid) |
| --- | --- | --- |
| Who is the CEO of Zentara Robotics? | Failed | Correct |
| How many employees does Zentara have? | Failed | Correct (top_k=5) |
| All other 17 questions | Correct | Correct |

The CEO question now works at default top_k=3 — BM25 matches "CEO" directly and RRF promotes it.

The employee count question only works once top_k is raised to 5. The chunk still ranks lower because it's packed with many facts, but hybrid retrieval brings it within reach. A reranker (cross-encoder) would likely fix this at top_k=3, and that's next on the list.

What I learned

  1. Pure vector search has a keyword blind spot. If a term appears once in a dense chunk, semantic similarity alone won't reliably surface it. BM25 catches these instantly.

  2. RRF is elegant. No score normalization needed, no tuning of weights between the two retrievers. Just ranks and a constant. It works out of the box.

  3. The retriever mattered more than the LLM here. Both failures in v1 were retrieval failures, not LLM failures. The LLM never even saw the right chunk. In this project, improving retrieval quality is what fixed the answers, not switching to a fancier model.

  4. Hybrid didn't fully solve dense chunks. The employee count still needs top_k=5. The real fix is either better chunking (split dense tables into smaller pieces) or a reranker that can re-score candidates more precisely.
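
For the chunking option in point 4, one possible approach is to emit one small chunk per table row, with the table title prepended so each piece still carries context. This is only a sketch of the idea, not something the current pipeline does: split_table_chunk is a hypothetical helper, and the table text uses placeholder values.

```python
def split_table_chunk(text: str, title: str) -> list[str]:
    """Turn each non-empty row of a table-like chunk into its own small chunk.

    The title is prepended so every piece keeps some context on its own.
    """
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    return [f"{title}: {row}" for row in rows]


# Hypothetical table text; angle-bracket values are placeholders.
snapshot = """
CEO | <name>
CTO | <name>
Employees | <count>
"""

for piece in split_table_chunk(snapshot, title="Company snapshot"):
    print(piece)
```

Each resulting chunk covers a single fact, so its embedding is no longer an average over the whole table. The trade-off is more chunks to embed and index.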

What's next

  1. Reranker (cross-encoder) — re-score the top-k for better precision
  2. Evaluation harness — automate the 19-question test set instead of testing manually
  3. Streaming — better UX for longer answers

Try it yourself

uv sync
cp .env.example .env   # set your API key
uv run uvicorn app.main:app --reload

Open http://localhost:8000/docs, upload the included sample PDF (data/sample_test_file.pdf), and try "Who is the CEO?" — it works now.


If you've implemented hybrid retrieval or have experience with rerankers, I'd love to hear what worked for you.

I'm Santanu Mohanta — connect with me on LinkedIn or check out my projects on GitHub.
