TF-IDF + LLM Reranking: How I Improved Vector Search Accuracy from 60% to 86%
Vector search is powerful — but it’s not perfect. When I was building a database discovery pipeline at work, our initial semantic search was only matching the right schemas about 60% of the time. That wasn’t good enough for production. Here’s exactly how I fixed it using a hybrid TF-IDF and LLM reranking approach.
The Problem
Our pipeline needed to match user queries to the correct database schemas from a large pool of candidates. Pure vector search (embeddings + cosine similarity) was fast but kept returning semantically similar but contextually wrong results.
For example, searching for “customer account balance” would return results about “user wallet transactions” — close, but not what we needed in a strict banking compliance context.
The Solution: Hybrid Retrieval + LLM Reranking
Instead of relying on one method, I combined three layers:
1. TF-IDF for keyword precision
2. Vector embeddings for semantic similarity
3. LLM reranking for contextual judgment
Step 1 — TF-IDF First Pass
TF-IDF is great at catching exact keyword matches that embeddings sometimes miss:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def tfidf_retrieve(query: str, corpus: list, top_k: int = 20) -> list:
    # Fit TF-IDF on the schema corpus and project the query into the same space
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(corpus)
    query_vec = vectorizer.transform([query])
    # Rank documents by cosine similarity to the query and keep the top_k
    scores = cosine_similarity(query_vec, tfidf_matrix).flatten()
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [(corpus[i], scores[i]) for i in top_indices]
This gives us a broad candidate set: the top 20 keyword matches.
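For example, with a toy schema corpus (the table descriptions below are made up purely for illustration; a real corpus would come from your database catalog), the first pass looks like this:

# Hypothetical schema descriptions, for illustration only
schema_corpus = [
    "accounts: customer account balance, currency, status",
    "wallet_transactions: user wallet transfers and top-ups",
    "compliance_reports: regulatory filings and audit trails",
]

candidates = tfidf_retrieve("customer account balance", schema_corpus, top_k=2)
for text, score in candidates:
    print(f"{score:.3f}  {text}")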
Step 2 — Vector Embedding Re-Filter
Next we re-score those 20 candidates using semantic embeddings:
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def embedding_rerank(query: str, candidates: list, top_k: int = 5) -> list:
    # Normalized embeddings make the dot product equivalent to cosine similarity
    query_embedding = model.encode(query, normalize_embeddings=True)
    texts = [text for text, _ in candidates]
    # Encode all candidates in one batch instead of one call per candidate
    candidate_embeddings = model.encode(texts, normalize_embeddings=True)
    scores = candidate_embeddings @ query_embedding
    scored = sorted(zip(texts, scores), key=lambda x: x[1], reverse=True)
    return scored[:top_k]
Now we’re down to the top 5 highly relevant candidates.
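Continuing the toy example from Step 1, the (text, score) tuples from the TF-IDF pass plug straight into the embedding re-ranker:

query = "customer account balance"
keyword_candidates = tfidf_retrieve(query, schema_corpus, top_k=20)   # broad keyword pass
top_candidates = embedding_rerank(query, keyword_candidates, top_k=5)  # semantic re-score
print(top_candidates[0])  # (best schema description so far, similarity score)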
Step 3 — LLM Reranking
This is where the magic happens. We ask Gemini to pick the best match:
import google.generativeai as genai

# Assumes the API key is already set, e.g. genai.configure(api_key=...)

def llm_rerank(query: str, candidates: list) -> str:
    # Number each candidate so the model can answer with a single index
    candidate_text = "\n".join(
        f"{i + 1}. {text}" for i, (text, _) in enumerate(candidates)
    )
    prompt = f"""
Query: {query}
Candidates:
{candidate_text}
Which candidate best matches the query in a banking compliance context?
Return only the number of the best match.
"""
    llm = genai.GenerativeModel("gemini-2.0-flash")
    response = llm.generate_content(prompt)
    # Parse the returned index back into the candidate list
    best_index = int(response.text.strip()) - 1
    return candidates[best_index][0]
The LLM understands context, domain specifics, and nuance that pure math simply can’t capture.
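Here is roughly how the three layers fit together in a single call. find_schema is just an illustrative wrapper name, and error handling, caching, and retries are left out for brevity:

def find_schema(query: str, corpus: list) -> str:
    # Layer 1: broad keyword pass over the full corpus
    keyword_candidates = tfidf_retrieve(query, corpus, top_k=20)
    # Layer 2: semantic re-scoring down to a short list
    semantic_candidates = embedding_rerank(query, keyword_candidates, top_k=5)
    # Layer 3: let the LLM make the final, context-aware pick
    return llm_rerank(query, semantic_candidates)

# Using the toy corpus from earlier
best_match = find_schema("customer account balance", schema_corpus)
print(best_match)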
The Results
| Method | Accuracy |
|---|---|
| Vector search only | ~60% |
| TF-IDF only | ~65% |
| TF-IDF + Embeddings | ~75% |
| Full hybrid + LLM rerank | 86% |
Each layer added a meaningful improvement; the LLM reranking step alone accounted for the final 11-point jump.
Why This Works
• TF-IDF catches exact terminology matches
• Embeddings capture semantic meaning
• LLM applies domain reasoning and context
No single method is perfect. Combined, they cover each other’s weaknesses.
When Should You Use This?
Use this approach when:
• Your search corpus is domain-specific (legal, medical, banking)
• Exact keyword matches matter alongside semantic meaning
• You can afford a small LLM call per query
• Accuracy matters more than raw speed
Key Takeaway
Don’t default to pure vector search just because it’s trendy. A hybrid approach with LLM reranking is more accurate for specialized domains — and the implementation is simpler than you’d think.
Follow me for more practical AI engineering content. 🚀