## The Problem Nobody Talks About
Ask your RAG system: "What award did the director of Inception win?"
This requires two hops:
- Inception → Christopher Nolan
- Christopher Nolan → Academy Award
Your retrieval engine does hop 1 fine. But hop 2? The embedding of the original query is nowhere near "Academy Award" in vector space. The answer sits at rank 665. Your top-20 retrieval window never sees it.
We tested this systematically on HotpotQA fullwiki — 5.2M Wikipedia articles, 500 multi-hop questions.
Every traditional method scored 0% Hit@20. BM25. Dense retrieval. Rerankers. All of them.
## What If the Query Could Change Shape?
In 1958, Daniel Koshland proposed the induced-fit model of enzyme binding. Unlike the rigid "lock and key" model, enzymes change their shape to fit the substrate.
We applied the same principle to retrieval.
At each hop, IFR mutates the query embedding based on what it just found. The query literally reshapes itself to reach the next piece of evidence.
```
Query → [hop 1: find Film X] → mutate → [hop 2: find director] → mutate → [hop 3: find award] → found
```
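The hop loop above can be sketched as follows. This is a minimal illustration, not IFR's actual implementation: `embed` and `search` are assumed interfaces (an encoder and a vector index), and the centroid-based mutation is one plausible way to "reshape" the query.

```python
import numpy as np

def induced_fit_retrieve(embed, search, query_text, hops=3, k=20):
    """Sketch of induced-fit retrieval. `embed` and `search` are assumed
    interfaces (encoder and vector index), not the post's actual code."""
    original = embed(query_text)        # fixed embedding of the user's question
    query_vec = original.copy()
    evidence = []
    for _ in range(hops):
        hits = search(query_vec, k)     # hop: retrieve with the current query shape
        evidence.extend(hits)
        # Mutate: pull the query toward the centroid of what was just found,
        # so the next hop can reach evidence the original embedding cannot.
        centroid = np.mean([h["vector"] for h in hits], axis=0)
        query_vec = 0.5 * query_vec + 0.5 * centroid
    return evidence
```

Without any correction, this mutation step is exactly what causes the drift problem described next.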
## The Drift Problem
This sounds elegant on paper. In practice, v1 was a disaster.
67% of failures came from catastrophic drift — the query mutated so aggressively that by hop 3, it had lost >80% of its original meaning. It was finding documents, but completely wrong ones.
We tested 8 drift correction approaches:
- PID controllers
- Sentinel beams
- Moving anchors
- Drifting anchors
- Threshold tuning
- Hierarchical traversal
- Attention-based edge weighting
- Swarm coordination (Boids)
Most made things worse. The winner was embarrassingly simple:
```python
# Blend 50% of the original query into every hop
query_vector = 0.5 * mutated + 0.5 * original

# Hard reset if drift exceeds the threshold
if cosine_sim(query_vector, original) < 0.5:
    query_vector = original
```
Two lines of code. nDCG went from 0.197 to 0.317 (+61%).
## Benchmark Results
Tested on HotpotQA fullwiki: 5.2M Wikipedia articles, 500 questions, 3 random seeds, single RTX 3060.
| Method | R@5 | R@10 | MRR |
|---|---|---|---|
| RAG-rerank baseline | 0.337 | 0.337 | 0.548 |
| IFR-hybrid+CE | 0.366 | 0.366 | 0.554 |
| Delta | +2.9 pts (p=0.0002) | +2.9 pts | +0.6 pts |
R@5 = R@10 because IFR surfaces all retrievable targets within the top 5 — ranks 6–10 add no new hits at this difficulty level.
Scaling: near-constant latency. A 100× increase in corpus size produced only a 1.1× increase in latency; beam traversal takes ~10ms on the full 5.2M-article corpus.
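The near-constant latency follows from how beam traversal works: per-hop cost depends on the beam width and graph degree, not the corpus size. A minimal sketch (`neighbors` and `score` are assumed interfaces; this is an illustration of the idea, not IFR's traversal code):

```python
import heapq

def beam_traverse(start_ids, neighbors, score, width=8, hops=3):
    """Minimal beam-search sketch over a document graph. `neighbors(doc)`
    and `score(doc)` are assumed interfaces. Per-hop cost is roughly
    O(width * avg_degree): it depends on the beam, not the corpus size,
    which is where the near-constant latency comes from."""
    beam = list(start_ids)
    visited = set(beam)
    for _ in range(hops):
        candidates = []
        for doc in beam:
            for nxt in neighbors(doc):
                if nxt not in visited:
                    visited.add(nxt)
                    candidates.append(nxt)
        if not candidates:
            break
        # Keep only the `width` best candidates for the next hop.
        beam = heapq.nlargest(width, candidates, key=score)
    return visited
```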
## Why Three Layers Beat Perfect Traversal
Raw beam search R@5 = 0.309. With cross-encoder reranking: 0.366 (+5.7 points).
The insight: drift noise scores high against the mutated query but low against the original. So the cross-encoder naturally filters it. Trying to eliminate drift at the beam level gives diminishing returns. The multi-layer pipeline is the actual solution.
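The filtering idea reduces to one decision: score candidates against the original query, not the mutated one. A minimal sketch, where `score` stands in for any cross-encoder-style scorer (the exact model and pipeline are not shown in this post):

```python
def filter_drift(candidates, original_query, score, top_k=10):
    """Rank beam candidates by a cross-encoder-style score against the
    ORIGINAL query text. Drifted hits score well against the mutated
    query but poorly here, so they fall out of the top ranks.
    `score(query, doc)` is an assumed interface."""
    ranked = sorted(candidates,
                    key=lambda doc: score(original_query, doc),
                    reverse=True)
    return ranked[:top_k]
```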
## Limitations (We're Honest)
- The 50% blend ratio is empirical. We don't have a principled method for setting it.
- Tested only on HotpotQA fullwiki. Other multi-hop benchmarks needed.
- Single GPU (RTX 3060). Not benchmarked at enterprise scale.

---
**Question for the community:**
We fixed drift with a static 50% anchor blend — but this feels like a brute-force solution. Has anyone worked on adaptive blending that adjusts the anchor weight based on query complexity or hop distance? Curious what approaches you've tried.
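For concreteness, one untested shape such a scheme could take: decay the anchor weight with hop distance, so early hops stay close to the original query and later hops are allowed to reach further. Purely illustrative; the `base` and `decay` values are assumptions, not something we evaluated.

```python
def anchor_weight(hop, base=0.5, decay=0.9):
    # Illustrative only: relax the anchor as hop distance grows,
    # letting later hops drift further from the original query.
    return base * (decay ** hop)

def blend(mutated, original, hop):
    # Same blend as the static version, but with a hop-dependent weight.
    w = anchor_weight(hop)
    return [(1 - w) * m + w * o for m, o in zip(mutated, original)]
```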
