DEV Community

Emil


Beyond Static RAG: Using 1958 Biochemistry to Beat Multi-Hop Retrieval by 14%

Standard Retrieval-Augmented Generation (RAG) often falls short on complex, multi-hop questions because it relies on static "lock and key" query matching: one fixed query embedding is compared against the whole index. If the information needed to answer a query is semantically distant from the query itself, standard vector search will usually fail to retrieve it.
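To make the "lock and key" limitation concrete, here is a minimal sketch of static retrieval in plain Python. One fixed query vector is scored against every document by cosine similarity, and the top k are kept. The toy 2-D vectors and the names `cosine` and `static_top_k` are hypothetical and chosen only for illustration. Real systems use high-dimensional embeddings and approximate nearest-neighbor indexes.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def static_top_k(query: list[float], corpus: list[list[float]], k: int) -> list[int]:
    """Rank every corpus vector against one fixed query and keep the k best indices."""
    scores = [cosine(query, doc) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]


# Toy 2-D "embeddings" (hypothetical): doc 2 holds the answer but points away from the query.
query = [1.0, 0.0]
corpus = [
    [0.9, 0.1],  # doc 0: near-duplicate of the query
    [0.6, 0.8],  # doc 1: bridging fact
    [0.0, 1.0],  # doc 2: the answer, semantically distant from the query
]
print(static_top_k(query, corpus, k=2))  # [0, 1]
```

Because the query vector never changes, doc 2 stays at the bottom of the ranking however relevant its content is. Only a process that moves the query toward it can surface it.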

We've developed Induced-Fit Retrieval (IFR), a dynamic graph traversal approach that mutates the query vector at every step to discover semantically distant but logically connected information.

The Core Results
We evaluated our prototype on a test suite of 30 queries across multiple graph sizes, up to 5.2 million atoms.

14.3% higher nDCG@10 than a competitive RAG-rerank baseline.

15% Multi-hop Hit@20 in scenarios where traditional RAG methods scored 0%.

O(1) Latency Scaling: In our measurements, latency stayed near 10 ms whether searching 100 atoms or 5.2 million.
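For readers unfamiliar with the metrics above, here is a minimal sketch of their standard textbook definitions, using linear gain and a log2 rank discount. The post does not say which nDCG variant or relevance scale the evaluation used. Treat this as a reference for the formulas, not as the project's own scoring code. The function names are hypothetical.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain: rank r (0-based) is discounted by log2(r + 2)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: list[str], judgments: dict[str, float], k: int) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ranking of all judged items."""
    gains = [judgments.get(doc, 0.0) for doc in ranked_ids]
    ideal = dcg_at_k(sorted(judgments.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def hit_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """1.0 if any relevant item appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


ranked = ["a", "b", "c"]   # system output, best first
judgments = {"c": 1.0}     # only "c" is relevant
print(ndcg_at_k(ranked, judgments, k=10))  # 0.5
print(hit_at_k(ranked, {"c"}, k=2))        # 0.0
```

A Hit@k score averaged over a query set is the fraction of queries with at least one relevant result in the top k.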

Why Biochemistry?
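The system is inspired by Daniel Koshland’s 1958 "induced fit" model, which refined Emil Fischer’s earlier "lock and key" picture of rigid, complementary shapes. In biology, an enzyme changes shape upon encountering its substrate, which improves binding.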

IFR applies this idea to information retrieval: instead of keeping the query vector static, it mutates the vector at each hop based on the embedding of the node just visited. This lets the query follow the "curved manifolds" of high-dimensional embedding space into regions that a fixed vector cannot reach.
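Here is a minimal sketch of that idea in plain Python. It assumes the simplest possible mutation rule: after visiting a node, the query is linearly interpolated toward the node's unit-normalized embedding with a fixed `rate` and then renormalized. The updated query picks the most similar unvisited neighbor as the next hop. The post does not specify IFR's actual update rule, seeding, or stopping criteria. The names `mutate`, `induced_fit_traverse`, and `rate` are hypothetical and used for illustration only.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def mutate(query: Vector, node_emb: Vector, rate: float) -> Vector:
    """Hypothetical induced-fit step: pull the query toward the visited node's embedding."""
    return normalize([(1 - rate) * q + rate * e for q, e in zip(query, node_emb)])


def induced_fit_traverse(query: Vector, embeddings: dict[int, Vector],
                         graph: dict[int, list[int]], max_hops: int,
                         rate: float = 0.5) -> list[int]:
    """Greedy walk whose query vector changes after every visited node."""
    emb = {n: normalize(v) for n, v in embeddings.items()}
    q = normalize(query)
    current = max(emb, key=lambda n: dot(q, emb[n]))  # seed: static nearest neighbor
    path = [current]
    for _ in range(max_hops):
        q = mutate(q, emb[current], rate)  # the query "changes shape" after each visit
        candidates = [n for n in graph[current] if n not in path]
        if not candidates:
            break
        current = max(candidates, key=lambda n: dot(q, emb[n]))
        path.append(current)
    return path


# Toy graph (hypothetical): 0 -> 1 -> 2 is the reasoning chain; node 3 is a distractor.
embeddings = {0: [0.9, 0.1], 1: [0.6, 0.8], 2: [0.0, 1.0], 3: [0.5, -0.866]}
graph = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0]}
print(induced_fit_traverse([1.0, 0.0], embeddings, graph, max_hops=2))  # [0, 1, 2]
```

After seeding, each hop in this sketch scores only the neighbors of the current node. Its per-query work therefore grows with hop count and node degree, not total graph size. The seed step here is a brute-force scan; a real system would presumably use a nearest-neighbor index instead. The post does not say whether IFR's flat latency comes from the same property.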

Lessons from the Data
Transparency is key to research, so we are also sharing our failures:

Catastrophic Drift: 67% of our failures occurred because the query mutated too aggressively, losing its original intent.

The Planned Fix: v2 will implement an "Alpha Floor" that preserves at least 50% of the original query signal at every hop.
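A short calculation shows why drift compounds and how a floor bounds it. Assume, purely for illustration, that each hop replaces the query with the convex mix `(1 - rate) * q_t + rate * e_t` of the current query and the visited node's embedding. The original query's share of the mix then shrinks geometrically, to `(1 - rate)^t` after `t` hops. One hypothetical way to implement an alpha floor is to re-anchor every update to the original query `q_0` with weight `alpha_floor`. Because the remaining term is non-negative, the original share can then never fall below `alpha_floor`. The post does not describe how v2 will implement the floor, so this anchoring rule and the function name `original_share` are assumptions.

```python
def original_share(hops: int, rate: float, alpha_floor: float = 0.0) -> list[float]:
    """Share of the original query q_0 in the convex mixture after each hop.

    Assumed (hypothetical) update:
        q_{t+1} = alpha_floor * q_0 + (1 - alpha_floor) * ((1 - rate) * q_t + rate * e_t)
    With alpha_floor = 0 this reduces to unanchored mutation.
    """
    share = 1.0
    history = [share]
    for _ in range(hops):
        share = alpha_floor + (1 - alpha_floor) * (1 - rate) * share
        history.append(share)
    return history


print(original_share(3, rate=0.5))                   # [1.0, 0.5, 0.25, 0.125]
print(original_share(3, rate=0.5, alpha_floor=0.5))  # [1.0, 0.75, 0.6875, 0.671875]
```

Without the floor, the original intent halves at every hop in this example. With a floor of 0.5, the share settles near 2/3 (the fixed point `alpha_floor / (1 - (1 - alpha_floor) * (1 - rate))`) instead of decaying toward zero.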

We have open-sourced the prototype, our 18 raw JSON result logs, ablation studies, and full technical reports.

Check out the repo on GitHub:
https://github.com/emil-celestix/celestix-ifr
