pueding

Posted on May 30 • Originally published at learnaivisually.com

OmniRetrieval: Source-Native Query Dispatch

#ai #rag #agents #llm

What: The OmniRetrieval paper introduces source-native query dispatch: a router sends a natural-language query to whichever knowledge source fits — text, tables, or graphs — and runs each source's own query engine instead of embedding everything into one vector store.

Why: Many vector-first RAG stacks flatten every source into a single embed-then-ANN pipeline, which throws away the structure that makes tables and graphs useful. Keeping each source native lets JOINs and graph edges survive retrieval, so the generator sees answers a flat similarity search can't assemble.

vs prior: The previous default is a unified vector index — chunk, embed, and nearest-neighbour everything together. Its failure mode is structural collapse: a table's column relationships and a graph's edges are averaged into a single vector and can no longer be queried as structure.

Think of it as

A reference desk that routes each question to the right specialist clerk.

                THE QUERY
                    │
             router (clerk)
                    │
        ┌───────────┴───────────┐
        │                       │
 ┌──────▼────────┐     ┌────────▼──────┐
 │    FLATTEN    │     │   DISPATCH    │
 │  one vector   │     │  text, SQL,   │
 │   index bin   │     │  graph query  │
 └──────┬────────┘     └────────┬──────┘
        │                       │
  shred docs into        ask each clerk
  one big bin            in its own tongue
        │                       │
        ▼                       ▼
 ✗ JOINs & edges        ✓ JOINs compose,
   averaged away          edges traverse

query = a question handed to the reference desk
router = the clerk who decides which specialist to ask
source-native query = asking each specialist in their own language — ledgers, archives, family trees
unified vector index = shredding every document into one bin of identical index cards
preserved structure = the ledger keeps its columns and the family tree keeps its lines

Quick glossary

RAG — Retrieval-Augmented Generation — fetch relevant context from a knowledge store, then condition the model on it.

Vector index — The default RAG store: every chunk is embedded into a fixed-dimension vector, and retrieval returns the top-k nearest by cosine or dot-product similarity. Structure inside a chunk is not preserved; only the chunk's position in embedding space survives.
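As a rough illustration of what "top-k nearest by cosine similarity" means, here is a minimal pure-Python sketch. The chunk ids and three-dimensional vectors are made up for the example (real embeddings have hundreds of dimensions), and a production index would use an ANN structure rather than a full sort.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k):
    """Return the ids of the k chunks most similar to the query.

    chunks is a list of (chunk_id, vector) pairs. This is an exact,
    brute-force scan; an ANN index approximates the same ranking.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Hypothetical toy vectors, one per chunk.
chunks = [
    ("contract_a", [1.0, 0.0, 0.0]),
    ("ledger_row_17", [0.0, 1.0, 0.0]),
    ("intro_note", [0.7, 0.7, 0.0]),
]

nearest = top_k([1.0, 0.1, 0.0], chunks, k=2)
print(nearest)
```

Note that the only thing the lookup can return is a ranked list of chunk ids. Nothing in the result says which row belongs to which supplier, or who is connected to whom.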

ANN — Approximate Nearest Neighbour — the index family (HNSW, IVF, as implemented in libraries such as FAISS) that makes vector search fast by trading exactness for speed.

Source-native query — Running a source in its own engine: full-text search over passages, a SQL-style query (with JOINs) over tables, a traversal over a graph — rather than one similarity lookup over a shared embedding space.

Heterogeneous-source retrieval — Retrieval across sources of different kinds — unstructured text, relational tables, and graphs — kept in their native form instead of homogenised into one representation.

Knowledge base (KB) — A single corpus the system retrieves from. OmniRetrieval reports evaluating across 309 distinct KBs spanning 13 datasets.

The news. On May 29, 2026, the OmniRetrieval paper (arXiv:2605.29250) proposed a retrieval framework that accepts a natural-language query and routes it to the appropriate knowledge source — unstructured text, relational tables, or graphs — dispatching each source's native query to its own execution engine rather than flattening everything into a single embedding index. The authors report evaluating across 13 datasets and 309 distinct knowledge bases, and exceeding single-source retrieval baselines.

Picture the reference desk again. A question comes in — "which suppliers shipped to the Berlin warehouse in Q3, and who introduced them?" The clerk doesn't translate the question into one bland house dialect and shout it at the whole building. They split it: the archivist searches the prose contracts, the accountant runs the numbers in their ledgers, and the genealogist walks the introductions graph. Each specialist answers in their own language, keeping the structure that makes their corner useful — the ledger's columns, the family tree's lines.

The animation above is that desk. In the first beat the query flows query → router → one flat vector index: every source is shredded into the same uniform grid of vectors, and the table's JOIN arc and the graph's edges go dashed and grey — flattened into embedding space where structure can't be queried. Then the router flips from flatten to dispatch: the same three sources light up in their native forms, and three typed queries fan out — text search, table query · JOIN, graph traversal — with the JOIN arc and graph edges restored in green.

The mechanism is a router plus per-source engines. Instead of an embed-then-ANN lookup over one homogenised store, OmniRetrieval generates a query in each source's own language and runs it on that source's engine, then unifies the heterogeneous results for the generator. Because the table never left its relational form, a JOIN still composes rows by key; because the graph never left its node-edge form, a traversal still follows edges. Those are exactly the structural affordances a single similarity vector erases.
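The router-plus-engines shape can be sketched in a few lines. This is a toy under stated assumptions, not the paper's method: the keyword-based `route` function, the `Hit` type, and the stub engines are all hypothetical, and the step where the system generates a source-native query (SQL, a graph query) is replaced by passing the question straight to each stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Hit:
    source: str   # "text", "table", or "graph"
    content: str  # a result already rendered as a string

# An engine takes the question and returns rendered results.
Engine = Callable[[str], List[str]]

def route(question: str) -> List[str]:
    """Toy keyword router (hypothetical stand-in for a learned router)."""
    q = question.lower()
    chosen = ["text"]  # always consult the prose sources
    if any(cue in q for cue in ("which", "how many", "shipped")):
        chosen.append("table")
    if any(cue in q for cue in ("who introduced", "connected to")):
        chosen.append("graph")
    return chosen

def dispatch(question: str, engines: Dict[str, Engine]) -> List[Hit]:
    """Run the question on every routed source and tag each result."""
    hits = []
    for source in route(question):
        for content in engines[source](question):
            hits.append(Hit(source, content))
    return hits

# Stub engines; real ones would wrap full-text search, SQL, and a graph store.
engines = {
    "text": lambda q: ["contract clause mentioning Berlin"],
    "table": lambda q: ["row: Acme, Berlin, Q3"],
    "graph": lambda q: ["path: Acme <- Dana"],
}

question = "Which suppliers shipped to the Berlin warehouse in Q3, and who introduced them?"
hits = dispatch(question, engines)
print([h.source for h in hits])
```

The design point is that each engine keeps its own result type until the last moment; only the tagged hits are handed on for unification.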

Why flattening loses the answer

Hold the Berlin question fixed and walk the two paths. Say the answer needs a JOIN of a 5,000-row suppliers table with a 40,000-row shipments table on supplier_id. The relational path evaluates the key match exactly: every shipment resolves to its supplier, and a traversal of the introductions graph then follows two hops to the people who introduced them. The flat-index path instead embeds each row into, say, a 768-dimension vector and returns, say, the top-k = 20 nearest chunks to the query. Two hundred million potential supplier–shipment pairings (5,000 × 40,000 = 200,000,000, illustrative) collapse into 20 fuzzy neighbours chosen by cosine distance, and "who introduced them" is an edge that was never embedded as an edge. The JOIN and the traversal are not slow in the flat index; they are absent. That is the RAG failure mode source-native dispatch is built to remove.
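To make the relational path concrete, here is a small sketch using Python's built-in sqlite3 module. The schema, column names other than supplier_id, and the handful of rows are invented for the example; the point is only that the JOIN resolves each shipment to its supplier by exact key match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shipments (
    shipment_id INTEGER PRIMARY KEY,
    supplier_id INTEGER,
    warehouse   TEXT,
    quarter     TEXT
);
INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Borealis'), (3, 'Cobalt');
INSERT INTO shipments VALUES
    (10, 1, 'Berlin', 'Q3'),
    (11, 2, 'Berlin', 'Q2'),
    (12, 3, 'Munich', 'Q3'),
    (13, 3, 'Berlin', 'Q3');
""")

# Exact key match: each shipment row is paired with its own supplier row.
rows = conn.execute(
    """
    SELECT DISTINCT s.name
    FROM suppliers AS s
    JOIN shipments AS sh ON sh.supplier_id = s.supplier_id
    WHERE sh.warehouse = ? AND sh.quarter = ?
    ORDER BY s.name
    """,
    ("Berlin", "Q3"),
).fetchall()
berlin_q3 = [name for (name,) in rows]
print(berlin_q3)

# The illustrative candidate space from the text: every possible pairing.
candidate_pairs = 5_000 * 40_000
print(candidate_pairs)
```

The engine never enumerates the full set of candidate pairings; the key constraint does the selection, which is the structure a top-k similarity lookup has no way to express.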

Flat vector index vs. source-native dispatch

Property	Unified vector index	Source-native dispatch
Index built upfront	one embedding pass + ANN index over all sources	each source keeps its native engine (full-text, SQL, graph)
Query form	one nearest-neighbour lookup over shared space	a source-native query per source, chosen by the router
Structure preserved	no — JOINs and edges averaged into a vector	yes — JOINs compose, edges traverse
Best when	fuzzy topical match over homogeneous prose	answer spans tables / graphs, needs exact relations
Reported scope	—	13 datasets · 309 KBs, reported to exceed single-source baselines (OmniRetrieval)

The table isn't an argument that embeddings are obsolete — fuzzy topical recall over prose is exactly what a vector index is good at. It's that a single representation can't be the right one for text, tables, and graphs at once. Routing to a source-native query lets each source answer with the structure it actually has, and only then unifies — so the generator receives composed relations, not a bag of nearest neighbours.
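One way to picture the "only then unifies" step is a function that merges per-source results into a single labelled context for the generator. This is a guess at a plausible shape, not the paper's unification method; the `unify` helper, the bracketed source labels, and the sample hits are all hypothetical.

```python
def unify(results):
    """Merge per-source hits into one labelled context string.

    results maps a source name ("text", "table", "graph") to a list of
    hits that each engine has already rendered as strings. Sources with
    no hits are left out.
    """
    sections = []
    for source in ("text", "table", "graph"):
        hits = results.get(source, [])
        if hits:
            body = "\n".join(f"- {hit}" for hit in hits)
            sections.append(f"[{source}]\n{body}")
    return "\n\n".join(sections)

context = unify({
    "text": [],
    "table": ["Acme shipped to Berlin in Q3"],
    "graph": ["Acme <- introduced by <- Dana"],
})
print(context)
```

Keeping the source label on each section lets the generator see that a table row is an exact relation and a graph path is an explicit edge, rather than treating both as loosely similar passages.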


Related explainers

Is Grep All You Need? — Grep vs vector retrieval for agentic search — another result that the vector index is one option among many, not the default
CDD — Context-Driven Decomposition for RAG knowledge conflict — what to do once retrieval returns sources that disagree

FAQ

What is source-native query dispatch?

It's a retrieval design where a router sends a natural-language query to whichever knowledge source fits — unstructured text, a relational table, or a graph — and runs that source's own query engine (full-text search, a SQL-style query with JOINs, or a graph traversal) instead of embedding everything into one shared vector store. OmniRetrieval reports doing this across 13 datasets and 309 knowledge bases and exceeding single-source baselines.

Why not just embed tables and graphs into the same vector index?

Because embedding collapses structure. A table's columns and a graph's edges become a single vector positioned by similarity, so a JOIN can no longer compose rows by key and a traversal can no longer follow edges — those relations are averaged away rather than slowed down. Keeping each source native preserves the structural affordances that answer relational and multi-hop questions, which a top-k nearest-neighbour lookup over one space cannot reconstruct.
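For the graph side, a traversal "following edges" can be sketched as a bounded breadth-first search over an adjacency list. The `introduced_by` graph and the names in it are invented for illustration, and a real deployment would issue this as a query to a graph engine rather than walk a Python dict.

```python
from collections import deque

# Hypothetical edges: key was introduced by each person in the list.
introduced_by = {
    "Acme": ["Dana"],
    "Cobalt": ["Eli"],
    "Dana": ["Farah"],
    "Eli": [],
}

def within_hops(graph, start, max_hops):
    """Breadth-first walk returning (node, hop_count) pairs up to max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append((neighbour, depth + 1))
                frontier.append((neighbour, depth + 1))
    return reached

two_hops = within_hops(introduced_by, "Acme", 2)
print(two_hops)
```

Each step depends on an explicit edge existing in the store. Once the nodes are embedded as independent vectors, there is no edge left for the second hop to follow.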

Does this mean vector retrieval is obsolete?

No. A unified vector index is still the right tool for fuzzy, topical recall over homogeneous prose, where semantic similarity is doing the real work. Source-native dispatch matters when an answer spans different kinds of sources or needs exact relations — tables to JOIN, graphs to traverse. The shift is in the default: route to the source's native engine first, and treat the shared embedding space as one source among several rather than the only one.

