
Chen Zhang

Vector Graph RAG: Multi-Hop RAG Without a Graph Database

Standard RAG falls apart when the answer isn't in one chunk. Ask "What side effects should I watch for with the first-line diabetes medication?" and the system needs to first figure out that metformin is the first-line drug, then look up metformin's side effects. The query never mentions "metformin" — it's a bridge entity the system has to discover on its own. Naive vector search can't do this.

Multi-hop problem illustration

The industry answer has been knowledge graphs plus graph databases. That works, but it means deploying Neo4j or a similar graph database, learning a graph query language, and operating two separate storage systems. That is a lot of operational overhead for what's essentially one feature: following entity chains across passages.

I built Vector Graph RAG to get multi-hop reasoning without any of that overhead. The entire graph structure lives inside Milvus — entities, relations, and passages stored as three collections with ID cross-references. No graph database, no Cypher queries, just vector search and metadata lookups.

Architecture comparison

Building a Logical Graph in Milvus

The key insight is simple: a knowledge graph relation like (metformin, is_first_line_drug_for, type_2_diabetes) is just text. Text can be embedded into vectors. So why not store the entire graph structure in a vector database?

Vector Graph RAG uses three Milvus collections with ID cross-references:

  • Entities: Deduplicated entity names, embedded for semantic search. Each entity record stores the IDs of relations it participates in.
  • Relations: Triple-based relations (subject, predicate, object). Each record stores the subject and object entity IDs, plus the IDs of source passages. The relation text is embedded for vector search.
  • Passages: Original document chunks. Each record stores the IDs of entities and relations extracted from it.

These three collections form a logical graph through ID references. "Graph traversal" becomes a series of ID-based metadata queries in Milvus — no graph query language needed.
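A minimal in-memory sketch of that layout may help. The field names (`relation_ids`, `subject_id`, `passage_ids`, and so on) are assumptions for illustration, not the project's actual Milvus schema. Plain dicts keyed by primary key stand in for the three collections, and embeddings are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: int
    name: str
    relation_ids: list[int] = field(default_factory=list)


@dataclass
class Relation:
    id: int
    subject_id: int
    predicate: str
    object_id: int
    passage_ids: list[int] = field(default_factory=list)


@dataclass
class Passage:
    id: int
    text: str
    entity_ids: list[int] = field(default_factory=list)
    relation_ids: list[int] = field(default_factory=list)


# Hypothetical stand-ins for the three collections, keyed by primary key.
entities = {
    1: Entity(1, "metformin", relation_ids=[10]),
    2: Entity(2, "type 2 diabetes", relation_ids=[10]),
}
relations = {
    10: Relation(10, subject_id=1, predicate="is_first_line_drug_for",
                 object_id=2, passage_ids=[100]),
}
passages = {
    100: Passage(100, "Metformin is the first-line medication for type 2 diabetes.",
                 entity_ids=[1, 2], relation_ids=[10]),
}


def relation_text(rel: Relation) -> str:
    """Render a triple as the plain text that would be embedded."""
    subject = entities[rel.subject_id].name
    obj = entities[rel.object_id].name
    return f"{subject} {rel.predicate} {obj}"


# "Traversal" is a chain of primary-key lookups: entity -> relation -> passage.
rel = relations[entities[2].relation_ids[0]]
source = passages[rel.passage_ids[0]]
print(relation_text(rel))
print(source.text)
```

In a real deployment each lookup would be a primary-key query against the corresponding collection instead of a dict access.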

The extra ID lookups add roughly 2-3 primary key queries per hop, each taking under 10 ms. The real bottleneck in any RAG pipeline is the LLM call (1-3 seconds), so a few extra milliseconds of metadata lookup are negligible by comparison.
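As a back-of-the-envelope check, the snippet below plugs in the upper bounds quoted above. The two-hop query is an assumption made only for this illustration.

```python
# Upper-bound figures quoted in the text.
lookups_per_hop = 3    # "2-3 primary key queries per hop"
ms_per_lookup = 10     # each lookup takes under 10 ms
llm_call_ms = 1000     # low end of the 1-3 second LLM call

hops = 2               # assumed hop count for this illustration

lookup_overhead_ms = lookups_per_hop * ms_per_lookup * hops
share_of_one_llm_call = lookup_overhead_ms / llm_call_ms

print(f"lookup overhead: {lookup_overhead_ms} ms")
print(f"share of one 1 s LLM call: {share_of_one_llm_call:.0%}")
```

Even with these worst-case inputs, the lookups add about 60 ms, a few percent of the fastest LLM call.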

The Four-Step Retrieval Pipeline

4-step pipeline

Step 1: Seed Retrieval

An LLM extracts key entities from the user query. These entities are embedded and used to search the Entities and Relations collections. The results are the "seeds" — entry points into the logical graph.
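The search half of this step can be sketched with a brute-force cosine similarity over toy vectors. Everything here is hypothetical: the 3-dimensional embeddings are made up, the entity extraction by the LLM is assumed to have already happened, and a real system would call Milvus vector search rather than sorting a dict.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, collection, k=2):
    """Return the k record IDs whose vectors are most similar to query_vec."""
    scored = sorted(
        collection.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [record_id for record_id, _ in scored[:k]]


# Toy embeddings, keyed by entity name (invented for illustration).
entity_vectors = {
    "type 2 diabetes": [0.9, 0.1, 0.0],
    "metformin": [0.6, 0.7, 0.1],
    "renal function": [0.1, 0.2, 0.9],
}

# Pretend the query entity "type 2 diabetes" was embedded to this vector.
query_entity_vec = [1.0, 0.0, 0.0]

seeds = top_k(query_entity_vec, entity_vectors, k=1)
print(seeds)
```

The same `top_k` call would be repeated against the relation vectors to collect seed relations.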

Step 2: Subgraph Expansion

This is where multi-hop happens. From each seed entity, the system follows ID references one hop outward: find the entity's relation IDs, fetch those relations, then fetch the entities on the other end of those relations.

Subgraph expansion

In the diabetes example, expanding from "type 2 diabetes" discovers the relation (metformin, is_first_line_drug_for, type_2_diabetes), which surfaces "metformin" — the bridge entity the original query never mentioned. From "metformin," another expansion finds relations about renal function monitoring and side effects.
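The expansion described above can be sketched as a breadth-first walk over ID references. This is an illustration under stated assumptions, not the library's implementation: the dicts, relation IDs, and predicate names other than `is_first_line_drug_for` are hypothetical, and the facts come from the three example sentences used later in this post.

```python
# Hypothetical stand-ins for the Entities and Relations collections.
entity_relations = {
    "type 2 diabetes": ["r1", "r3"],
    "metformin": ["r1", "r2"],
    "renal function": ["r2"],
    "insulin sensitivity": ["r3"],
}
relations = {
    "r1": ("metformin", "is_first_line_drug_for", "type 2 diabetes"),
    "r2": ("metformin", "requires_monitoring_of", "renal function"),
    "r3": ("type 2 diabetes", "affects", "insulin sensitivity"),
}


def expand(seed_entities, hops=1):
    """Follow ID references outward from the seeds for the given number of hops."""
    seen_entities = set(seed_entities)
    seen_relations = set()
    frontier = set(seed_entities)
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for rel_id in entity_relations.get(entity, []):
                seen_relations.add(rel_id)
                subject, _predicate, obj = relations[rel_id]
                # The entity on the other end of the relation joins the next frontier.
                for neighbor in (subject, obj):
                    if neighbor not in seen_entities:
                        seen_entities.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return seen_entities, seen_relations


one_hop_entities, one_hop_relations = expand(["type 2 diabetes"], hops=1)
two_hop_entities, two_hop_relations = expand(["type 2 diabetes"], hops=2)
print(sorted(one_hop_entities))
print(sorted(two_hop_relations))
```

One hop from "type 2 diabetes" surfaces "metformin"; the second hop reaches the renal function relation that the query actually needs.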

Step 3: LLM Reranking

After expansion, we have a pool of candidate relations and passages. A single LLM call scores and filters them for relevance to the original query. This replaces what iterative approaches do with multiple rounds of LLM-guided search.
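A single-call reranker might look like the sketch below. The prompt wording, the JSON reply format, and the `llm` callable are all assumptions for illustration; the project's real prompt and parsing are not shown in this post. A stub function stands in for the model so the example runs offline.

```python
import json


def build_rerank_prompt(query, candidates):
    """Number the candidates so the model can answer with indices."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(candidates))
    return (
        "Select the relations that help answer the question.\n"
        f"Question: {query}\n"
        f"Relations:\n{numbered}\n"
        "Reply with a JSON list of relation numbers, most relevant first."
    )


def rerank(query, candidates, llm, top_n=2):
    """One LLM call: ask for ranked indices, then keep the valid ones."""
    reply = llm(build_rerank_prompt(query, candidates))
    try:
        indices = json.loads(reply)
    except json.JSONDecodeError:
        return candidates[:top_n]  # fall back to the original order
    if not isinstance(indices, list):
        return candidates[:top_n]
    valid = [i for i in indices if isinstance(i, int) and 0 <= i < len(candidates)]
    return [candidates[i] for i in valid[:top_n]]


def fake_llm(prompt):
    # Stand-in for a real chat-completion call; always returns the same ranking.
    return "[1, 2]"


candidates = [
    "type 2 diabetes affects insulin sensitivity",
    "metformin requires_monitoring_of renal function",
    "metformin is_first_line_drug_for type 2 diabetes",
]
query = "What monitoring is needed for the first-line type 2 diabetes drug?"
ranked = rerank(query, candidates, fake_llm)
print(ranked)
```

Asking for indices rather than free text keeps the reply short and easy to validate, and the fallback keeps the pipeline running when the reply cannot be parsed.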

Step 4: Answer Generation

The top-ranked relations and their associated passages go to the LLM for final answer generation.

Two LLM Calls, Not Ten

Most multi-hop RAG approaches are iterative. IRCoT calls the LLM 3-5 times per query. Agentic RAG systems can make 10+ LLM calls.

Vector Graph RAG front-loads the discovery work into vector search and subgraph expansion. The LLM only gets called twice: once for reranking, once for generation. This cuts API costs by roughly 60% and makes the system 2-3x faster compared to iterative approaches.
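One way to sanity-check the cost figure is a plain call-count comparison. This assumes every LLM call costs about the same, which is a simplification: real costs depend on token counts per call.

```python
def relative_saving(baseline_calls, our_calls=2):
    """Fraction of LLM calls avoided, assuming every call costs about the same."""
    return 1 - our_calls / baseline_calls


# Baselines taken from the call counts mentioned above (3-5 for IRCoT, 10+ for agents).
for baseline in (3, 5, 10):
    print(f"{baseline} calls -> {relative_saving(baseline):.0%} fewer calls")
```

Against a five-call baseline, two calls is 60% fewer, which is consistent with the rough figure quoted above.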

Benchmark Results

I evaluated Vector Graph RAG against naive RAG on three standard multi-hop QA benchmarks, using Recall@5 as the metric:

| Dataset                 | Naive RAG | Vector Graph RAG |
|-------------------------|-----------|------------------|
| MuSiQue (2-4 hop)       | 65.2%     | 82.4%            |
| HotpotQA (2 hop)        | 78.6%     | 91.2%            |
| 2WikiMultiHopQA (2 hop) | 76.4%     | 89.8%            |
| Average                 | 73.4%     | 87.8%            |
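For readers unfamiliar with the metric, here is a sketch of Recall@k under a common definition: the fraction of gold supporting passages that appear among the top k retrieved. The evaluation script behind the numbers above is not shown in this post, so treat this as an illustration, and the passage IDs as made up.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant passages that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


# Hypothetical example: two gold passages, one of which is ranked sixth.
retrieved = ["p7", "p2", "p9", "p4", "p1", "p3"]
relevant = ["p2", "p3"]
score = recall_at_k(retrieved, relevant)
print(score)
```

A benchmark score is then the mean of this value over all questions in the dataset.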

Recall@5 vs Naive RAG

Against SOTA methods, Vector Graph RAG achieves the highest average Recall@5 at 87.8%, ahead of HippoRAG 2, while using only 2 LLM calls per query and requiring no graph database.

SOTA comparison

Getting Started

pip install vector-graph-rag

from vector_graph_rag import VectorGraphRAG

# Initialize - uses Milvus Lite (local .db file) by default
rag = VectorGraphRAG()

# Index your documents
rag.add_texts([
    "Metformin is the first-line medication for type 2 diabetes.",
    "Metformin requires regular monitoring of renal function.",
    "Type 2 diabetes affects insulin sensitivity in the body.",
])

# Query with multi-hop reasoning
result = rag.query(
    "What monitoring is needed for the first-line type 2 diabetes drug?"
)
print(result)

By default, it uses Milvus Lite with a local .db file — no server needed. For production, switch to Milvus standalone/cluster or Zilliz Cloud.

Interactive frontend demo

Wrapping Up

Vector Graph RAG shows that the "graph" in Graph RAG doesn't have to mean a graph database. Store the graph structure as cross-referenced collections in a vector database and you get the same reasoning power with half the infrastructure.

If your RAG system struggles with multi-hop questions, give Vector Graph RAG a try. It's open source, installs in one command, and runs locally out of the box.

zilliztech / vector-graph-rag

Graph RAG with pure vector search, achieving SOTA performance in multi-hop reasoning scenarios.


Vector Graph RAG

Graph RAG with pure vector search — no graph database needed.


💡 Encode entities and relations as vectors in Milvus, replace iterative LLM agents with a single reranking pass — achieve state-of-the-art multi-hop retrieval at a fraction of the operational and computational cost.

Vector Graph RAG Demo

✨ Features

  • No Graph Database Required — Pure vector search with Milvus, no Neo4j or other graph databases needed
  • Single-Pass LLM Reranking — One LLM call to rerank, no iterative agent loops (unlike IRCoT or multi-step reflection)
  • Knowledge-Intensive Friendly — Optimized for domains with dense factual content: legal, finance, medical, literature, etc.
  • Zero Configuration — Uses Milvus Lite by default, works out of the box with a single file
  • Multi-hop Reasoning — Subgraph expansion enables complex multi-hop question answering
  • State-of-the-Art Performance — 87.8% avg Recall@5 on multi-hop QA benchmarks, outperforming HippoRAG

📦 Installation

pip install vector-graph-rag
# or
uv add vector-graph-rag
