Vector Graph RAG: Multi-Hop RAG Without a Graph Database

Chen Zhang · 2026-04-03T07:54:52Z

Standard RAG falls apart when the answer isn't in one chunk. Ask "What side effects should I watch for with the first-line diabetes medication?" and the system needs to first figure out that metformin is the first-line drug, then look up metformin's side effects. The query never mentions "metformin": it's a bridge entity the system has to discover on its own. Naive vector search can't do this.

The industry answer has been knowledge graphs plus graph databases. That works, but it means deploying Neo4j or similar, learning a graph query language, and operating two separate storage systems. The complexity doubles for what's essentially one feature: following entity chains across passages.

I built Vector Graph RAG to get multi-hop reasoning without any of that overhead. The entire graph structure lives inside Milvus: entities, relations, and passages stored as three collections with ID cross-references. No graph database, no Cypher queries, just vector search and metadata lookups.

Building a Logical Graph in Milvus

The key insight is simple: a knowledge graph relation like (metformin, is_first_line_drug_for, type_2_diabetes) is just text. Text can be embedded into vectors. So why not store the entire graph structure in a vector database?

Vector Graph RAG uses three Milvus collections with ID cross-references:

Entities: Deduplicated entity names, embedded for semantic search. Each entity record stores the IDs of relations it participates in.
Relations: Triple-based relations (subject, predicate, object). Each record stores the subject and object entity IDs, plus the IDs of source passages. The relation text is embedded for vector search.
Passages: Original document chunks. Each record stores the IDs of entities and relations extracted from it.

These three collections form a logical graph through ID references. "Graph traversal" becomes a series of ID-based metadata queries in Milvus, with no graph query language needed.

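To make the cross-referencing concrete, here is a minimal in-memory sketch of the three collections. It is not the library's actual schema: the class names, field names, and the `LogicalGraph` helper are hypothetical, embeddings are left out entirely, and plain dicts stand in for Milvus collections.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: int
    name: str
    relation_ids: list[int] = field(default_factory=list)


@dataclass
class Relation:
    id: int
    subject_id: int
    predicate: str
    object_id: int
    passage_ids: list[int] = field(default_factory=list)


@dataclass
class Passage:
    id: int
    text: str
    entity_ids: list[int] = field(default_factory=list)
    relation_ids: list[int] = field(default_factory=list)


class LogicalGraph:
    """Hypothetical in-memory stand-in for the three cross-referenced collections."""

    def __init__(self) -> None:
        self.entities: dict[int, Entity] = {}
        self.relations: dict[int, Relation] = {}
        self.passages: dict[int, Passage] = {}
        self._entity_by_name: dict[str, int] = {}

    def _entity(self, name: str) -> Entity:
        # Deduplicate entities by exact name (a simplification).
        if name not in self._entity_by_name:
            eid = len(self.entities)
            self.entities[eid] = Entity(eid, name)
            self._entity_by_name[name] = eid
        return self.entities[self._entity_by_name[name]]

    def add_passage(self, text: str, triples: list[tuple[str, str, str]]) -> Passage:
        passage = Passage(len(self.passages), text)
        self.passages[passage.id] = passage
        for subj, pred, obj in triples:
            s, o = self._entity(subj), self._entity(obj)
            rel = Relation(len(self.relations), s.id, pred, o.id, [passage.id])
            self.relations[rel.id] = rel
            passage.relation_ids.append(rel.id)
            # Record the ID cross-references in both directions.
            for ent in (s, o):
                if rel.id not in ent.relation_ids:
                    ent.relation_ids.append(rel.id)
                if ent.id not in passage.entity_ids:
                    passage.entity_ids.append(ent.id)
        return passage


graph = LogicalGraph()
graph.add_passage(
    "Metformin is the first-line medication for type 2 diabetes.",
    [("metformin", "is_first_line_drug_for", "type_2_diabetes")],
)
graph.add_passage(
    "Metformin requires regular monitoring of renal function.",
    [("metformin", "requires_monitoring_of", "renal_function")],
)
print(graph.entities[0])
```

The triples are hand-written here; in the real pipeline they would come from an extraction step. The point of the sketch is only that every record carries the IDs needed to reach its neighbors, so following an edge is a key lookup rather than a graph query.
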
The extra ID lookups add maybe 2-3 primary key queries per hop. Each takes under 10ms. The real bottleneck in any RAG pipeline is the LLM call (1-3 seconds), so a few extra milliseconds of metadata lookup is invisible.

The Four-Step Retrieval Pipeline

Step 1: Seed Retrieval

An LLM extracts key entities from the user query. These entities are embedded and used to search the Entities and Relations collections. The results are the "seeds": entry points into the logical graph.

Step 2: Subgraph Expansion

This is where multi-hop happens. From each seed entity, the system follows ID references one hop outward: find the entity's relation IDs, fetch those relations, then fetch the entities on the other end of those relations.

In the diabetes example, expanding from "type 2 diabetes" discovers the relation (metformin, is_first_line_drug_for, type_2_diabetes), which surfaces "metformin", the bridge entity the original query never mentioned. From "metformin," another expansion finds relations about renal function monitoring and side effects.

Step 3: LLM Reranking

After expansion, we have a pool of candidate relations and passages. A single LLM call scores and filters them for relevance to the original query. This replaces what iterative approaches do with multiple rounds of LLM-guided search.

Step 4: Answer Generation

The top-ranked relations and their associated passages go to the LLM for final answer generation.

Two LLM Calls, Not Ten

Most multi-hop RAG approaches are iterative. IRCoT calls the LLM 3-5 times per query. Agentic RAG systems can make 10+ LLM calls.

Vector Graph RAG front-loads the discovery work into vector search and subgraph expansion. The LLM only gets called twice: once for reranking, once for generation. This cuts API costs by roughly 60% and makes the system 2-3x faster compared to iterative approaches.

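The subgraph expansion from Step 2 is the part that needs no LLM at all, and it can be sketched as a breadth-first walk over ID references. The sketch below is illustrative rather than the library's implementation: the `ENTITIES` and `RELATIONS` dicts, the ID values, the `expand_subgraph` function, and the `max_hops` parameter are all assumptions, and each dict lookup stands in for a primary-key query against Milvus.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the Entities and Relations collections.
ENTITIES = {
    1: {"name": "type_2_diabetes", "relation_ids": [10, 12]},
    2: {"name": "metformin", "relation_ids": [10, 11]},
    3: {"name": "renal_function", "relation_ids": [11]},
    4: {"name": "insulin_sensitivity", "relation_ids": [12]},
}
RELATIONS = {
    10: {"subject_id": 2, "predicate": "is_first_line_drug_for",
         "object_id": 1, "passage_ids": [100]},
    11: {"subject_id": 2, "predicate": "requires_monitoring_of",
         "object_id": 3, "passage_ids": [101]},
    12: {"subject_id": 1, "predicate": "affects",
         "object_id": 4, "passage_ids": [102]},
}


def expand_subgraph(seed_entity_ids, max_hops=2):
    """Follow ID references outward from the seeds, up to max_hops hops."""
    seen_entities = set(seed_entity_ids)
    seen_relations = set()
    frontier = deque((eid, 0) for eid in seed_entity_ids)
    while frontier:
        entity_id, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget spent; do not expand this entity further
        for rel_id in ENTITIES[entity_id]["relation_ids"]:
            rel = RELATIONS[rel_id]
            seen_relations.add(rel_id)
            # The entity on the other end of the relation joins the frontier.
            for neighbor in (rel["subject_id"], rel["object_id"]):
                if neighbor not in seen_entities:
                    seen_entities.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    passage_ids = {
        pid for rid in seen_relations for pid in RELATIONS[rid]["passage_ids"]
    }
    return seen_entities, seen_relations, passage_ids


# Seed on "type_2_diabetes"; "metformin" is never named in the query.
entities, relations, passages = expand_subgraph([1], max_hops=2)
print(sorted(ENTITIES[e]["name"] for e in entities))
```

With a one-hop budget the walk reaches "metformin" but stops before the renal function relation; the second hop is what pulls in the monitoring passage. The collected relation and passage IDs are the candidate pool that the single reranking call in Step 3 would then score.
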
Benchmark Results

Evaluated on three standard multi-hop QA benchmarks using Recall@5:

Dataset	Naive RAG	Vector Graph RAG
MuSiQue (2-4 hop)	65.2%	82.4%
HotpotQA (2 hop)	78.6%	91.2%
2WikiMultiHopQA (2 hop)	76.4%	89.8%
Average	73.4%	87.8%

Against SOTA methods, Vector Graph RAG achieves the highest average Recall@5 at 87.8%, beating HippoRAG 2 on average, while using only 2 LLM calls per query and requiring no graph database.

Getting Started

    pip install vector-graph-rag

    from vector_graph_rag import VectorGraphRAG

    # Initialize - uses Milvus Lite (local .db file) by default
    rag = VectorGraphRAG()

    # Index your documents
    rag.add_texts([
        "Metformin is the first-line medication for type 2 diabetes.",
        "Metformin requires regular monitoring of renal function.",
        "Type 2 diabetes affects insulin sensitivity in the body.",
    ])

    # Query with multi-hop reasoning
    result = rag.query("What monitoring is needed for the first-line type 2 diabetes drug?")
    print(result)

By default, it uses Milvus Lite with a local .db file, so no server is needed. For production, switch to Milvus standalone/cluster or Zilliz Cloud.

Wrapping Up

Vector Graph RAG shows that the "graph" in Graph RAG doesn't have to mean a graph database. Store the graph structure as cross-referenced collections in a vector database and you get the same reasoning power with half the infrastructure.

If your RAG system struggles with multi-hop questions, give Vector Graph RAG a try. It's open source, installs in one command, and runs locally out of the box.

zilliztech / vector-graph-rag

Graph RAG with pure vector search, achieving SOTA performance in multi-hop reasoning scenarios.


Graph RAG with pure vector search — no graph database needed.

💡 Encode entities and relations as vectors in Milvus, replace iterative LLM agents with a single reranking pass — achieve state-of-the-art multi-hop retrieval at a fraction of the operational and computational cost.

✨ Features

No Graph Database Required — Pure vector search with Milvus, no Neo4j or other graph databases needed
Single-Pass LLM Reranking — One LLM call to rerank, no iterative agent loops (unlike IRCoT or multi-step reflection)
Knowledge-Intensive Friendly — Optimized for domains with dense factual content: legal, finance, medical, literature, etc.
Zero Configuration — Uses Milvus Lite by default, works out of the box with a single file
Multi-hop Reasoning — Subgraph expansion enables complex multi-hop question answering
State-of-the-Art Performance — 87.8% avg Recall@5 on multi-hop QA benchmarks, outperforming HippoRAG

📦 Installation

pip install vector-graph-rag
# or
uv add vector-graph-rag

With document

…
