Tejas
Stop Drowning in Vectors: How I Built a Graph-Powered RAG That Actually Scales

The Problem with Traditional RAG

Let's be honest - vector-based RAG has a scaling problem. You chunk documents, embed everything, store it in a vector database, and hope semantic similarity finds the right context. But when you're dealing with:

  • Hundreds of technical documents
  • Cross-referenced content (citations, related sections)
  • Hierarchical information (chapters → sections → subsections)

Vector search starts to feel like finding a needle in a haystack of needles. You either blow up your context window or miss critical relationships between documents.


Enter Vectorless RAG

The PageIndex architecture introduced a brilliant alternative: parse documents into hierarchical JSON trees and let the LLM navigate the structure directly. No embeddings. No similarity search. Just pure structural reasoning.
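To make the idea concrete, here is a minimal sketch of the kind of hierarchical tree a parsed document might produce. The field names (`title`, `summary`, `pages`, `children`) are illustrative, not the exact PageIndex schema:

```python
# Illustrative document tree; field names are assumptions, not the
# exact PageIndex output format.
doc_tree = {
    "title": "Example Technical Manual",
    "children": [
        {
            "title": "1. Introduction",
            "summary": "Scope and intended audience.",
            "pages": [1, 3],
            "children": [],
        },
        {
            "title": "2. Installation",
            "summary": "Setup steps for each platform.",
            "pages": [4, 9],
            "children": [
                {"title": "2.1 Linux", "summary": "Package install.", "pages": [4, 6], "children": []},
                {"title": "2.2 macOS", "summary": "Homebrew install.", "pages": [7, 9], "children": []},
            ],
        },
    ],
}

def count_nodes(node):
    """Count every section node in the tree, including the root."""
    return 1 + sum(count_nodes(c) for c in node["children"])

print(count_nodes(doc_tree))  # 5: root, two chapters, two subsections
```

The LLM never sees the full tree at once; it reads one level of titles and summaries at a time and decides where to descend.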

But the original approach had a limitation - it kept everything in memory. Try loading hundreds of document trees simultaneously, and you'll watch your RAM wave goodbye.


Why I Put It on Neo4j

I took the vectorless RAG concept and gave it a persistent backbone: Neo4j Graph Database. Here's what changed:

1. Persistent Memory at Scale

Instead of loading JSON trees into memory, the hierarchical structure lives in Neo4j. Now you can query millions of documents without breaking a sweat. The LLM starts at document roots and walks down only the branches it needs.
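One way to sketch that "walk down only the branches it needs" step: fetch just the direct children of a single node per query. The relationship types follow the `HAS_SECTION`/`HAS_SUBSECTION` pattern described later in this post; the property names (`doc_name`, `node_id`) are assumptions about the schema, not the project's exact one:

```python
# Hypothetical helper: build a Cypher query that returns only the
# direct children of one section node, instead of the whole tree.
def children_query(doc_name: str, node_id: str) -> tuple[str, dict]:
    query = (
        "MATCH (parent {doc_name: $doc_name, node_id: $node_id})"
        "-[:HAS_SECTION|HAS_SUBSECTION]->(child) "
        "RETURN child.node_id AS id, child.title AS title, child.summary AS summary"
    )
    return query, {"doc_name": doc_name, "node_id": node_id}

# With the official neo4j Python driver, this would run as:
#   with driver.session() as session:
#       rows = session.run(*children_query("manual", "2"))
```

Each LLM step then costs one small query and a handful of titles and summaries, regardless of how large the overall graph is.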

2. Cross-Document Relationships

This is where it gets powerful. Want to link a citation in Document A to its source in Document B? Just create a [:REFERENCES] edge. The LLM can traverse these relationships during retrieval, giving you a true reasoning knowledge graph — something vector search simply cannot replicate.
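A cross-document link like that boils down to one `MERGE`. Here is a sketch of building such a statement; the matching keys (`doc_name`, `node_id`) are again illustrative assumptions about the schema:

```python
# Hypothetical helper: link a citing node in one document to its
# source node in another with a [:REFERENCES] edge.
def reference_edge_query(src_doc, src_id, dst_doc, dst_id):
    query = (
        "MATCH (a {doc_name: $src_doc, node_id: $src_id}), "
        "(b {doc_name: $dst_doc, node_id: $dst_id}) "
        "MERGE (a)-[:REFERENCES]->(b)"
    )
    return query, {"src_doc": src_doc, "src_id": src_id,
                   "dst_doc": dst_doc, "dst_id": dst_id}
```

`MERGE` keeps the operation idempotent, so re-running ingestion does not duplicate edges.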

3. Token-Efficient Retrieval

The LLM acts as an agentic navigator. Given a query, it:

  1. Identifies relevant root sections
  2. Queries Neo4j for child nodes
  3. Drills down iteratively until it finds the answer
  4. Ignores irrelevant branches entirely

This saves massive amounts of context tokens compared to dumping entire documents into the prompt.
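The steps above can be sketched as a runnable toy, with a plain dict standing in for Neo4j and a naive keyword check standing in for the LLM's relevance call:

```python
# In-memory stand-in for the graph: node_id -> {title, children}.
GRAPH = {
    "root": {"title": "Manual", "children": ["c1", "c2"]},
    "c1":   {"title": "Installation", "children": ["c1a"]},
    "c2":   {"title": "API Reference", "children": ["c2a"]},
    "c1a":  {"title": "Linux setup", "children": []},
    "c2a":  {"title": "Auth endpoints", "children": []},
}

def is_relevant(query, title):
    # Stand-in for an LLM call: naive keyword overlap.
    return any(word in title.lower() for word in query.lower().split())

def navigate(query, node_id="root"):
    """Walk down only relevant branches; return the leaf titles reached."""
    node = GRAPH[node_id]
    if not node["children"]:
        return [node["title"]]
    hits = []
    for child_id in node["children"]:
        if is_relevant(query, GRAPH[child_id]["title"]):
            hits.extend(navigate(query, child_id))
    return hits

print(navigate("linux installation"))  # ['Linux setup']
```

Note what never happens: the "API Reference" branch is never fetched, so its tokens never enter the prompt.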


How It Works

The Graph Structure

Every document becomes a tree in Neo4j:

```
(Document)-[:HAS_SECTION]->(Chapter)-[:HAS_SUBSECTION]->(Section)-[:HAS_SUBSECTION]->(Subsection)
```

The central Document node connects to top-level chapters, which recursively connect to sub-sections. Each node can store summaries, page references, and metadata.

Three-Step Workflow

Step 1: Parse Documents

```bash
uv run python main.py --pdf_path /path/to/document.pdf
```

This generates a _structure.json file containing the hierarchical tree. Markdown is supported too:

```bash
uv run python main.py --md_path /path/to/document.md
```

Step 2: Ingest into Neo4j

```bash
uv run python -m src.database.ingest --json_path ./results/document_structure.json
```

The JSON tree becomes nodes and relationships in your graph database.
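The ingestion step is essentially a depth-first walk that emits one `MERGE` per node and one per edge. A sketch of that flattening, with assumed property names (`doc_name`, `node_id`) and a `Section` label that may differ from the project's actual schema:

```python
# Hypothetical ingestion sketch: flatten a parsed tree into
# (cypher, params) pairs, one per node and one per parent-child edge.
def tree_to_statements(tree, doc_name):
    statements = []
    counter = iter(range(10**6))  # simple sequential node ids

    def visit(node, parent_id):
        node_id = str(next(counter))
        statements.append((
            "MERGE (n:Section {doc_name: $doc, node_id: $id}) "
            "SET n.title = $title",
            {"doc": doc_name, "id": node_id, "title": node["title"]},
        ))
        if parent_id is not None:
            # Top-level children hang off the document root via
            # HAS_SECTION; deeper levels use HAS_SUBSECTION.
            rel = "HAS_SECTION" if parent_id == "0" else "HAS_SUBSECTION"
            statements.append((
                f"MATCH (p {{doc_name: $doc, node_id: $pid}}), "
                f"(c {{doc_name: $doc, node_id: $cid}}) "
                f"MERGE (p)-[:{rel}]->(c)",
                {"doc": doc_name, "pid": parent_id, "cid": node_id},
            ))
        for child in node.get("children", []):
            visit(child, node_id)

    visit(tree, None)
    return statements
```

Because everything is `MERGE`-based, re-ingesting the same document is idempotent rather than duplicating the tree.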

Step 3: Query with Agentic Retrieval

```bash
uv run python -m src.agent.retriever \
  --doc_name document_structure.json \
  --query "What are the key findings in section 2?"
```

The LLM identifies relevant sections, traverses the graph, and returns precise answers with page references.


Under the Hood

Tech Stack

  • Neo4j — Graph database for persistent storage
  • LiteLLM — Unified LLM interface (defaults to llama-3.3-70b via Groq)
  • PyMuPDF / PyPDF2 — PDF parsing
  • uv — Lightning-fast Python package management

The Agentic Retrieval Loop

The retriever (src/agent/retriever.py) implements a multi-step reasoning process:

  1. Root Analysis — LLM examines top-level sections to identify candidates
  2. Iterative Drilling — For each candidate, fetch children from Neo4j
  3. Relevance Filtering — LLM decides which branches to explore further
  4. Answer Extraction — Once leaf nodes are reached, extract the answer

This is fundamentally different from vector search. Instead of "find similar chunks," it's "navigate to the right place."


When to Use This Approach

| Scenario | Vector RAG | Graph-Powered Vectorless RAG |
| --- | --- | --- |
| Simple Q&A over a single doc | ✅ | ✅ |
| Cross-document reasoning | ❌ | ✅ |
| Hierarchical content (manuals, specs) | ⚠️ | ✅ |
| Citation/reference tracking | ❌ | ✅ |
| Token-efficient retrieval | ⚠️ | ✅ |
| Massive document collections | ❌ | ✅ |

What's Next

This is just the foundation. Here's where I'm taking it:

  • Multi-hop reasoning across document collections
  • Dynamic reference extraction to auto-build [:REFERENCES] edges
  • Hybrid search combining graph traversal with optional vector fallback
  • Streaming responses for real-time navigation feedback

Final Thoughts

Vector embeddings aren't going away, but they're not the only tool in the box. For structured, hierarchical, or cross-referenced content, graph-powered vectorless RAG gives you:

  • Scalability — millions of documents, zero memory issues
  • Reasoning — traverse relationships, not just similarities
  • Efficiency — precise context retrieval, minimal token waste

Sometimes the best way forward is to go back to structure.


🔗 Project Repository: https://github.com/TejasS1233/vectorless_RAG

💬 Questions or ideas? Drop them in the comments - I'd love to hear how you're approaching RAG at scale.


If you found this useful, follow me for more deep dives into practical AI architecture. Next up: building multi-hop reasoning across document graphs.
