Tejas
Stop Drowning in Vectors: How I Built a Graph-Powered RAG That Actually Scales

The Problem with Traditional RAG

Let's be honest - vector-based RAG has a scaling problem. You chunk documents, embed everything, store it in a vector database, and hope semantic similarity finds the right context. But when you're dealing with:

  • Hundreds of technical documents
  • Cross-referenced content (citations, related sections)
  • Hierarchical information (chapters → sections → subsections)

Vector search starts to feel like finding a needle in a haystack of needles. You either blow up your context window or miss critical relationships between documents.


Enter Vectorless RAG

The PageIndex architecture introduced a brilliant alternative: parse documents into hierarchical JSON trees and let the LLM navigate the structure directly. No embeddings. No similarity search. Just pure structural reasoning.
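To make the idea concrete, here is a minimal sketch of the kind of hierarchical tree a parsed document might produce. The field names (`title`, `summary`, `pages`, `children`) are illustrative, not the exact PageIndex schema:

```python
# Illustrative document tree; field names are assumptions, not the
# exact PageIndex output format.
doc_tree = {
    "title": "Example Technical Manual",
    "children": [
        {
            "title": "1. Introduction",
            "summary": "Scope and intended audience.",
            "pages": [1, 3],
            "children": [],
        },
        {
            "title": "2. Installation",
            "summary": "Setup steps for each platform.",
            "pages": [4, 9],
            "children": [
                {"title": "2.1 Linux", "summary": "Package install.", "pages": [4, 6], "children": []},
                {"title": "2.2 macOS", "summary": "Homebrew install.", "pages": [7, 9], "children": []},
            ],
        },
    ],
}

def count_nodes(node):
    """Count every section node in the tree, including the root."""
    return 1 + sum(count_nodes(c) for c in node["children"])

print(count_nodes(doc_tree))  # 5: root, two chapters, two subsections
```

The LLM never sees the full tree at once; it reads one level of titles and summaries at a time and decides where to descend.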

But the original approach had a limitation - it kept everything in memory. Try loading hundreds of document trees simultaneously, and you'll watch your RAM wave goodbye.


Why I Put It on Neo4j

I took the vectorless RAG concept and gave it a persistent backbone: Neo4j Graph Database. Here's what changed:

1. Persistent Memory at Scale

Instead of loading JSON trees into memory, the hierarchical structure lives in Neo4j. Now you can query millions of documents without breaking a sweat. The LLM starts at document roots and walks down only the branches it needs.
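One way to sketch that "walk down only the branches it needs" step: fetch just the direct children of a single node per query. The relationship types follow the `HAS_SECTION`/`HAS_SUBSECTION` pattern described later in this post; the property names (`doc_name`, `node_id`) are assumptions about the schema, not the project's exact one:

```python
# Hypothetical helper: build a Cypher query that returns only the
# direct children of one section node, instead of the whole tree.
def children_query(doc_name: str, node_id: str) -> tuple[str, dict]:
    query = (
        "MATCH (parent {doc_name: $doc_name, node_id: $node_id})"
        "-[:HAS_SECTION|HAS_SUBSECTION]->(child) "
        "RETURN child.node_id AS id, child.title AS title, child.summary AS summary"
    )
    return query, {"doc_name": doc_name, "node_id": node_id}

# With the official neo4j Python driver, this would run as:
#   with driver.session() as session:
#       rows = session.run(*children_query("manual", "2"))
```

Each LLM step then costs one small query and a handful of titles and summaries, regardless of how large the overall graph is.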

2. Cross-Document Relationships

This is where it gets powerful. Want to link a citation in Document A to its source in Document B? Just create a [:REFERENCES] edge. The LLM can traverse these relationships during retrieval, giving you a true reasoning knowledge graph — something vector search simply cannot replicate.
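A cross-document link like that boils down to one `MERGE`. Here is a sketch of building such a statement; the matching keys (`doc_name`, `node_id`) are again illustrative assumptions about the schema:

```python
# Hypothetical helper: link a citing node in one document to its
# source node in another with a [:REFERENCES] edge.
def reference_edge_query(src_doc, src_id, dst_doc, dst_id):
    query = (
        "MATCH (a {doc_name: $src_doc, node_id: $src_id}), "
        "(b {doc_name: $dst_doc, node_id: $dst_id}) "
        "MERGE (a)-[:REFERENCES]->(b)"
    )
    return query, {"src_doc": src_doc, "src_id": src_id,
                   "dst_doc": dst_doc, "dst_id": dst_id}
```

`MERGE` keeps the operation idempotent, so re-running ingestion does not duplicate edges.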

3. Token-Efficient Retrieval

The LLM acts as an agentic navigator. Given a query, it:

  1. Identifies relevant root sections
  2. Queries Neo4j for child nodes
  3. Drills down iteratively until it finds the answer
  4. Ignores irrelevant branches entirely

This saves massive amounts of context tokens compared to dumping entire documents into the prompt.
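The steps above can be sketched as a runnable toy, with a plain dict standing in for Neo4j and a naive keyword check standing in for the LLM's relevance call:

```python
# In-memory stand-in for the graph: node_id -> {title, children}.
GRAPH = {
    "root": {"title": "Manual", "children": ["c1", "c2"]},
    "c1":   {"title": "Installation", "children": ["c1a"]},
    "c2":   {"title": "API Reference", "children": ["c2a"]},
    "c1a":  {"title": "Linux setup", "children": []},
    "c2a":  {"title": "Auth endpoints", "children": []},
}

def is_relevant(query, title):
    # Stand-in for an LLM call: naive keyword overlap.
    return any(word in title.lower() for word in query.lower().split())

def navigate(query, node_id="root"):
    """Walk down only relevant branches; return the leaf titles reached."""
    node = GRAPH[node_id]
    if not node["children"]:
        return [node["title"]]
    hits = []
    for child_id in node["children"]:
        if is_relevant(query, GRAPH[child_id]["title"]):
            hits.extend(navigate(query, child_id))
    return hits

print(navigate("linux installation"))  # ['Linux setup']
```

Note what never happens: the "API Reference" branch is never fetched, so its tokens never enter the prompt.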


How It Works

The Graph Structure

Every document becomes a tree in Neo4j:

```
(Document)-[:HAS_SECTION]->(Chapter)-[:HAS_SUBSECTION]->(Section)-[:HAS_SUBSECTION]->(Subsection)
```

The central Document node connects to top-level chapters, which recursively connect to sub-sections. Each node can store summaries, page references, and metadata.

Three-Step Workflow

Step 1: Parse Documents

```bash
uv run python main.py --pdf_path /path/to/document.pdf
```

This generates a _structure.json file containing the hierarchical tree. Markdown is supported too:

```bash
uv run python main.py --md_path /path/to/document.md
```

Step 2: Ingest into Neo4j

```bash
uv run python -m src.database.ingest --json_path ./results/document_structure.json
```

The JSON tree becomes nodes and relationships in your graph database.
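The ingestion step is essentially a depth-first walk that emits one `MERGE` per node and one per edge. A sketch of that flattening, with assumed property names (`doc_name`, `node_id`) and a `Section` label that may differ from the project's actual schema:

```python
# Hypothetical ingestion sketch: flatten a parsed tree into
# (cypher, params) pairs, one per node and one per parent-child edge.
def tree_to_statements(tree, doc_name):
    statements = []
    counter = iter(range(10**6))  # simple sequential node ids

    def visit(node, parent_id):
        node_id = str(next(counter))
        statements.append((
            "MERGE (n:Section {doc_name: $doc, node_id: $id}) "
            "SET n.title = $title",
            {"doc": doc_name, "id": node_id, "title": node["title"]},
        ))
        if parent_id is not None:
            # Top-level children hang off the document root via
            # HAS_SECTION; deeper levels use HAS_SUBSECTION.
            rel = "HAS_SECTION" if parent_id == "0" else "HAS_SUBSECTION"
            statements.append((
                f"MATCH (p {{doc_name: $doc, node_id: $pid}}), "
                f"(c {{doc_name: $doc, node_id: $cid}}) "
                f"MERGE (p)-[:{rel}]->(c)",
                {"doc": doc_name, "pid": parent_id, "cid": node_id},
            ))
        for child in node.get("children", []):
            visit(child, node_id)

    visit(tree, None)
    return statements
```

Because everything is `MERGE`-based, re-ingesting the same document is idempotent rather than duplicating the tree.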

Step 3: Query with Agentic Retrieval

```bash
uv run python -m src.agent.retriever \
  --doc_name document_structure.json \
  --query "What are the key findings in section 2?"
```

The LLM identifies relevant sections, traverses the graph, and returns precise answers with page references.


Under the Hood

Tech Stack

  • Neo4j — Graph database for persistent storage
  • LiteLLM — Unified LLM interface (defaults to llama-3.3-70b via Groq)
  • PyMuPDF / PyPDF2 — PDF parsing
  • uv — Lightning-fast Python package management

The Agentic Retrieval Loop

The retriever (src/agent/retriever.py) implements a multi-step reasoning process:

  1. Root Analysis — LLM examines top-level sections to identify candidates
  2. Iterative Drilling — For each candidate, fetch children from Neo4j
  3. Relevance Filtering — LLM decides which branches to explore further
  4. Answer Extraction — Once leaf nodes are reached, extract the answer

This is fundamentally different from vector search. Instead of "find similar chunks," it's "navigate to the right place."


When to Use This Approach

| Scenario | Vector RAG | Graph-Powered Vectorless RAG |
| --- | --- | --- |
| Simple Q&A over a single doc | ✅ | ✅ |
| Cross-document reasoning | ❌ | ✅ |
| Hierarchical content (manuals, specs) | ⚠️ | ✅ |
| Citation/reference tracking | ❌ | ✅ |
| Token-efficient retrieval | ⚠️ | ✅ |
| Massive document collections | ❌ | ✅ |

What's Next

This is just the foundation. Here's where I'm taking it:

  • Multi-hop reasoning across document collections
  • Dynamic reference extraction to auto-build [:REFERENCES] edges
  • Hybrid search combining graph traversal with optional vector fallback
  • Streaming responses for real-time navigation feedback

Final Thoughts

Vector embeddings aren't going away, but they're not the only tool in the box. For structured, hierarchical, or cross-referenced content, graph-powered vectorless RAG gives you:

  • Scalability — millions of documents, zero memory issues
  • Reasoning — traverse relationships, not just similarities
  • Efficiency — precise context retrieval, minimal token waste

Sometimes the best way forward is to go back to structure.


🔗 Project Repository: https://github.com/TejasS1233/vectorless_RAG

💬 Questions or ideas? Drop them in the comments - I'd love to hear how you're approaching RAG at scale.


If you found this useful, follow me for more deep dives into practical AI architecture. Next up: building multi-hop reasoning across document graphs.
