The Problem with Traditional RAG
Let's be honest - vector-based RAG has a scaling problem. You chunk documents, embed everything, store it in a vector database, and hope semantic similarity finds the right context. But when you're dealing with:
- Hundreds of technical documents
- Cross-referenced content (citations, related sections)
- Hierarchical information (chapters → sections → subsections)
Vector search starts to feel like finding a needle in a haystack of needles. You either blow up your context window or miss critical relationships between documents.
Enter Vectorless RAG
The PageIndex architecture introduced a brilliant alternative: parse documents into hierarchical JSON trees and let the LLM navigate the structure directly. No embeddings. No similarity search. Just pure structural reasoning.
But the original approach had a limitation - it kept everything in memory. Try loading hundreds of document trees simultaneously, and you'll watch your RAM wave goodbye.
Why I Put It on Neo4j
I took the vectorless RAG concept and gave it a persistent backbone: Neo4j Graph Database. Here's what changed:
1. Persistent Memory at Scale
Instead of loading JSON trees into memory, the hierarchical structure lives in Neo4j. Now you can query millions of documents without breaking a sweat. The LLM starts at document roots and walks down only the branches it needs.
2. Cross-Document Relationships
This is where it gets powerful. Want to link a citation in Document A to its source in Document B? Just create a [:REFERENCES] edge. The LLM can traverse these relationships during retrieval, giving you a true reasoning knowledge graph — something vector search simply cannot replicate.
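As a rough illustration of what such a link looks like, here is a small Python helper that builds the Cypher for a cross-document [:REFERENCES] edge. The `Section` label and the `doc`/`title` properties are my own assumptions for the example, not necessarily the project's actual schema:

```python
# Sketch: link a citation in one document to its source in another.
# Labels and property names here are illustrative assumptions.

def reference_edge(src_doc, src_title, dst_doc, dst_title):
    """Return a parameterized Cypher statement plus its parameters
    that MERGEs a [:REFERENCES] edge between two sections."""
    cypher = (
        "MATCH (a:Section {doc: $src_doc, title: $src_title}) "
        "MATCH (b:Section {doc: $dst_doc, title: $dst_title}) "
        "MERGE (a)-[:REFERENCES]->(b)"
    )
    params = {
        "src_doc": src_doc, "src_title": src_title,
        "dst_doc": dst_doc, "dst_title": dst_title,
    }
    return cypher, params

cypher, params = reference_edge("doc_a", "Results", "doc_b", "Methods")
# With a live database, this would be passed to the Neo4j driver
# along with params as a single parameterized query.
```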
3. Token-Efficient Retrieval
The LLM acts as an agentic navigator. Given a query, it:
- Identifies relevant root sections
- Queries Neo4j for child nodes
- Drills down iteratively until it finds the answer
- Ignores irrelevant branches entirely
This saves massive amounts of context tokens compared to dumping entire documents into the prompt.
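The loop above can be sketched in plain Python. Here a nested dict stands in for Neo4j and a keyword check stands in for the LLM's relevance judgment — both are my simplifications, chosen so the shape of the traversal is visible without any infrastructure:

```python
# Minimal sketch of the drill-down loop. A dict tree stands in for
# Neo4j; a keyword match stands in for the LLM relevance call.

def is_relevant(node, query):
    # Stand-in for an LLM call: keep a branch if any query word
    # appears in its title or summary.
    text = (node["title"] + " " + node.get("summary", "")).lower()
    return any(word in text for word in query.lower().split())

def navigate(node, query, visited=None):
    """Walk only relevant branches, recording every node visited."""
    if visited is None:
        visited = []
    visited.append(node["title"])
    for child in node.get("children", []):
        if is_relevant(child, query):   # prune irrelevant branches
            navigate(child, query, visited)
    return visited

tree = {
    "title": "Manual",
    "children": [
        {"title": "Installation", "summary": "setup steps"},
        {"title": "Troubleshooting", "summary": "common errors",
         "children": [{"title": "Network errors", "summary": "timeouts"}]},
    ],
}
print(navigate(tree, "network errors"))
# → ['Manual', 'Troubleshooting', 'Network errors']
```

The "Installation" branch is never expanded, which is exactly where the token savings come from: irrelevant subtrees never enter the prompt.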
How It Works
The Graph Structure
Every document becomes a tree in Neo4j:
(Document)-[:HAS_SECTION]->(Chapter)-[:HAS_SUBSECTION]->(Section)-[:HAS_SUBSECTION]->(Subsection)
The central Document node connects to top-level chapters, which recursively connect to sub-sections. Each node can store summaries, page references, and metadata.
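One-hop queries over this shape are simple. Below are the kinds of Cypher statements the navigator would issue at each level, held as Python strings; the `name` and `title` property names are assumptions for illustration:

```python
# Illustrative Cypher for stepping down the tree, using the
# relationship types shown above. Property names are assumptions.

FETCH_CHAPTERS = """
MATCH (d:Document {name: $doc})-[:HAS_SECTION]->(c)
RETURN c.title AS title
"""

FETCH_CHILDREN = """
MATCH (s:Section {title: $title})-[:HAS_SUBSECTION]->(child)
RETURN child.title AS title, child.summary AS summary
"""
```

Each query returns only one level of the tree, so the LLM sees a handful of titles and summaries per step rather than whole documents.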
Three-Step Workflow
Step 1: Parse Documents
uv run python main.py --pdf_path /path/to/document.pdf
This generates a _structure.json file containing the hierarchical tree. Markdown is supported too:
uv run python main.py --md_path /path/to/document.md
Step 2: Ingest into Neo4j
uv run python -m src.database.ingest --json_path ./results/document_structure.json
The JSON tree becomes nodes and relationships in your graph database.
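Conceptually, ingestion is a depth-first walk over the JSON tree that emits one MERGE per node and one relationship per parent-child link. A minimal sketch — the `title`/`nodes` field names and the node labels are my assumptions about the structure file, and a real ingest would run parameterized statements in one transaction rather than string-building:

```python
from itertools import count

def tree_to_cypher(node, _ids=None, parent=None, rel="HAS_SECTION"):
    """Flatten a structure-JSON tree into Cypher MERGE statements.
    Field names and labels are illustrative assumptions."""
    if _ids is None:
        _ids = count()
    nid = next(_ids)
    label = "Document" if parent is None else "Section"
    title = node["title"]
    stmts = [f"MERGE (n{nid}:{label} {{title: '{title}'}})"]
    if parent is not None:
        stmts.append(f"MERGE (n{parent})-[:{rel}]->(n{nid})")
    for child in node.get("nodes", []):
        # Document -> chapter uses HAS_SECTION; deeper levels HAS_SUBSECTION
        child_rel = "HAS_SECTION" if parent is None else "HAS_SUBSECTION"
        stmts += tree_to_cypher(child, _ids, parent=nid, rel=child_rel)
    return stmts

tree = {"title": "Doc", "nodes": [{"title": "Ch1", "nodes": [{"title": "1.1"}]}]}
for stmt in tree_to_cypher(tree):
    print(stmt)
```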
Step 3: Query with Agentic Retrieval
uv run python -m src.agent.retriever \
--doc_name document_structure.json \
--query "What are the key findings in section 2?"
The LLM identifies relevant sections, traverses the graph, and returns precise answers with page references.
Under the Hood
Tech Stack
- Neo4j — Graph database for persistent storage
- LiteLLM — Unified LLM interface (defaults to llama-3.3-70b via Groq)
- PyMuPDF / PyPDF2 — PDF parsing
- uv — Lightning-fast Python package management
The Agentic Retrieval Loop
The retriever (src/agent/retriever.py) implements a multi-step reasoning process:
- Root Analysis — LLM examines top-level sections to identify candidates
- Iterative Drilling — For each candidate, fetch children from Neo4j
- Relevance Filtering — LLM decides which branches to explore further
- Answer Extraction — Once leaf nodes are reached, extract the answer
This is fundamentally different from vector search. Instead of "find similar chunks," it's "navigate to the right place."
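The relevance-filtering step boils down to formatting the children fetched from Neo4j into a selection prompt for the LLM. This is a sketch of what such a prompt builder might look like — the actual retriever's prompt wording will differ:

```python
# Sketch of the relevance-filtering prompt. The exact wording used by
# src/agent/retriever.py is an assumption here, not the real prompt.

def build_filter_prompt(query, children):
    """Format child sections into a prompt asking the LLM which
    branches are worth expanding."""
    lines = [
        f"Query: {query}",
        "Which of these sections could contain the answer?",
        "Reply with the numbers of sections to expand.",
        "",
    ]
    for i, child in enumerate(children, 1):
        lines.append(f"{i}. {child['title']}: {child.get('summary', '')}")
    return "\n".join(lines)

prompt = build_filter_prompt(
    "key findings",
    [{"title": "Results", "summary": "main experimental findings"},
     {"title": "Appendix", "summary": "raw tables"}],
)
```

The LLM's numeric reply then drives the next round of Neo4j child queries, closing the loop.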
When to Use This Approach
| Scenario | Vector RAG | Graph-Powered Vectorless RAG |
|---|---|---|
| Simple Q&A over single doc | ✅ | ✅ |
| Cross-document reasoning | ❌ | ✅ |
| Hierarchical content (manuals, specs) | ⚠️ | ✅ |
| Citation/reference tracking | ❌ | ✅ |
| Token-efficient retrieval | ⚠️ | ✅ |
| Massive document collections | ❌ | ✅ |
What's Next
This is just the foundation. Here's where I'm taking it:
- Multi-hop reasoning across document collections
- Dynamic reference extraction to auto-build [:REFERENCES] edges
- Hybrid search combining graph traversal with optional vector fallback
- Streaming responses for real-time navigation feedback
Final Thoughts
Vector embeddings aren't going away, but they're not the only tool in the box. For structured, hierarchical, or cross-referenced content, graph-powered vectorless RAG gives you:
- Scalability — millions of documents, zero memory issues
- Reasoning — traverse relationships, not just similarities
- Efficiency — precise context retrieval, minimal token waste
Sometimes the best way forward is to go back to structure.
🔗 Project Repository: https://github.com/TejasS1233/vectorless_RAG
💬 Questions or ideas? Drop them in the comments - I'd love to hear how you're approaching RAG at scale.
If you found this useful, follow me for more deep dives into practical AI architecture. Next up: building multi-hop reasoning across document graphs.