DEV Community

Nathaniel Hamlett

Posted on • Originally published at nathanhamlett.com

Why Your RAG System Needs a Graph Database (Not Just Vectors)


The standard RAG stack looks like this: embed your documents, store the embeddings in a vector database, retrieve the top-k most similar chunks at query time, and feed them to an LLM. This works surprisingly well for a lot of use cases.

It also fails completely for an entire class of questions that matter in specialized domains.

I built a domain-specific AI system with 3 million vector embeddings and 252,000 graph nodes. The graph layer catches questions that vector search can't answer — not because vector search is bad, but because similarity and connectivity are fundamentally different operations.

The Question That Breaks Vector Search

Here's a simple question for my domain (live music history): "What songs did the band never play in New York?"

Think about what vector search does with this query. It converts the question to an embedding and finds the most similar document chunks. The most similar chunks will be about songs played in New York — because those documents mention both "songs" and "New York" together. The answer requires the absence of a relationship, and vector search can only find what's present.

With a graph database:

MATCH (s:Song)
WHERE NOT EXISTS {
    MATCH (s)<-[:PERFORMED]-(show:Show)-[:AT]->(v:Venue)
    WHERE v.city = 'New York'
}
RETURN s.name

This query traverses the graph structure: find all songs, check which ones have no performance-at-venue relationship where the venue is in New York. It's a structural query — it operates on the topology of relationships, not on the semantic content of text.

Five Query Types Vectors Miss

After running both systems in parallel for months, I've cataloged the query patterns where graph traversal outperforms vector retrieval:

1. Negation and Absence Queries

"What venues did they never visit?" "Which personnel never overlapped?" "What year had no shows in California?"

Vector search has no concept of absence. It retrieves based on presence — documents that contain the relevant terms. Asking it about what doesn't exist returns semantically similar documents that contain the terms, which is the opposite of what you want.

2. Multi-Hop Relationship Queries

"Who played with both Jerry Garcia and Phil Lesh in side projects?"

This requires traversing: Person → played_in → Band → also_contains → Person, then intersecting the results. A graph handles this as a path query. Vector search would need to retrieve documents about Jerry Garcia's side projects, documents about Phil Lesh's side projects, and then hope the LLM can synthesize the intersection — which it usually gets wrong because the relevant information spans dozens of document chunks.
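In Cypher, that intersection is just two pattern matches over the same variable. A sketch — note that the `Band` node and `MEMBER_OF` relationship here are hypothetical, not part of the schema described later in this post:

```cypher
// Hypothetical schema: (:Person)-[:MEMBER_OF]->(:Band)
MATCH (p:Person)-[:MEMBER_OF]->(:Band)<-[:MEMBER_OF]-(jerry:Person {name: 'Jerry Garcia'})
MATCH (p)-[:MEMBER_OF]->(:Band)<-[:MEMBER_OF]-(phil:Person {name: 'Phil Lesh'})
WHERE p <> jerry AND p <> phil
RETURN DISTINCT p.name
```

Because `p` is bound across both MATCH clauses, the intersection happens in the database — no LLM synthesis across dozens of chunks required.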

3. Aggregation Over Relationships

"What's the most common song to open the second set?" "Which venue hosted the most shows?"

These are COUNT + GROUP BY operations over relationship patterns. They're trivial in a graph (or SQL). They're nearly impossible with vector retrieval because you'd need to retrieve and count across thousands of documents.
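Since the `PERFORMED` relationship carries `set_position` and `set_number` properties (see the schema later in this post), the second-set-opener question reduces to a single aggregation. A sketch:

```cypher
// Count how often each song opened set two, most frequent first
MATCH (:Show)-[p:PERFORMED]->(s:Song)
WHERE p.set_number = 2 AND p.set_position = 1
RETURN s.name, count(*) AS times_opened
ORDER BY times_opened DESC
LIMIT 5
```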

4. Temporal Sequence Queries

"What did they play after Dark Star on 2/13/70?" "Show me the setlist evolution of this song over the 1977 tour."

Temporal sequences are ordered relationships. The graph stores: Song A → FOLLOWED_BY → Song B within a specific Show. Vector search has no concept of ordering — it treats each chunk independently.
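As a sketch, assuming each `FOLLOWED_BY` edge carries a `show_date` property to scope it to a single show (the schema below only says "within a Show", so that property name is an assumption):

```cypher
// What came after Dark Star at the 2/13/70 show
MATCH (:Song {name: 'Dark Star'})-[f:FOLLOWED_BY]->(next:Song)
WHERE f.show_date = '1970-02-13'
RETURN next.name
```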

5. Comparative Structural Queries

"How did the setlist composition differ between East Coast and West Coast shows in 1973?"

This requires comparing aggregate patterns across two subsets of the graph. It's a structural comparison, not a semantic one.

The Architecture That Handles Both

My system routes queries to the right retrieval strategy before execution:

Query → Intent Classification → Router
                                  ├── Vector Search (opinion, review, analysis queries)
                                  ├── Graph Traversal (structural, relationship, absence queries)
                                  ├── SQL (statistical, count, date-specific queries)
                                  └── Hybrid (complex queries needing multiple strategies)

The router is surprisingly simple — keyword classification plus a lightweight intent model. Structural keywords (relationship terms, negation, comparison, temporal markers) route to the graph. Opinion keywords (best, favorite, review, analysis) route to vectors. Statistical keywords (how many, average, count, most) route to SQL.
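The keyword pass can be sketched in a few lines. The keyword sets and route names here are illustrative, not the production classifier:

```python
# Illustrative keyword router — a first-pass classifier, not the
# production intent model. Check order matters: statistical keywords
# are tested before opinion keywords, so "most" wins over "best"
# in mixed queries.
STATISTICAL = ("how many", "average", "count", "most", "fewest")
STRUCTURAL = ("never", "without", "before", "after", "between", "followed")
OPINION = ("best", "favorite", "review", "analysis")

def route(query: str) -> str:
    q = query.lower()
    if any(kw in q for kw in STATISTICAL):
        return "sql"
    if any(kw in q for kw in STRUCTURAL):
        return "graph"
    if any(kw in q for kw in OPINION):
        return "vector"
    return "vector"  # semantic fallback for everything else
```

In practice a lightweight intent model sits behind this to catch queries that match no keyword or match several.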

For hybrid queries, the system runs both retrieval strategies in parallel and merges the results before LLM synthesis. This handles cases where a question is partly structural and partly semantic: "What's the best recording of a Dark Star > Other One sequence?" needs the graph to find shows with that sequence, and vectors to find quality reviews of those shows.
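The parallel fan-out for hybrid queries can be sketched like this — `graph_fn` and `vector_fn` are stand-ins for the real retrieval clients:

```python
from concurrent.futures import ThreadPoolExecutor

def hybrid_retrieve(query, graph_fn, vector_fn):
    """Run both retrieval strategies in parallel, then merge.

    graph_fn and vector_fn are placeholders for the actual graph
    and vector retrieval callables.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        graph_future = pool.submit(graph_fn, query)
        vector_future = pool.submit(vector_fn, query)
        graph_hits = graph_future.result()
        vector_hits = vector_future.result()
    # De-duplicate while preserving order; structural hits lead
    # because they are exact, not approximate.
    seen, merged = set(), []
    for hit in graph_hits + vector_hits:
        if hit not in seen:
            seen.add(hit)
            merged.append(hit)
    return merged
```

The merged list then goes to the LLM for synthesis as a single context.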

Building the Knowledge Graph

The graph layer isn't magic — it requires explicit modeling of your domain's relationships. Here's what mine looks like:

Node types:

  • Show (date, venue, notes)
  • Song (title, first_played, last_played)
  • Venue (name, city, state, capacity)
  • Person (name, role, instruments)
  • Recording (source, format, quality)

Relationship types:

  • PERFORMED (Show → Song, with set_position, set_number)
  • AT (Show → Venue)
  • PLAYED_BY (Show → Person)
  • CAPTURED_BY (Show → Recording)
  • FOLLOWED_BY (Song → Song, within a Show)

252,000 nodes. Each one verified against source data. The graph construction was the hardest part of the entire project — not the AI, not the embeddings, not the retrieval logic. Getting the relationships right and keeping them consistent is where the real work lives.

Practical Graph Construction

from falkordb import FalkorDB

db = FalkorDB()
graph = db.select_graph('music_knowledge')

# Create nodes
graph.query("""
    CREATE (:Show {date: '1977-05-08', notes: 'Cornell University'})
""")

# Create relationships
graph.query("""
    MATCH (show:Show {date: '1977-05-08'})
    MATCH (venue:Venue {name: 'Barton Hall'})
    CREATE (show)-[:AT]->(venue)
""")

# Query across relationships
result = graph.query("""
    MATCH (show:Show)-[:AT]->(v:Venue {city: 'Ithaca'})
    MATCH (show)-[:PERFORMED]->(s:Song)
    RETURN show.date, collect(s.name) as songs
    ORDER BY show.date
""")

If you're using Neo4j instead, the queries look essentially the same — both databases speak Cypher. FalkorDB is lighter weight and Redis-compatible, which I preferred for a single-machine deployment.

When You Don't Need a Graph

Not every RAG system needs a graph layer. You probably don't need one if:

  • Your queries are purely semantic. "Find documents about topic X" is what vectors are built for.
  • Your domain doesn't have meaningful relationships. A collection of blog posts doesn't have the structural relationships that benefit from graph traversal.
  • Approximate answers are fine. Vector search gives you "roughly similar" results. If that's sufficient, the engineering cost of a graph layer isn't justified.
  • Your data is small enough to fit in context. If your entire knowledge base fits in a 200K context window, you don't need any retrieval system — just stuff it all in the prompt.

You likely need a graph when:

  • Users ask structural questions about relationships, sequences, absences, and aggregations.
  • Your domain has entity relationships that matter (people → organizations, products → components, events → participants).
  • Accuracy matters more than speed. Graph queries are slower than vector search but dramatically more precise for structural questions.
  • You're building a specialist system. General-purpose chatbots don't need graphs. Domain-specific systems that serve expert users usually do.

The Compounding Effect

The most surprising thing about adding a graph layer wasn't the immediate query improvement — it was the compounding effect on LLM output quality.

When the LLM receives structurally correct retrieved data (exact relationships, verified connections, complete sets rather than sampled fragments), its synthesis quality jumps. It hallucinates less because there's less ambiguity in the input. It makes fewer factual errors because the facts are explicit, not inferred from semantic similarity.

The graph doesn't just answer different questions — it makes the vector-powered answers better too, because the system can cross-validate. If vector search retrieves a document claiming X, the graph can verify whether X is structurally true. This creates a self-checking retrieval system that's more reliable than either layer alone.

Start Simple

If you're convinced you need a graph layer, start with:

  1. Model your entities and relationships on paper first. What are the node types? What are the relationship types? What questions do you need to answer?
  2. Use FalkorDB or Neo4j. Both support Cypher queries. FalkorDB is lighter. Neo4j has better tooling and community.
  3. Build the graph alongside your vector index, not instead of it. They're complementary. Most queries will still go to vectors. The graph handles the queries that vectors can't.
  4. Route at the query level. Don't try to merge graph and vector results for every query. Classify the query intent first, then route to the appropriate retrieval strategy.

The investment is real — graph construction takes longer than vector embedding. But for any domain with meaningful relationships between entities, the graph layer transforms your system from "semantically approximate" to "structurally precise." That's the difference between a demo and a tool that domain experts actually trust.


Nathan Hamlett builds domain-specific AI systems. His current project combines 3M+ vector embeddings with a 252K-node knowledge graph for multi-strategy retrieval. More at nathanhamlett.com.
