Elizabeth Fuentes L for AWS

Posted on • Originally published at builder.aws.com

RAG vs GraphRAG: When Agents Hallucinate Answers

Traditional RAG can make AI agents hallucinate statistics and aggregations. This demo builds a travel booking agent with Strands Agents and compares RAG (FAISS) against GraphRAG (Neo4j) to measure which approach reduces hallucinations when answering queries over 300 hotel FAQ documents.

When AI Agents Don't Just Answer Wrong—They Act Wrong

Comparison showing agent giving wrong statistics versus precise database results

In the previous blog post, we explored at a high level why AI agents hallucinate and introduced 4 essential techniques to stop them: GraphRAG, semantic tool selection, neurosymbolic guardrails, and multi-agent validation. Now we're going to dive deeper into each one. This is Part 1: we'll build a travel booking agent, load 300 hotel FAQ documents, and measure exactly where traditional RAG breaks down and how GraphRAG with Neo4j eliminates those failures.


AI agents differ from chatbots. A chatbot giving incorrect information is annoying. An agent hallucinating during execution is catastrophic—it might fabricate API parameters, invent success confirmations after failures, or execute actions based on false beliefs.

Recent research (MetaRAG, 2025) proves you cannot eliminate hallucinations—they're inherent to how LLMs work. The focus shifted to detecting, containing, and mitigating them in production.


This Series: 4 Production Techniques

Part 1 (This Post): GraphRAG - Relationship-aware knowledge graphs preventing hallucinations in aggregations and precise queries

Part 2: Semantic Tool Selection - Vector-based tool filtering for accurate tool selection

Part 3: Neurosymbolic Guardrails - Symbolic reasoning for verifiable decisions

Part 4: Multi-Agent Validation - Agent teams detecting hallucinations before damage

Code uses Strands Agents.

Go to the GitHub repository: sample-why-agents-fail

git clone https://github.com/aws-samples/sample-why-agents-fail

Part 1: When RAG Makes Agents Hallucinate

Traditional RAG retrieves similar documents using vector search. This works for semantic questions but fails when agents need precise information. Research (RAG-KG-IL, 2025) identifies three types of hallucinations this causes:

  1. Fabricated statistics — the LLM generates plausible-sounding numbers from text chunks instead of computing them. The paper reports 49 hallucinated statements from the RAG-only system versus 35 with knowledge graph integration; measured against a standalone LLM, KG integration cut hallucinations by 73%.

  2. Incomplete retrieval — Vector search returns top-k similar documents, missing relevant data scattered across hundreds of documents. The paper found RAG-only missed information in nearly every question (54 instances), while KG-integrated systems had near-zero incompleteness.

  3. Out-of-domain fabrication — When no relevant data exists, RAG still returns similar-looking results and the LLM fabricates an answer. MetaRAG (2025) confirms this is inherent to how retrieval works: similarity search always returns something, even when nothing is relevant.
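The third failure is mechanical, not a model quirk: nearest-neighbor search always has a nearest neighbor. A minimal sketch with invented documents and hand-made stand-in vectors (no real embedding model; everything here is made up for illustration) shows an out-of-domain query still getting a "top hit":

```python
import math

# Toy "embeddings": hand-made vectors standing in for a real embedding model.
documents = {
    "Hotel Lumiere, Paris — rooftop pool, rating 4.8": [0.9, 0.1, 0.0],
    "Hotel Nile View, Cairo — river suites, rating 4.5": [0.8, 0.2, 0.1],
    "Visa requirements for EU travelers": [0.1, 0.9, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Pretend embedding of "hotels in Antarctica": no relevant document exists.
query = [0.7, 0.3, 0.05]

# Similarity search still ranks every document and returns a best match.
best = max(documents, key=lambda d: cosine(query, documents[d]))
print(best)  # → the Cairo hotel entry: similar-looking, not relevant
```

The retrieval layer never says "nothing matches"; it hands the LLM the least-irrelevant chunks, and the LLM writes a fluent answer on top of them.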


The Demo: Two Agents, Same Data, Different Approaches

The demo uses two separate agents querying the same 300 hotel FAQ documents:

Agent 1: Traditional RAG Agent

Uses FAISS vector similarity search as a Strands Agents custom tool. Given a query, it retrieves the 3 most similar documents and lets the LLM summarize them.

from strands import Agent, tool
from strands.models.openai import OpenAIModel

# Assumes `model` (the SentenceTransformer embedder), `index` (the FAISS
# index), and `documents` (the FAQ records) were built by load_vector_data.py.

@tool
def search_faqs(query: str) -> str:
    """Search hotel FAQs using vector similarity (Traditional RAG)."""
    query_embedding = model.encode([query])
    distances, indices = index.search(query_embedding.astype('float32'), 3)
    results = []
    for idx in indices[0]:
        doc = documents[idx]
        results.append(f"[{doc['filename']}]\n{doc['text'][:500]}...")
    return "\n\n".join(results)

rag_agent = Agent(
    tools=[search_faqs],
    system_prompt="You are a travel agent. Use vector search to find relevant FAQ information.",
    model=OpenAIModel(model_id="gpt-4o-mini")
)

Limitation: The agent only sees k documents at a time (3 in this example). It cannot aggregate, count, or traverse relationships across the full dataset.
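That ceiling can be demonstrated without any model at all. A toy sketch with invented data mirroring the demo's shape (300 records, of which 133 mention a pool): any count derived from the k retrieved chunks is bounded by k, while a database-style aggregation over the full set is exact.

```python
# Invented dataset mirroring the demo's shape: 300 records, 133 with a pool.
records = [{"hotel": f"Hotel {i}", "has_pool": i < 133} for i in range(300)]

TOP_K = 3  # a RAG tool surfaces only this many chunks per query

# RAG-style view: the LLM only ever sees k documents, so any count it
# computes from context is bounded by k — beyond that it must guess.
retrieved = records[:TOP_K]  # stand-in for "the 3 most similar documents"
count_from_context = sum(r["has_pool"] for r in retrieved)

# Database-style view: the aggregation runs over all 300 records at once,
# like COUNT() in a Cypher query.
exact_count = sum(r["has_pool"] for r in records)

print(count_from_context, exact_count)  # 3 vs 133
```

The honest outcomes for the RAG agent are "3" or "I don't know"; anything else is an estimate dressed up as a count.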

Note on embeddings: This demo uses SentenceTransformers (all-MiniLM-L6-v2) for vector embeddings — it runs locally, requires no API keys, and costs nothing. You can swap it for any embedding model: Amazon Nova Embeddings, OpenAI text-embedding-3-small, Cohere Embed, etc.

Agent 2: Graph-RAG Agent

Uses a Neo4j knowledge graph built automatically with neo4j-graphrag (neo4j-graphrag-python). The LLM writes Cypher queries to get precise answers.

Go to the GitHub repository: sample-why-agents-fail/stop-ai-agent-hallucinations/01-faq-graphrag-demo

from neo4j import GraphDatabase
from strands import Agent, tool
from strands.models.openai import OpenAIModel

@tool
def query_knowledge_graph(cypher_query: str) -> str:
    """Execute a Cypher query against the hotel knowledge graph.

    Node labels: Hotel, Room, Amenity, Policy, Service
    Hotel properties: name, address, guestRating, totalRooms, email, phone
    Room properties: name (e.g. "Standard Room"), price, maxOccupancy
    Amenity properties: name (e.g. "Outdoor Swimming Pool", "WiFi")
    Policy properties: name (e.g. "Check-in Policy"), details

    Relationships:
    - (Hotel)-[:HAS_ROOM]->(Room)
    - (Hotel)-[:OFFERS_AMENITY]->(Amenity)
    - (Hotel)-[:HAS_POLICY]->(Policy)
    - (Hotel)-[:PROVIDES_SERVICE]->(Service)

    Location is in Hotel.address property (e.g. "789 Corniche el-Nil, Cairo 11519").
    To find hotels by location, use: WHERE h.address CONTAINS 'Cairo'
    """
    with GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD)) as driver:
        with driver.session() as session:
            result = session.run(cypher_query)
            records = list(result)
            if not records:
                return "No results found."
            output = f"Found {len(records)} results:\n"
            for record in records[:15]:
                output += f"  {dict(record.items())}\n"
            return output

graph_agent = Agent(
    tools=[query_knowledge_graph],
    system_prompt="You are a travel agent. Use the knowledge base to answer questions accurately. You can run multiple queries to explore the data.",
    model=OpenAIModel(model_id="gpt-4o-mini")
)

Key difference: The agent writes Cypher queries that execute native AVG(), COUNT(), and relationship traversals directly in the database.

How Text2Cypher Works

The GraphRAG agent doesn't have hardcoded queries. Instead, it uses the Text2Cypher pattern — the LLM translates natural language into Cypher based on the graph schema described in the tool's docstring:

  1. User asks: "How many hotels have a swimming pool?"
  2. LLM reads the tool description containing the schema (node labels, properties, relationships)
  3. LLM generates: MATCH (h:Hotel)-[:OFFERS_AMENITY]->(a:Amenity) WHERE a.name CONTAINS 'Pool' RETURN COUNT(DISTINCT h)
  4. Tool executes the query against Neo4j and returns the result

The schema in the docstring is what grounds the LLM — without it, the LLM would guess node names and relationships. With it, the LLM generates valid Cypher that matches the actual graph structure.
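A minimal sketch of that grounding step, with an illustrative prompt template (not the actual prompt Strands Agents constructs): the tool's docstring is read at runtime and placed in front of the LLM, so generation is constrained by the real schema rather than guesses.

```python
def query_knowledge_graph(cypher_query: str) -> str:
    """Node labels: Hotel, Room, Amenity
    Relationships: (Hotel)-[:OFFERS_AMENITY]->(Amenity)"""
    ...  # trimmed copy of the demo tool, for illustration only

def build_text2cypher_prompt(question: str, tool_fn) -> str:
    """Compose an illustrative Text2Cypher prompt from a tool's docstring."""
    return (
        "You write Cypher for a Neo4j graph with this schema:\n"
        f"{tool_fn.__doc__}\n\n"
        f"Question: {question}\n"
        "Return only the Cypher query."
    )

prompt = build_text2cypher_prompt(
    "How many hotels have a swimming pool?", query_knowledge_graph
)
# The schema travels with every request, so the LLM cannot invent labels
# or relationship types without contradicting text it was just shown.
print("OFFERS_AMENITY" in prompt)  # True
```

The same idea scales to schema changes: update the docstring and every subsequent query is grounded in the new structure, with no prompt files to edit.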

How the Knowledge Graph is Built

The graph is built automatically using neo4j-graphrag — no hardcoded schema. Research on automated knowledge graph construction (RAKG, 2025) shows LLMs can extract entities and relationships from unstructured text:

Go to the GitHub repository: sample-why-agents-fail/stop-ai-agent-hallucinations/01-faq-graphrag-demo

from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

# Assumes `llm`, `neo4j_driver`, and `embedder` are already configured:
# an LLM client, a Neo4j driver, and an embedding model.
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    embedder=embedder,
    from_pdf=False,
    perform_entity_resolution=True,
)

# The LLM discovers entities and relationships in each document
await kg_builder.run_async(text=document_text)

The LLM reads each FAQ and automatically discovers entity types (Hotel, Room, Amenity, Policy) and relationships (HAS_ROOM, OFFERS_AMENITY, HAS_POLICY). No manual schema definition needed — if you add new documents with new entity types (Restaurant, Airport, etc.), the LLM discovers them automatically.


Results: 4 Tests Validating the Research

Both agents answer the same questions. We compare their responses against what the research papers predict:

Run: travel_agent_demo.py

Test 1: Aggregation — "What is the average guest rating across all hotels in Paris?"

Research (RAG-KG-IL, 2025) predicts RAG cannot compute aggregations from text chunks.

RAG vs GraphRAG Aggregation

  • RAG: Manually calculated from the 2 docs it found → 4.7 (correct only because both relevant documents happened to land in the top-k)
  • GraphRAG: Native AVG(h.guestRating) in Cypher → 4.7 ✅ Database-level computation

Test 2: Precise Counting — "How many hotels have a swimming pool as an amenity?"

Research (MetaRAG, 2025) shows RAG retrieves top-k documents, making counting impossible.

RAG vs GraphRAG Precise Counting

  • RAG: ❌ "I don't have the data needed to answer" — cannot count across 300 docs (only sees 3)
  • GraphRAG: ✅ "133 hotels" — exact count with Cypher COUNT()

Test 3: Multi-hop Reasoning — "What are the room types and prices for the highest-rated hotel?"

Research (RAG-KG-IL, 2025) shows RAG falls short when tasks require deeper inference across interconnected data.

RAG vs GraphRAG Multi hop Reasoning

  • RAG: ⚠️ Found one hotel but "does not include room types" — cannot traverse relationships
  • GraphRAG: Traversed Hotel → Room nodes via Cypher, found top-rated hotels and their room data

Test 4: Out-of-domain — "Tell me about hotels in Antarctica"

Research (MetaRAG, 2025) proves RAG hallucinates when data doesn't exist because vector search always returns similar results.

RAG vs GraphRAG Out of domain

  • RAG: ❌ HALLUCINATED — fabricated "Research Stations", "Expedition Cruises", "Specialized Lodges" that DO NOT exist in the data
  • GraphRAG: ✅ "No hotels listed in Antarctica" — honest, does not fabricate

RAG always returns something. GraphRAG returns empty results when data doesn't exist.

rag vs GraphRAG accuracy

When to Use GraphRAG

Use GraphRAG:

  • Precise queries (numerical filtering, exact matches)
  • Aggregations (counts, averages, sums)
  • Relationships (multi-hop traversal)
  • Structured data (clear schemas)
  • Verifiable results

Use RAG:

  • Semantic search (similar concepts)
  • Unstructured text (documents, articles)
  • Fuzzy matching (approximate results)
  • Simple retrieval

Try It Yourself

git clone https://github.com/aws-samples/sample-why-agents-fail
cd stop-ai-agent-hallucinations/01-faq-graphrag-demo
uv venv && uv pip install -r requirements.txt

# Build FAISS vector index
uv run load_vector_data.py

# Build Neo4j knowledge graph (requires Neo4j with APOC plugin)
uv run build_graph.py

# Run comparison
uv run travel_agent_demo.py

What's Next

GraphRAG prevents hallucinations in agent responses. But agents still hallucinate during tool selection, choosing the wrong tool for the job.

Part 2: Semantic Tool Selection shows how vector-based tool filtering reduces tool selection errors when agents have dozens of similar tools.

Key Takeaways

  1. RAG makes agents hallucinate statistics: LLMs estimate instead of calculating
  2. GraphRAG provides precision: Native aggregations, exact filtering
  3. Explicit failure prevents hallucinations: Empty results vs similar matches
  4. Automatic graph construction: neo4j-graphrag discovers entities without hardcoded schemas
  5. Two-agent comparison: Same data, different tools — measurable difference
  6. Strands Agents makes this simple: A @tool decorator and a few lines of configuration is all it takes — define a tool, wire it to an agent, and you have a working RAG or GraphRAG system. Swapping between approaches means swapping one tool, not rewriting your agent.

Thanks!
