DEV Community

Cover image for How to Stop AI Agents from Hallucinating Silently with Multi-Agent Validation
Elizabeth Fuentes L for AWS

Posted on

How to Stop AI Agents from Hallucinating Silently with Multi-Agent Validation

AI agents fail silently. They confirm operations that never completed, return success when tools returned errors, and fabricate responses with full confidence. A single agent has no mechanism to detect its own hallucinations — and no second opinion to catch them before they reach users.

Single-agent architectures have a fundamental blind spot: the agent that executes a task is the same one that reports the result. There's no cross-check, no validation layer, no audit trail. When the LLM misinterprets a tool error or substitutes a different result than what was requested, it does so silently — and the user receives a confident, wrong answer.

This is one of the most common failure patterns in AI systems today. Research (Teaming LLMs to Detect and Mitigate Hallucinations, 2024) identifies it as a structural problem, not a model quality problem: you can't prompt your way out of it. The solution is architectural.

Multi-agent validation introduces a separation of concerns: one agent executes, another verifies, a third approves. Each handoff is a checkpoint. Hallucinations that would pass undetected in a single-agent system get caught — before they reach users — with an explicit FAILED status instead of a silent wrong answer.

In this post we build an Executor → Validator → Critic swarm using Strands Agents, demonstrate the failure with a single agent, and show exactly where the multi-agent chain catches it.


Series Overview

This is Part 3 of a four-part series on stopping AI agent hallucinations:

Part 1: RAG vs GraphRAG: When Agents Hallucinate Answers - Relationship-aware knowledge graphs preventing hallucinations in aggregations and precise queries

Part 2: Reduce Agent Errors and Token Costs with Semantic Tool Selection - Vector-based tool filtering for accurate tool selection

Part 3: AI Agent Guardrails: Rules That LLMs Cannot Bypass - Symbolic reasoning for verifiable decisions

Bonus Part 3.2: Runtime Guardrails for AI Agents — Steer, Don't Block Open-source runtime controls that guide agents to self-correct violations instead of failing the workflow.

Part 4: Multi-Agent Validation - Agent teams detecting hallucinations before damage


The Problem: Single Agents Have No Self-Correction

Research (Teaming LLMs to Detect and Mitigate Hallucinations, 2024) identifies four failure patterns in single-agent systems:

  • Claim success when operations failed — No validation layer catches execution errors
  • Use wrong tools for requests — No cross-check verifies tool appropriateness
  • Fabricate responses — No second opinion challenges generated content
  • Provide inaccurate statistics — No verification against ground truth

The root cause is structural. A single agent operates in isolation: it calls a tool, receives a result, and decides what to tell the user — all in the same reasoning loop. If the tool returns an error and the LLM interprets it as recoverable, it will find a way to complete the task anyway. From the model's perspective, it solved the problem. From the user's perspective, it hallucinated.

User → Agent → Tools → Response
              ↑
     hallucination happens here, undetected
Enter fullscreen mode Exit fullscreen mode

There is no layer between tool execution and user response that can challenge the agent's interpretation. The only way to add one is architecturally.


The Solution: Executor → Validator → Critic

Three specialized agents, each with a single responsibility:

Agent Role What it checks
Executor Executes requests using tools Completes the task, reports exactly what the tools returned
Validator Reviews the execution Was the correct tool used? Does the response match what was requested?
Critic Final approval APPROVED or REJECTED with explicit reasoning

No agent trusts its own output. Every response passes through two independent checkpoints before reaching the user. Strands Agents handles the coordination through autonomous handoffs, shared context across the entire swarm, and explicit status tracking.

Executor Validator Critic multi-agent pipeline

Implementation

Step 1 — Define the Tools

Plain booking tools — no validation logic, no special handling for hallucinations:

from strands import tool

HOTELS = {
    "grand_hotel":   {"price": 200, "available": True,  "max_guests": 4},
    "budget_inn":    {"price": 80,  "available": True,  "max_guests": 2},
    "luxury_resort": {"price": 500, "available": False, "max_guests": 6},
}

BOOKINGS = {}

@tool
def search_hotels(location: str, guests: int = 1) -> str:
    """Search available hotels in a location."""
    available = [
        f"{k}: ${v['price']}/night, max {v['max_guests']} guests"
        for k, v in HOTELS.items()
        if v["available"] and v["max_guests"] >= guests
    ]
    return f"Hotels in {location}: {available}" if available else "No hotels available"

@tool
def book_hotel(hotel_id: str, guest_name: str, nights: int = 1) -> str:
    """Book a hotel room."""
    if hotel_id not in HOTELS:
        return f"ERROR: Hotel '{hotel_id}' not found"
    if not HOTELS[hotel_id]["available"]:
        return f"ERROR: {hotel_id} is not available"
    total = HOTELS[hotel_id]["price"] * nights
    booking_id = f"BK{len(BOOKINGS)+1:03d}"
    BOOKINGS[booking_id] = {
        "hotel": hotel_id, "guest": guest_name,
        "nights": nights, "total": total
    }
    return f"SUCCESS: Booking {booking_id} confirmed — {hotel_id}, {nights} nights, ${total}"

@tool
def get_booking(booking_id: str) -> str:
    """Get booking details."""
    if booking_id not in BOOKINGS:
        return f"ERROR: Booking '{booking_id}' not found"
    b = BOOKINGS[booking_id]
    return f"Booking {booking_id}: {b['hotel']} for {b['guest']}, {b['nights']} nights, ${b['total']}"
Enter fullscreen mode Exit fullscreen mode

The tools return explicit ERROR: messages. Whether the agent respects them is the entire question this demo answers.

Step 2 — Baseline: Single Agent

from strands import Agent
from strands.models.openai import OpenAIModel

# Using OpenAI-compatible interface via Strands SDK (not direct OpenAI usage)
MODEL = OpenAIModel(model_id="gpt-4o-mini")
# You can swap to any provider supported by Strands:
# https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/

single_agent = Agent(
    name="single",
    system_prompt="You are a hotel booking assistant. Use tools to complete requests.",
    tools=[search_hotels, book_hotel, get_booking],
    model=MODEL
)
Enter fullscreen mode Exit fullscreen mode

Step 3 — Multi-Agent Swarm

from strands.multiagent import Swarm

executor = Agent(
    name="executor",
    system_prompt="""Execute booking requests using tools.
After EVERY action, call handoff_to_agent to pass to 'validator'.""",
    tools=[search_hotels, book_hotel, get_booking],
    model=MODEL
)

validator = Agent(
    name="validator",
    system_prompt="""Validate booking responses. Check:
- Was the correct tool used?
- Is the response accurate and consistent with what was requested?
Say VALID or HALLUCINATION with reasons.
Then call handoff_to_agent to pass to 'critic'.""",
    model=MODEL
)

critic = Agent(
    name="critic",
    system_prompt="""Final review. Say APPROVED or REJECTED with reasoning.
You are the last agent — do NOT hand off.""",
    model=MODEL
)

swarm = Swarm([executor, validator, critic], entry_point=executor, max_handoffs=5)
Enter fullscreen mode Exit fullscreen mode

Identical tools. Identical model. Identical task. The only structural difference is the validation chain.

Why Strands Makes This Simple

The entire coordination layer — autonomous handoffs between agents, shared conversation context, and explicit status tracking — is handled by Strands with a single call:

swarm = Swarm([executor, validator, critic], entry_point=executor, max_handoffs=5)
result = swarm("Book the_ritz_paris for Sarah for 3 nights")
print(result.status)  # COMPLETED or FAILED
Enter fullscreen mode Exit fullscreen mode

No message-passing code. No handoff logic to write. No loop detection. You define what each agent does via system_prompt; Strands handles how they coordinate — including the handoff_to_agent tool built into every agent automatically.

Strands Swarm Documentation


Results

[TEST 1] Single Agent — Valid Booking
✓ Response: I've booked the grand_hotel for John for 2 nights...

[TEST 2] Single Agent — Invalid Hotel (non-existent hotel)
⚠️  Response: I've booked the grand_hotel in Paris for Sarah...
    (Agent hallucinated — changed hotel without warning)

[TEST 3] Multi-Agent — Valid Booking with Validation
✓ Flow: executor → validator → critic
✓ Status: COMPLETED

[TEST 4] Multi-Agent — Invalid Hotel Detection
✓ Flow: executor → validator → critic
✓ Status: FAILED
    (Correctly detected the invalid hotel)
Enter fullscreen mode Exit fullscreen mode
Scenario Single Agent Multi-Agent Swarm
Valid booking ✅ Executes correctly ✅ Executes and validated
Invalid hotel requested ❌ Silently substitutes another hotel ✅ Detected → FAILED

In TEST 2, the booking tool returned ERROR: Hotel not found. The single agent silently substituted a different hotel and reported success. In TEST 4, the Validator identified the discrepancy between what was requested and what the Executor actually booked — and returned an explicit FAILED status.


Why This Pattern Works

The Validator doesn't retry the operation. It reads the Executor's full output — including what tools were called and what they returned — and checks whether the result is consistent with the original request.

This is the key insight: hallucinations are often consistent internally but inconsistent with the request. The single agent's substitution was internally valid (the hotel exists, the booking succeeded). The problem was that it wasn't what the user asked for. Only a separate agent comparing the request against the result can catch that.

The Critic adds a second independent checkpoint, producing an explicit verdict that makes the system's confidence level visible — APPROVED or REJECTED, not just a response the user has to interpret.


Considerations

Advantages:

  • Hallucinations caught before reaching users
  • Explicit COMPLETED / FAILED status — errors surfaced, not hidden
  • Full audit trail through the agent chain
  • Each agent focused on a single responsibility — easier to debug

Challenges:

  • Higher latency — three LLM calls per request instead of one
  • Validator quality depends on its system prompt clarity
  • More complex to tune than a single-agent prompt
  • Cost increases with the number of agents in the chain

When to use it:
Prioritize multi-agent validation for operations where silent errors are costly — bookings, payments, cancellations, data writes. For low-stakes read-only queries, a single agent is likely sufficient.


What Comes Next

Multi-agent validation catches hallucinations in multi-step reasoning chains. But some violations are more fundamental: parameter limits, payment prerequisites, capacity constraints. These require rules that execute before any tool runs — outside LLM control entirely.

Part 4 covers neurosymbolic guardrails: symbolic rules enforced at the framework level via Strands Hooks, that the LLM cannot bypass regardless of how the user phrases the request.


Key Takeaways

  • Single agents have no self-correction mechanism — hallucinations go undetected by design
  • The Executor → Validator → Critic pattern introduces cross-validation at every step
  • Strands Swarm handles autonomous handoffs with shared context and explicit status tracking
  • Identical tools and model — the difference is architectural, not model-dependent
  • Status.FAILED makes errors explicit instead of returning a confident wrong answer
  • Best applied to high-stakes operations where silent substitution has real consequences

Run It Yourself

git clone https://github.com/aws-samples/sample-why-agents-fail
Enter fullscreen mode Exit fullscreen mode
cd stop-ai-agent-hallucinations/03-multiagent-demo
Enter fullscreen mode Exit fullscreen mode
uv venv && uv pip install -r requirements.txt
Enter fullscreen mode Exit fullscreen mode
uv run test_multiagent_hallucinations.py
Enter fullscreen mode Exit fullscreen mode

You can swap to any provider supported by Strands — see Strands Model Providers for configuration.


References

Research

Strands Agents

Code


Gracias!

🇻🇪🇨🇱 Dev.to Linkedin GitHub Twitter Instagram Youtube

Top comments (3)

Collapse
 
klement_gunndu profile image
klement Gunndu

The Executor → Validator → Critic chain is solid, but one gap I’ve noticed with this pattern: the Validator can hallucinate its verification too. Adding a deterministic check (hash comparison, schema validation) between the Executor and Validator catches the cases where both LLMs agree on the wrong answer.

Collapse
 
elizabethfuentes12 profile image
Elizabeth Fuentes L AWS

It would be like a conspiracy :D. I don't think we should rely solely on this technique; rather, it should be combined with others, depending on the specific use case.

Collapse
 
ensamblador profile image
ensamblador

Excellent for critical tasks where hallucinating is not an option.