AI agent hallucinations occur when an LLM-powered agent fabricates data, selects the wrong tool, or ignores business rules during autonomous task execution. This post shows how to deploy five production-ready techniques that stop them: managed hosting, serverless tools, database-driven guardrails, semantic tool routing, and a knowledge graph. Everything deploys as infrastructure as code.
TL;DR — 5 techniques, one production stack:
- Graph-RAG on Neo4j AuraDB eliminates fabricated aggregations with Cypher queries
- Semantic tool routing via AgentCore Gateway replaces custom FAISS indexes
- Multi-agent validation on Lambda + DynamoDB catches errors single agents miss
- Database-driven steering rules update agent behavior without redeploying
- Hard hooks + soft steers separate financial constraints from operational adjustments
Every demo in this series on stopping AI agent hallucinations ran on a laptop. Hardcoded data, in-memory rules, a single user. The techniques worked, but the infrastructure did not scale. Moving to production means replacing local files with managed databases, in-process tools with serverless functions, and hardcoded rules with data you can update without redeploying.
This post walks through a production hotel booking agent that applies every technique from the series. We use Strands Agents for its native support of tool calling, hooks for guardrail enforcement, and MCP (Model Context Protocol) integration for connecting to Amazon Bedrock AgentCore Gateway. Similar patterns can be applied in LangGraph, AutoGen, CrewAI, or other agent frameworks.
Complexity note: This guide assumes familiarity with AWS CDK and core AWS services (Lambda, DynamoDB, S3). New to CDK? Start with the CDK Workshop first.
Working code: github.com/elizabethfuentes12/why-agents-fail-sample-for-amazon-agentcore
Prerequisites
You need the following:
- AWS account with CDK bootstrapped in your target region
- AWS CLI installed and configured
- AWS CDK v2 (Cloud Development Kit) — infrastructure as code framework for deploying all resources
- Python 3.11+ and uv package manager
- OpenAI API key — the agent uses GPT-4o-mini as the LLM (Large Language Model) provider, swappable for Amazon Bedrock or other providers
- Neo4j AuraDB Free account — only if deploying GraphRAG (Stack 2)
Series Overview
This is the final post in the series on stopping AI agent hallucinations:
- RAG vs Graph-RAG: When Agents Hallucinate Answers — Knowledge graphs prevent hallucinated aggregations
- Reduce Agent Errors and Token Costs with Semantic Tool Selection — Vector filtering reduces wrong tool choices
- How to Stop AI Agents from Hallucinating Silently with Multi-Agent Validation — Cross-validation catches errors single agents miss
- AI Agent Guardrails: Rules That LLMs Cannot Bypass — Symbolic rules enforced at the framework level
- Runtime Guardrails for AI Agents — Steer, Don't Block — Agent self-corrects instead of failing
- From Demo to Production (this post) — Deploy all techniques on AWS
Each post can be read independently. The repository contains all demos with runnable code.
How Does Each Anti-Hallucination Technique Map to Production?
The table below shows what each demo built locally and what replaces it in production:
| Technique | Demo implementation | Production replacement |
|---|---|---|
| Graph-RAG (demo 01) | Local Neo4j Desktop + manual APOC scripts | Neo4j AuraDB Free + automated Lambda builder using SimpleKGPipeline |
| Semantic tool selection (demo 02) | In-memory FAISS (Facebook AI Similarity Search) index rebuilt on every run | AgentCore Gateway with MCP semantic routing — no custom index needed |
| Multi-agent validation (demo 03) | Hardcoded validator agents with in-memory state | validate_booking_rules Lambda backed by DynamoDB — same safety, lower latency |
| Neurosymbolic guardrails (demo 04) | Python hooks with hardcoded thresholds | Steering rules in DynamoDB — change rules without redeploying |
| Agent Control steering (demo 05) | Local Agent Control server with config files | STEER messages stored in DynamoDB rules — agent self-corrects from database-driven guidance |
The anti-hallucination techniques remain the same. The infrastructure becomes managed, scalable, and updatable without code changes.
What Does the Production Architecture Look Like?
The production architecture runs on Amazon Bedrock AgentCore. Runtime hosts the agent code, and Gateway routes tool calls to AWS Lambda functions via MCP. Two independent AWS CDK stacks deploy the full infrastructure:
- AgentCore Runtime — Hosts and runs your agent code in a managed environment
- AgentCore Gateway — MCP-based semantic tool routing that connects agents to serverless functions
Stack 1 — HotelBookingAgentStack (deploy first, works on its own):
| Resource | Purpose |
|---|---|
| 3 Amazon DynamoDB tables | Hotels inventory, bookings, steering rules |
| AWS Secrets Manager | OpenAI API key stored securely |
| AgentCore Runtime | Managed agent hosting |
| AgentCore Gateway | MCP semantic routing to Lambda tools |
| 7 AWS Lambda functions | search, book, get_booking, process_payment, confirm, cancel, validate |
Stack 2 — GraphRAGStack (deploy when ready, adds FAQ capabilities):
| Resource | Purpose |
|---|---|
| Amazon S3 bucket | 300 hotel FAQ documents auto-uploaded during deploy |
| Lambda build_graph | Builds knowledge graph from documents using SimpleKGPipeline |
| Lambda query_knowledge_graph | Executes Cypher queries against Neo4j AuraDB |
| Neo4j AuraDB Free | Managed graph database ($0/month free tier) |
How Do Steering Rules Work as Data Instead of Code?
In demo 04, business rules were hardcoded in Python. Changing a threshold meant changing code and redeploying. In production, rules live in DynamoDB. Update a row and the agent's behavior changes immediately.
Each rule has two messages: a fail_message that describes the violation and a steer_message that tells the agent how to self-correct:
{
"rule_id": "max-guests",
"action": "book",
"condition_field": "guests",
"operator": "gt",
"threshold": 10,
"fail_message": "Guest count exceeds maximum of 10",
"steer_message": "Booking for 15 guests is not available, but you CAN book for up to 10. Adjust to 10 guests, proceed, and tell the user.",
"enabled": true
}
The validate_booking_rules tool reads rules from DynamoDB before every booking action. When a rule is violated, the agent receives the STEER guidance and self-corrects, completing the task instead of blocking:
from strands import tool

@tool
def validate_booking_rules(
    action: str,
    guests: int = 0,
    check_in: str = "",
    check_out: str = "",
    booking_id: str = "",
) -> str:
    """Validate business rules BEFORE executing a booking action.

    ALWAYS call this before book_hotel, confirm_booking, or cancel_booking.
    Rules are loaded from the SteeringRules database, changeable without redeploying.
    """
    # Collect the tool arguments so the helpers can evaluate them as one context
    params = {
        "guests": guests,
        "check_in": check_in,
        "check_out": check_out,
        "booking_id": booking_id,
    }
    rules = _get_rules_for_action(action)       # DynamoDB scan
    context = _build_context(action, params)    # Derive nights, days_until_checkin
    violated = _evaluate_rules(rules, context)  # Symbolic evaluation
    if not violated:
        return f"PASS: All {len(rules)} rules passed for '{action}'. Proceed."
    lines = []
    for v in violated:
        lines.append(f"- {v['fail_message']}\n  STEER: {v['steer_message']}")
    return f"FAIL: {len(violated)} rule(s) violated:\n" + "\n".join(lines)
Changing a rule takes one command. No redeploy, no PR, takes effect immediately:
aws dynamodb update-item \
--table-name HotelBookingAgentStack-SteeringRules \
--key '{"rule_id": {"S": "max-guests"}}' \
--update-expression "SET threshold = :t" \
--expression-attribute-values '{":t": {"N": "8"}}'
The 6 steering rules deployed by default:
| Rule | Action | What it catches | How the agent self-corrects |
|---|---|---|---|
| max-guests | book | More than 10 guests | Adjusts to 10, informs user |
| valid-dates | book | Check-out before check-in | Swaps dates, informs user |
| advance-booking | book | Same-day booking | Moves check-in to tomorrow |
| payment-before-confirm | confirm | Unpaid booking | Processes payment first |
| cancellation-window | cancel | Less than 48h to check-in | Suggests modification instead |
| already-cancelled | cancel | Booking already cancelled | Offers to create new booking |
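Because rules are plain DynamoDB items, adding a seventh rule is a single `put_item` away. A hedged sketch, assuming the item schema shown earlier (the helper names and the example rule are illustrative, not from the repo):

```python
def make_rule_item(rule_id, action, field, operator, threshold,
                   fail_message, steer_message, enabled=True):
    """Build a steering-rule item matching the schema shown above."""
    return {
        "rule_id": rule_id,
        "action": action,
        "condition_field": field,
        "operator": operator,
        "threshold": threshold,
        "fail_message": fail_message,
        "steer_message": steer_message,
        "enabled": enabled,
    }

def put_rule(item, table=None):
    """Write the rule; the agent picks it up on its next validation call."""
    if table is None:
        import boto3  # imported lazily so the sketch runs without AWS configured
        table = boto3.resource("dynamodb").Table(
            "HotelBookingAgentStack-SteeringRules"
        )
    table.put_item(Item=item)
```

No agent code changes, no deploy: the next `validate_booking_rules` scan returns the new rule alongside the existing six.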
When Should You Block vs Steer?
Not every rule should be steerable. Payment before confirmation is a financial integrity constraint: the LLM must never bypass it, regardless of how it interprets a STEER message. The production agent uses a two-layer approach:
Layer 1 — Hard hooks (framework-level, cannot be bypassed):
import os

import boto3
from strands.hooks.events import BeforeToolCallEvent
from strands.hooks.registry import HookProvider, HookRegistry

class BookingGuardrailsHook(HookProvider):
    """Hard guardrails for financial and contractual constraints.

    The LLM cannot bypass these. They execute before the tool runs.
    """

    def __init__(self):
        # Bookings table name resolved from the environment (set by the CDK stack)
        self._bookings = boto3.resource("dynamodb").Table(
            os.environ["BOOKINGS_TABLE"]
        )

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(BeforeToolCallEvent, self._validate)

    def _validate(self, event: BeforeToolCallEvent) -> None:
        tool_name = event.tool_use["name"]
        params = event.tool_use.get("input", {})
        if "confirm" in tool_name:
            booking = self._bookings.get_item(
                Key={"booking_id": params.get("booking_id", "")}
            ).get("Item")
            if booking and booking["status"] != "PAID":
                event.cancel_tool = (
                    "BLOCKED: Payment must be processed before confirmation. "
                    "Ask the user if they want to proceed with payment."
                )
Layer 2 — Soft steering (DynamoDB rules, agent self-corrects):
The validate_booking_rules tool shown above. Rules can be updated, disabled, or added without touching agent code.
How to decide which layer to use:
| Layer | Mechanism | When to use | Can the LLM bypass it? |
|---|---|---|---|
| Hard hook | event.cancel_tool blocks execution | Financial, legal, compliance | No — framework intercepts before the tool runs |
| Soft steer | STEER message guides correction | Capacity limits, date adjustments, preferences | No bypass — but the agent adapts and completes the task |
| Prompt rule | System prompt instruction | Workflow order, communication style | Yes — the LLM may ignore under ambiguous input |
Use hard hooks for rules where failure means financial or legal risk. Use soft steering for everything else. It reduces user friction without sacrificing safety.
How Does the MCP Integration with AgentCore Gateway Work?
The agent connects to AgentCore Gateway via MCP at startup and discovers all available tools at runtime, with no hardcoded tool list needed. In demo 02, we built a custom FAISS index to pre-filter tools by semantic similarity. In production, the Gateway replaces that entirely, routing each tool call to the right Lambda function based on semantic matching:
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from strands.tools.mcp.mcp_client import MCPClient
from mcp.client.streamable_http import streamablehttp_client
# Using OpenAI-compatible interface via Strands SDK (not direct OpenAI usage)
from strands.models.openai import OpenAIModel

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload, context=None):
    """Entry point for AgentCore Runtime invocations."""
    model = OpenAIModel(
        model_id="gpt-4o-mini",
        client_args={"api_key": _openai_api_key},  # loaded from Secrets Manager
    )
    hooks = [BookingGuardrailsHook()]
    # Gateway handles semantic tool routing via MCP
    mcp_client = MCPClient(lambda: streamablehttp_client(GATEWAY_URL))
    with mcp_client:
        tools = mcp_client.list_tools_sync()
        agent = Agent(
            model=model,
            tools=tools,
            system_prompt=SYSTEM_PROMPT,
            hooks=hooks,
        )
        prompt = payload.get("prompt", "")
        return str(agent(prompt))
The agent does not define tools inline. It connects to the Gateway, which discovers available Lambda functions and routes calls based on semantic matching. Adding a new tool (such as query_knowledge_graph) means registering it in the Gateway. No agent code changes.
How Does GraphRAG Work in Production?
Demo 01 demonstrates that traditional RAG (Retrieval-Augmented Generation) hallucinates when answering aggregation queries. "How many hotels have a pool?" gets a guess instead of a count. Graph-RAG eliminates this by executing Cypher queries on a knowledge graph that computes exact results.
In production, the knowledge graph runs on Neo4j AuraDB Free, a managed graph database with a $0/month free tier (200K nodes, no credit card required). The build pipeline is fully automated:
- 300 hotel FAQ documents upload to S3 during cdk deploy
- build_graph Lambda reads documents, calls OpenAI to extract entities and relationships via SimpleKGPipeline, and loads them into Neo4j AuraDB
- query_knowledge_graph Lambda executes Cypher queries and returns structured results
- The query Lambda registers in AgentCore Gateway — the agent discovers it via MCP
Questions like "What amenities does the Grand Hotel have?" now traverse the graph (following connections between nodes) instead of guessing from text chunks.
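To make the contrast concrete, here is a sketch of the kind of Cypher aggregation the query Lambda can run. The node labels and relationship name (`Hotel`, `Amenity`, `HAS_AMENITY`) are assumptions about the extracted schema, not verified against the repo:

```python
# Hypothetical Cypher aggregation: an exact count replaces an LLM guess.
AMENITY_COUNT_QUERY = """
MATCH (h:Hotel)-[:HAS_AMENITY]->(a:Amenity {name: $amenity})
RETURN count(DISTINCT h) AS hotel_count
"""

def count_hotels_with(session, amenity: str) -> int:
    """Run the aggregation with any object exposing neo4j's session.run()."""
    record = session.run(AMENITY_COUNT_QUERY, amenity=amenity).single()
    return record["hotel_count"]
```

The count comes from the database engine, so "How many hotels have a pool?" returns the same exact number every time, regardless of how the LLM would have summarized the text chunks.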
The CDK stack supports two build modes via -c graph_mode=lite|full:
| Build mode (graph_mode) | What it does | Documents | Build time |
|---|---|---|---|
| lite (default) | Uploads a subset of docs, builds graph in a single Lambda | 30 | ~15 min |
| full | Uploads all docs, uses Step Functions to batch-process | 300 | ~1-2 hours |
cdk deploy GraphRAGStack # lite mode (30 docs)
cdk deploy GraphRAGStack -c graph_mode=full # full mode (300 docs)
The booking agent works without GraphRAG. Deploy Stack 2 when hotel FAQ questions are frequent.
How to Deploy
Step 1 — Clone and install
git clone https://github.com/elizabethfuentes12/why-agents-fail-sample-for-amazon-agentcore
cd why-agents-fail-sample-for-amazon-agentcore/06-agentcore-production-demo
uv venv && uv pip install -r requirements.txt
Step 2 — Build and deploy
./create_deployment_package.sh
cd cdk
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cdk bootstrap # Only needed once per AWS account/region
cdk deploy HotelBookingAgentStack
Step 3 — Store your OpenAI API key
- Open AWS Secrets Manager Console
- Find /HotelBookingAgentStack/openai-api-key
- Set the secret value to your OpenAI API key
Step 4 — Seed hotel data
cd ..
AWS_DEFAULT_REGION=us-east-1 uv run python seed_data.py
Step 5 — Test
Open test_agent_local.ipynb and run all cells. The notebook tests every anti-hallucination layer — see the Try It Yourself section for the full scenario list.
Adding GraphRAG (optional, separate stack)
# 1. Create a free Neo4j AuraDB instance at neo4j.com/cloud/aura-free
# 2. Deploy the GraphRAG stack
cd cdk && INCLUDE_GRAPHRAG=1 cdk deploy GraphRAGStack
# 3. Store Neo4j credentials in Secrets Manager (4 secrets)
# 4. Build the knowledge graph
aws lambda invoke --function-name graphrag-build-graph \
--region us-east-1 --cli-read-timeout 900 \
/tmp/build-graph-output.json
# 5. Connect GraphRAG to the booking agent
cdk deploy HotelBookingAgentStack \
-c graphrag_query_lambda_arn=<QueryLambdaArn from GraphRAGStack output>
Try It Yourself
The repository includes test_agent_local.ipynb — a notebook that tests every anti-hallucination layer against the deployed agent:
| # | Scenario | Technique (from series) | What the agent does |
|---|---|---|---|
| 1 | Full booking flow | All layers | validate → book → pay → validate → confirm |
| 2 | 15 guests (max 10) | Soft steering (demo 05) | Self-corrects to 10, informs user |
| 3 | Confirm without payment | Hard hook (demo 04) | Blocks confirmation, asks to pay first |
| 4 | Sold-out hotel | Grounded retrieval (demo 01) | Returns "no rooms available" from DynamoDB |
| 5 | City with no hotels | Grounded retrieval | Returns "no hotels found", no fabrication |
| 6 | Non-existent hotel | Grounded retrieval | Returns error, does not invent a hotel |
| 7 | Hotel amenities query | Graph-RAG (demo 01) | Cypher query returns real data from Neo4j |
cd 06-agentcore-production-demo
uv venv && uv pip install -r requirements.txt
jupyter notebook test_agent_local.ipynb
Run all cells top to bottom. The notebook reads the AgentRuntimeArn from CloudFormation outputs automatically.
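If you want the same lookup outside the notebook, it is a few lines of boto3. A sketch, assuming the output key is named AgentRuntimeArn as the notebook's description suggests:

```python
def extract_output(outputs: list[dict], key: str) -> str:
    """Pull one value out of a CloudFormation stack's Outputs list."""
    return next(o["OutputValue"] for o in outputs if o["OutputKey"] == key)

def get_runtime_arn(stack_name: str = "HotelBookingAgentStack") -> str:
    """Read the deployed runtime ARN from CloudFormation (assumed key name)."""
    import boto3  # imported lazily so extract_output works without AWS configured
    cfn = boto3.client("cloudformation")
    stacks = cfn.describe_stacks(StackName=stack_name)["Stacks"]
    return extract_output(stacks[0]["Outputs"], "AgentRuntimeArn")
```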
Built-in Observability
AgentCore provides built-in observability when you include the OpenTelemetry dependencies in your agent package. Add strands-agents[otel] and aws-opentelemetry-distro to your agent's requirements.txt:
strands-agents[openai,otel]>=1.27.0
aws-opentelemetry-distro>=0.7.0
With these dependencies, AgentCore automatically instruments Strands Agents — capturing invocation logs, tool call traces (which Lambda was called, input/output, latency), and error tracking (failed tool calls, guardrail blocks). You can monitor how the agent handles each anti-hallucination layer (steering, hooks, grounded retrieval) in Amazon CloudWatch without adding custom logging.
See the observability getting started guide for details.
5 Lessons from Moving Anti-Hallucination Techniques to Production
1. Separate hard blocks from soft steers. Payment-before-confirmation is a hard block: the agent cannot work around it. Guest count limits are a soft steer where the agent adjusts and completes the task. Mixing them in the same mechanism (all hooks or all prompts) either blocks too much or steers too little.
2. Store rules in a database, not in code. Business rules change more frequently than agent code. When the maximum guest count changes from 10 to 8, an operations team should update a DynamoDB item, not open a pull request and wait for CI/CD.
3. Deploy stacks independently. The booking agent works without GraphRAG. Independent stacks mean you can ship the core agent quickly and add capabilities when the use case demands it.
4. Test locally against the same data. The test_agent_local.ipynb notebook calls the same DynamoDB tables as production. Run all 7 test scenarios locally before deploying to AgentCore. The behavior is identical.
5. Keep tools as pure CRUD (Create, Read, Update, Delete). Lambda functions do one thing: read or write data. All business rule enforcement happens in the guardrail layers (hooks and validate tool). This keeps tools reusable and guardrails centralized.
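To illustrate lesson 5, a read tool can stay this small. A sketch of a get_booking-style Lambda handler (the table name and event shape are assumptions, not the repo's code):

```python
def handler(event, context=None, table=None):
    """Pure read: fetch one booking.

    No business-rule checks here by design; enforcement lives in the
    guardrail layers (hooks and the validate tool), keeping tools reusable.
    """
    if table is None:
        import boto3  # imported lazily so the sketch is testable without AWS
        table = boto3.resource("dynamodb").Table("Bookings")
    booking_id = event.get("booking_id", "")
    item = table.get_item(Key={"booking_id": booking_id}).get("Item")
    if item is None:
        return {"error": f"No booking found with id '{booking_id}'"}
    return item
```

Because the tool only reads data, it can be reused by the validator agent, the booking flow, and any future agent without duplicating rule logic.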
Key Takeaways
- Hard hooks block what the LLM must never bypass (financial, legal, compliance constraints)
- Soft steering corrects what the agent can adapt (capacity, dates, preferences) — reducing user friction
- DynamoDB steering rules let operations teams change agent behavior without code deploys
- AgentCore Gateway replaces custom tool-selection indexes with managed MCP-based semantic routing
- Graph-RAG on Neo4j AuraDB eliminates fabricated aggregations by computing exact results with Cypher queries
Clean Up
To avoid ongoing charges:
cd cdk
cdk destroy HotelBookingAgentStack
INCLUDE_GRAPHRAG=1 cdk destroy GraphRAGStack
Frequently Asked Questions
What is Amazon Bedrock AgentCore?
Amazon Bedrock AgentCore is a managed service for deploying and running AI agents. It provides two components: Runtime (hosts agent code) and Gateway (MCP-based tool routing that connects agents to serverless functions). AgentCore handles scaling, networking, and credential management so you do not need to build that infrastructure yourself.
Can I use Amazon Bedrock models instead of OpenAI?
Yes. The agent supports any provider compatible with Strands Agents, including Amazon Bedrock, Anthropic, or Ollama. Change the model in booking_agent.py. The anti-hallucination techniques work independently of the LLM provider.
What is the difference between hard hooks and soft steering?
Hard hooks use event.cancel_tool to block tool execution at the framework level. The LLM cannot bypass them. Soft steering returns a STEER message that instructs the agent how to self-correct: the agent adjusts parameters and completes the task. Use hard hooks for financial and compliance rules. Use soft steering for operational rules where the agent can adapt.
How much does this architecture cost?
All AWS services used (DynamoDB on-demand, Lambda, Secrets Manager) are eligible for the AWS Free Tier. Neo4j AuraDB Free has a $0/month tier. For current pricing details, see the AWS Pricing page. For a demo or low-traffic deployment, total infrastructure cost is minimal, excluding LLM API costs.
Do I need GraphRAG for the agent to work?
No. Stack 1 (HotelBookingAgentStack) deploys a fully functional booking agent with search, booking, payment, and validation tools. GraphRAG (Stack 2) adds hotel FAQ capabilities. Deploy it when you need answers about amenities, policies, and services.
Can I apply these patterns with other agent frameworks?
Yes. The guardrail patterns (database-driven rules, hard hooks, and soft steering) are framework-agnostic. Strands provides native hook and MCP support, but LangGraph, AutoGen, and CrewAI offer similar extension points for tool interception and pre-validation.
How does the agent discover tools via MCP at runtime?
The agent connects to AgentCore Gateway using MCPClient from Strands. On startup, it calls list_tools_sync() to discover all available Lambda tools registered in the Gateway. When the agent needs a tool, the Gateway routes the call based on semantic matching. There is no hardcoded tool list in the agent code. Adding a new tool means registering it in the Gateway; the agent discovers it automatically on the next invocation.
Research Background
The techniques in this series are grounded in recent research on AI agent reliability:
- RAG-KG-IL: Multi-Agent Hybrid Framework for Reducing Hallucinations — Knowledge graphs reduce hallucinations compared to standalone LLMs
- Internal Representations as Indicators of Hallucinations in Agent Tool Selection — Tool selection errors increase with tool count; semantic routing mitigates this
- Teaming LLMs to Detect and Mitigate Hallucinations — Multi-agent validation detects errors that single agents miss
- MetaRAG: Metamorphic Testing for Hallucination Detection — Hallucinations are inherent to LLMs without structured grounding
The complete code — CDK stacks, Lambda tools, agent runtime, steering rules, test notebook, and 300 hotel FAQ documents — is in the repository.
Thanks!
