When you ask an AI assistant like Kiro (AWS's AI coding assistant), Claude Code, or ChatGPT to "build me an agent," you get working code. But you don't see the architecture decisions happening behind the scenes. The agent responds to queries, but it might waste tokens in reasoning loops, hallucinate answers from incomplete data, or freeze on slow APIs. These failures are silent until production.
When you prompt AI coding assistants to build agents, they make architecture decisions silently: choosing retrieval strategies, validation approaches, and error-handling patterns. These 8 patterns give you the vocabulary to specify production-grade decisions in your prompts, preventing hallucinations and token waste before any code is generated.
This post closes two series I wrote documenting the most expensive agent failures in production: Stop AI Agent Hallucinations (5 techniques) and Why AI Agents Fail (3 failure modes). If you know these 8 patterns, you can guide AI assistants to avoid them from the start.
This isn't a step-by-step implementation guide. It's a reference for knowing what exists so you can recognize when to use each pattern based on your use case.
Working code for all 8 techniques is linked in each section.
Why This Matters
AI coding assistants generate agent code in seconds. Kiro, Claude Code, Cursor, and ChatGPT can scaffold tools, configure LLM calls, and wire up retrieval systems faster than manual coding.
But speed creates a problem: you get working code without seeing the tradeoffs.
When you prompt "build a booking agent with RAG," the assistant makes decisions:
- Which retrieval strategy? (vector similarity, graph queries, hybrid)
- How to handle large outputs? (truncate, summarize, external storage)
- What validation runs before tool execution? (none, prompts, framework hooks)
- How to handle slow APIs? (block, timeout, async patterns)
Your prompt doesn't specify these. The assistant picks defaults. Those defaults create the failure modes this post documents.
The 8 Failure Patterns
Hallucination Failures (5 patterns):
- GraphRAG - Vector RAG fabricates statistics from incomplete chunks
- Semantic Tool Selection - Too many tools, agent picks wrong ones
- Neurosymbolic Guardrails - Agent ignores business rules in prompts
- Runtime Guardrails (Steering) - Agent violates soft rules, needs correction rather than blocking
- Multi-Agent Validation - Single agent claims success when operations fail
Silent Token Waste (3 patterns):
- Memory Pointer Pattern - Large data overflows context, causes truncation
- Async HandleId Pattern - Slow APIs block agent indefinitely
- DebounceHook + Explicit States - Agent loops same tool call without progress
You don't implement all 8. You learn what they solve, then specify the ones your use case needs when prompting.
What Are These 8 Patterns?
These patterns solve the most expensive production failures: hallucinations from incomplete data (GraphRAG, Semantic Tool Selection, Guardrails, Steering, Multi-Agent), and silent token waste (Memory Pointers, Async HandleId, DebounceHook). You learn what each solves, then specify the ones your use case needs when prompting AI assistants. This prevents debugging black-box code in production.
Measured Impact from Production
| Pattern | Result | Source |
|---|---|---|
| GraphRAG | Exact counts vs fabricated approximations | RAG vs GraphRAG |
| Semantic Tool Selection | 86.4% fewer errors, 89% lower token costs | Tool Selection |
| Memory Pointers | 20M tokens reduced to 1,234 tokens | IBM Materials Science study |
| Async HandleId | 18-second block eliminated, no 424 timeouts | MCP Timeouts |
| Explicit States | 14 calls reduced to 2 (7x improvement) | Reasoning Loops |
Pattern 1: GraphRAG for Precise Queries
What Is GraphRAG?
GraphRAG replaces vector similarity with graph database queries for structured data. When your agent needs exact counts, aggregations, or relationship traversal, GraphRAG translates natural language to Cypher queries that return precise results from structured data instead of hallucinated statistics from incomplete text chunks. Use it for structured queries, keep vector RAG for semantic search.
What Breaks
Vector RAG fabricates statistics. Ask "How many hotels in Miami have pools and breakfast?" and vector similarity retrieves 3 text chunks mentioning Miami, pools, and breakfast. The LLM sees incomplete data, extrapolates from the sample, and returns "approximately 120 hotels" (fabricated from 3 chunks out of 200 hotels).
Out-of-domain queries return hallucinated answers instead of admitting no data exists.
The Fix
Replace vector retrieval with graph queries for structured data. Store hotels, amenities, and relationships in Neo4j. The LLM translates "hotels with pools and breakfast" into Cypher:
```cypher
MATCH (h:Hotel)-[:HAS_AMENITY]->(a:Amenity)
WHERE a.name IN ['pool', 'breakfast']
RETURN count(DISTINCT h)
```
Result: 133 hotels (exact count from database).
Out-of-domain query: "No hotels found in Antarctica" instead of fabricating results.
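A minimal sketch of executing that query with the official Neo4j Python driver; the connection details are placeholders, and `count_hotels` is a hypothetical tool name:

```python
from neo4j import GraphDatabase

# Placeholder connection details; point these at your own instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def count_hotels(amenities: list[str]) -> int:
    """Run the Cypher above with parameterized amenities."""
    query = (
        "MATCH (h:Hotel)-[:HAS_AMENITY]->(a:Amenity) "
        "WHERE a.name IN $amenities "
        "RETURN count(DISTINCT h) AS total"
    )
    with driver.session() as session:
        return session.run(query, amenities=amenities).single()["total"]

print(count_hotels(["pool", "breakfast"]))  # exact count from the database
```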
What to Tell Your AI Assistant
"Build a travel agent using GraphRAG. For structured
queries (hotels, amenities, availability), translate to Cypher
and execute against the graph. Only use vector RAG for unstructured descriptions. Return exact counts from graph traversal."
When to Use
- Structured data with relationships (products, inventory, locations)
- Queries requiring counts, aggregations, or multi-hop traversal
- Domains where fabricating statistics creates legal/financial risk
Full details: RAG vs GraphRAG: When Agents Hallucinate Answers
Learn more: Neo4j Cypher Documentation
Pattern 2: Semantic Tool Selection
What Is Semantic Tool Selection?
Semantic tool selection uses vector embeddings to filter tools before the LLM sees them. When your agent has 10+ tools, sending all descriptions on every call increases error rates (agent picks wrong tools) and token costs (paying for unused descriptions). Semantic filtering embeds tool descriptions offline, then at runtime matches the query to top-5 relevant tools, reducing errors by 86.4% and costs by 89%.
What Breaks
With 50 tools, two failures occur: (1) agent picks wrong tools because descriptions overlap, and (2) token costs explode from sending all 50 tool descriptions on every LLM call.
Measured impact: Error rates increase with tool count; token costs scale linearly.
The Fix
Use vector embeddings to filter tools before the LLM sees them. Embed tool descriptions offline. At runtime, embed the user query, compute similarity, pass only top-5 relevant tools to the agent.
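A minimal sketch of the filtering step, assuming sentence-transformers for the embeddings; the tool names, descriptions, and model choice are illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

TOOLS = {
    "search_hotels": "Find hotels by city, dates, and amenities",
    "book_flight": "Reserve a flight between two airports",
    "cancel_booking": "Cancel an existing hotel or flight booking",
    # ... remaining tool descriptions
}

# Offline: embed every tool description once
names = list(TOOLS)
tool_vecs = model.encode([TOOLS[n] for n in names], normalize_embeddings=True)

def select_tools(query: str, k: int = 5) -> list[str]:
    """Return the k tool names whose descriptions best match the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = tool_vecs @ q                 # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [names[i] for i in top]

# Pass only select_tools(user_query) to the agent instead of all 50 tools.
```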
Results from production:
- Errors reduced: 86.4%
- Token costs reduced: 89%
- Latency: <10ms for tool filtering
What to Tell Your AI Assistant
"Build a multi-tool agent with semantic tool selection At
runtime, embed the query, retrieve top-5 similar tools, pass only
those to the agent. Keep conversation memory, dynamically swap tools."
When to Use
- Agents with 10+ tools
- Tools with overlapping descriptions
- Cost-sensitive applications
Full details: Reduce Agent Errors and Token Costs with Semantic Tool Selection
Pattern 3: Neurosymbolic Guardrails (Block)
What Are Neurosymbolic Guardrails?
Neurosymbolic guardrails enforce business rules at the framework level, below the LLM's control. When prompts alone cannot enforce constraints (max guests, valid dates, budget limits), guardrails use pre-execution hooks to validate parameters and cancel invalid operations. Rules live in code, not prompts, so the LLM cannot bypass them. Use blocking guardrails for hard constraints that cannot be violated.
What Breaks
Prompts cannot enforce business rules. Even with clear docstrings ("max_guests must be ≤10"), the LLM passes max_guests=15 under pressure because prompts are suggestions, not constraints. The agent violates rules silently.
The Fix
Use framework hooks to validate parameters before tool execution. If validation fails, cancel the tool call and return corrective guidance. Rules live in code at the framework level, below the LLM's control.
Measured impact: Zero violations in 100-query test (vs. 12 violations with prompts alone).
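As a minimal sketch, here is the rule logic isolated as a plain function. The BeforeToolCallEvent and event.cancel_tool() names in the prompt below come from Strands Agents hooks; exactly how you wire this function into them is framework-specific:

```python
from datetime import date

def validate_booking(params: dict) -> str | None:
    """Return an error message if a hard rule is violated, else None.

    Call this from your framework's pre-execution hook (e.g. a
    BeforeToolCallEvent handler) and cancel the tool call on error.
    """
    if params.get("max_guests", 0) > 10:
        return "max_guests must be <= 10"
    check_in = params.get("check_in_date")  # expected as "YYYY-MM-DD"
    if check_in and date.fromisoformat(check_in) <= date.today():
        return "check_in_date must be after today"
    if params.get("budget", 0) <= 0:
        return "budget must be positive"
    return None
```

Because the rules live in an ordinary function at the framework level, the LLM never gets a chance to negotiate with them.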
What to Tell Your AI Assistant
"Build a booking agent with guardrails using Strands Agents hooks.
Create a BeforeToolCallEvent hook that validates:
- max_guests ≤ 10
- check_in_date > today
- budget > 0
If validation fails, cancel the tool call with event.cancel_tool()
and return an error message. Do not rely on prompts for validation."
When to Use
- Business rules that cannot be violated (compliance, legal, financial)
- Validation requiring computation (date math, inventory checks)
- Rules that change frequently
Full details: AI Agent Guardrails: Rules That LLMs Cannot Bypass
Pattern 4: Runtime Guardrails (Steer, Don't Block)
What Is Steering vs Blocking?
Steering guardrails return corrective guidance instead of blocking operations. When the agent violates a soft rule (format issues, parameter adjustments, data redaction), steering returns instructions via Guide() so the agent self-corrects and retries. This differs from blocking guardrails (Pattern 3) which stop workflows entirely. Use steering for rules where the agent can fix itself, blocking for hard constraints.
What Breaks
Hard guardrails (Pattern 3) block operations and stop workflows. For soft rules where the agent can self-correct (format issues, parameter adjustments, redacting sensitive data), blocking creates friction. The agent could fix the problem itself if given guidance.
The Fix
Use Agent Control to return corrective guidance via Guide() instead of blocking. When the agent violates a soft rule, the control plane returns instructions: "Adjust parameter X to Y and retry." The agent self-corrects and completes the task without human intervention.
Difference from Pattern 3:
- Block (Pattern 3): Hard constraints, workflow stops
- Steer (Pattern 4): Soft rules, agent self-corrects
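A minimal sketch of the steer-vs-block decision, with local Guide and Block classes standing in for the Agent Control responses (the real control plane is a separate server):

```python
from dataclasses import dataclass

@dataclass
class Guide:
    correction: str   # instruction the agent applies before retrying

@dataclass
class Block:
    reason: str       # hard stop; the workflow ends here

def check_booking(params: dict) -> Guide | Block | None:
    """Return Block for hard rules, Guide for soft rules, None to allow."""
    # Hard rule (Pattern 3): compliance limit, never negotiable
    if params.get("max_guests", 0) > 10:
        return Block("max_guests exceeds the occupancy limit")
    # Soft rule (Pattern 4): the agent can reformat and retry on its own
    if "/" in params.get("check_in_date", ""):
        return Guide("Reformat check_in_date as YYYY-MM-DD and retry.")
    return None
```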
What to Tell Your AI Assistant
"Build a booking agent with Agent Control for soft rules. Connect
to Agent Control server. For soft rules (parameter formatting,
date adjustments, data redaction), return Guide() with correction
instructions instead of blocking. Agent should retry with fix applied.
Use hard blocks (Pattern 3) only for compliance rules that cannot
be violated under any circumstance."
When to Use
- Rules where agent can self-correct (format, adjust parameters)
- Workflows where blocking creates poor UX
- Rules managed centrally via API/dashboard (update without redeploying)
Full details: Runtime Guardrails for AI Agents: Steer, Don't Block
Pattern 5: Multi-Agent Validation
What Is Multi-Agent Validation?
Multi-agent validation deploys specialized agents with different roles (Executor, Validator, Critic) that cross-check each other's work. Single agents optimize for appearing successful, not verifying outcomes. Multiple agents with different optimization functions catch errors the others miss. Executor performs tasks, Validator cross-checks against ground truth, Critic provides final review before returning to the user.
What Breaks
Single agents cannot self-validate. When an agent books a hotel, it claims "Success: Booked Grand Plaza Hotel" even if the API returned an error or the hotel doesn't exist in the database. The agent optimizes for appearing successful, not verifying outcomes.
The Fix
Deploy multiple agents with different roles: Executor performs tasks, Validator cross-checks against ground truth, Critic provides final review. Agents share context and hand off control autonomously when their role completes.
Measured impact: Multi-agent catches errors single agent misses (e.g., booking non-existent hotels).
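A sketch of the three-role swarm, assuming the Strands Agents Swarm API named in the prompt below; the system prompts are illustrative and exact signatures may differ across releases, so check the current Strands docs:

```python
from strands import Agent
from strands.multiagent import Swarm

executor = Agent(
    name="executor",
    system_prompt="Book hotels and search flights. Hand off to the validator when done.",
)
validator = Agent(
    name="validator",
    system_prompt="Cross-check the executor's operations against the database. "
                  "Hand back to the executor on mismatch, otherwise hand off to the critic.",
)
critic = Agent(
    name="critic",
    system_prompt="Final review. Only confirm results the validator verified.",
)

# Agents share context and hand off autonomously when their role completes.
swarm = Swarm([executor, validator, critic])
result = swarm("Book a hotel in Miami for 2 guests, June 10-12")
```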
What to Tell Your AI Assistant
"Build a multi-agent system using Strands Swarm with 3 agents:
1. Executor: Books hotels, searches flights
2. Validator: Cross-checks operations against database
3. Critic: Final review before returning to user
Agents share context via swarm.context. Use autonomous handoffs.
Agents decide when to hand off based on task completion."
When to Use
- High-stakes operations (financial, medical, legal)
- Tasks where "appears successful" differs from "actually successful"
- Complex workflows with multiple verification points
Full details: How to Stop AI Agents from Hallucinating Silently with Multi-Agent Validation
Pattern 6: Memory Pointer Pattern
What Is the Memory Pointer Pattern?
The Memory Pointer Pattern stores large data outside the LLM context and passes short references instead. When tools return 200KB+ logs or 1000-row database results, passing them directly causes silent truncation. Memory pointers store data in agent.state, return a pointer to the LLM, and provide separate tools that resolve pointers to access full data. IBM reduced 20M tokens to 1,234 tokens using this pattern.
What Breaks
Context window overflow occurs when tools return more data than the LLM can process (200KB+ logs, 1000-row database results). The agent doesn't crash. It silently truncates data, loses context, and produces incomplete answers.
Real production case (IBM Materials Science):
- Before: 20 million tokens, workflow failed
- After: 1,234 tokens, workflow succeeded
The Fix
Store large data in agent.state, pass short references to the LLM. Tools return pointers like "logs-app-server". Subsequent tools resolve pointers to access full data. LLM only sees: "Data stored as logs-app-server. Use analyze_errors(pointer)."
Data in context reduced: 214KB → 52 bytes
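A minimal, framework-free sketch with a plain dict standing in for agent.state; fetch_logs and analyze_errors are hypothetical tools:

```python
STATE: dict[str, str] = {}  # stands in for agent.state

def fetch_logs(app: str) -> str:
    raw = "ERROR timeout connecting to db\n" * 5000  # stand-in for a ~150KB payload
    pointer = f"logs-{app}"
    STATE[pointer] = raw                  # full data never enters the context
    return f"Data stored as {pointer}. Use analyze_errors('{pointer}')."

def analyze_errors(pointer: str) -> str:
    raw = STATE[pointer]                  # resolve the pointer to the full data
    errors = [line for line in raw.splitlines() if "ERROR" in line]
    return f"SUCCESS: found {len(errors)} error lines in {pointer}."
```

The LLM only ever sees the two short return strings; the full payload stays in state.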
What to Tell Your AI Assistant
"Build a log analysis agent using Memory Pointer Pattern. When
fetch_logs returns >20KB:
1. Store in agent.state with unique pointer ID
2. Return to LLM: 'Data stored as logs-{app}. Use analyze_logs(pointer).'
3. Implement analyze_logs(pointer) that resolves from agent.state
Never pass large data directly to LLM context."
When to Use
- Tools returning large outputs (logs, database queries, files)
- Workflows with multiple processing steps on same large data
- Cost-sensitive applications
Full details: AI Context Window Overflow: Memory Pointer Fix
Pattern 7: Async HandleId Pattern
What Is the Async HandleId Pattern?
The async handleId pattern prevents slow external APIs from blocking your agent. When an API takes 30+ seconds, synchronous calls freeze the entire agent. Async handleId returns a job ID immediately, letting the agent continue with other tasks. A separate check_status tool polls for results when ready. This eliminates 424 timeout errors and keeps agents responsive.
What Breaks
External APIs that take 30+ seconds block the agent indefinitely. No other tools can run. After ~7 seconds, many implementations return 424 timeout errors, freezing the workflow.
The Fix
Tools return immediately with a job ID instead of waiting. Agent stores handleId and continues. Separate check_status(job_id) tool polls for results asynchronously.
Measured impact:
- Before: 18-second API blocks agent, 424 timeout
- After: Tool returns <1 second, agent polls when ready
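A minimal sketch using a background thread to stand in for the slow external API; start_analysis and check_status are the two tools the prompt below asks for:

```python
import threading
import time
import uuid

JOBS: dict[str, dict] = {}

def _slow_api(data: list) -> str:
    time.sleep(30)                        # stand-in for a 30-second external API
    return f"processed {len(data)} records"

def start_analysis(data: list) -> str:
    """Submit the job and return a handleId in well under a second."""
    job_id = uuid.uuid4().hex[:8]
    JOBS[job_id] = {"status": "RUNNING", "result": None}

    def worker() -> None:
        JOBS[job_id] = {"status": "SUCCESS", "result": _slow_api(data)}

    threading.Thread(target=worker, daemon=True).start()
    return f"Job submitted. handleId={job_id}. Poll with check_status."

def check_status(job_id: str) -> str:
    job = JOBS.get(job_id)
    if job is None:
        return f"FAILED: unknown job {job_id}"
    if job["status"] == "RUNNING":
        return f"RUNNING: job {job_id} is not ready yet."
    return f"SUCCESS: {job['result']}"
```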
What to Tell Your AI Assistant
"Build an agent with async handleId pattern for slow APIs:
1. start_analysis(data): Submit job, return job_id immediately
2. check_status(job_id): Poll for results
Agent calls start_analysis, stores job_id, continues with other
tasks, calls check_status when ready. Do not implement blocking calls."
When to Use
- External APIs with >5 second response times
- Batch processing (video analysis, large transforms)
- Any system outside your control
Full details: Fix MCP Timeouts: Async HandleId Pattern
Pattern 8: DebounceHook + Explicit States
What Prevents Reasoning Loops?
Reasoning loops occur when ambiguous tool feedback ("more may be available") signals that retrying might help. Two fixes work together: explicit terminal states (return SUCCESS/FAILED so the LLM knows when to stop) and DebounceHook (framework hook that blocks duplicate calls). Production tests showed explicit states reduced calls from 14 to 2, while DebounceHook provides a safety net for edge cases.
What Breaks
Agents loop calling the same tool repeatedly without progress. Ambiguous feedback like "Found 3 results. More may be available" signals that retrying might help. The agent loops indefinitely.
Real production case: 847 reasoning steps at $47/minute, no answer delivered.
The Fix (Two Parts)
Part A: Explicit Terminal States
Return clear SUCCESS or FAILED states. Change "More may be available" to "SUCCESS: Found all 3 matching flights."
Part B: DebounceHook Safety Net
A framework hook tracks recent tool calls. When the same (tool_name, input) pair appears twice, it blocks the third attempt.
Measured impact (travel booking demo):
- Ambiguous feedback: 14 calls
- Explicit SUCCESS: 2 calls (7x reduction)
- DebounceHook: 12 calls (2 blocked)
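A minimal sketch of the safety net; before_tool_call is where you would plug this into your framework's pre-tool hook:

```python
from collections import deque

class DebounceHook:
    """Blocks the third attempt when the same (tool_name, input) repeats."""

    def __init__(self, window: int = 3):
        self.recent: deque = deque(maxlen=window)  # last N (tool, input) pairs

    def before_tool_call(self, tool_name: str, tool_input: str) -> str | None:
        """Return a block message, or None to let the call proceed."""
        key = (tool_name, tool_input)
        if self.recent.count(key) >= 2:
            return "BLOCKED: Duplicate detected. Use the results you already have."
        self.recent.append(key)
        return None

hook = DebounceHook()
for _ in range(3):
    print(hook.before_tool_call("search_flights", "MIA->JFK"))
# None, None, then the BLOCKED message on the third identical call
```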
What to Tell Your AI Assistant
"Build a travel agent with anti-loop protection:
1. All tools return explicit states:
- SUCCESS: [clear completion]
- FAILED: [clear error]
Never return 'more may be available'
2. Implement DebounceHook:
- Track last 3 tool calls as (tool_name, input)
- If same pair appears twice, block third attempt
- Return 'BLOCKED: Duplicate detected'
This prevents loops without manual retry limits."
When to Use
- Agents prone to retry loops (search, API aggregators)
- Cost-sensitive applications where unbounded retries are expensive
- Production systems where infinite loops create availability risk
Full details: How to Prevent AI Agent Reasoning Loops from Wasting Tokens
Example: Generic vs Informed Prompting
❌ Generic Prompt
"Build a customer support agent that searches our knowledge base
and books appointments"
What you get:
- Vector RAG (may hallucinate on structured queries)
- Synchronous booking API (may timeout)
- No validation (can book invalid times)
- Single agent (claims success even when booking fails)
Result: Works in demo, fails in production.
✅ Informed Prompt
"Build a customer support agent:
Knowledge Base:
- Use Neo4j GraphRAG for structured queries (pricing, features)
- Use vector RAG only for semantic search (descriptions)
Booking:
- Validate appointment_time > now() before booking
- Use async handleId for booking API (10+ seconds)
- Return explicit states: SUCCESS / FAILED
Validation:
- Multi-agent: Executor (search/book), Validator (cross-check),
Critic (final review)
- Use Strands Swarm for autonomous handoffs
Loop Prevention:
- DebounceHook blocks duplicate calls
- All tools return terminal states"
What you get:
- GraphRAG prevents hallucinations
- Async prevents timeouts
- Guardrails prevent invalid bookings
- Multi-agent catches false successes
- DebounceHook prevents loops
Result: Production-ready agent.
Common Mistakes
Mistake 1: Assuming Defaults Are Best Practices
Problem: "Build a production agent" assumes the assistant knows what production means.
Fix: Specify patterns: "Use GraphRAG, guardrails, async patterns."
Mistake 2: Relying Only on Prompts for Validation
Problem: "Make sure max_guests < 10" in system prompt gets ignored under pressure.
Fix: "Implement BeforeToolCallEvent hook that validates and cancels invalid calls."
Mistake 3: Not Recognizing When Patterns Apply
Problem: Agent works in demo, breaks on edge cases.
Fix: Know the 8 patterns. When you see hallucinations, timeouts, or loops, you'll recognize which pattern solves it.
My Thoughts
AI coding assistants will keep improving at generating working code. But working code and production-ready architecture remain different targets.
The gap isn't the assistant's capability. It's the prompt's specificity.
Next Steps
If You're Building a New Agent
- Identify which patterns apply (use symptom checklist)
- Specify patterns in your prompt
- Verify generated code implements them
- Test failure modes (timeouts, invalid inputs, non-existent data)
If You're Debugging an Existing Agent
- Identify the symptom (hallucinations, loops, timeouts, rule violations)
- Map symptom to pattern (see Step 1: Recognize the Symptom)
- Prompt your assistant to add the pattern: "Add DebounceHook to prevent loops"
- Verify fix with targeted tests
Learn More (Full Implementation Guides)
Each pattern has a complete guide with working code:
- GraphRAG: RAG vs GraphRAG: When Agents Hallucinate Answers
- Semantic Tool Selection: Reduce Agent Errors and Token Costs
- Neurosymbolic Guardrails: AI Agent Guardrails: Rules That LLMs Cannot Bypass
- Runtime Guardrails (Steering): Runtime Guardrails for AI Agents: Steer, Don't Block
- Multi-Agent Validation: Stop AI Agents from Hallucinating Silently
- Memory Pointers: AI Context Window Overflow: Memory Pointer Fix
- Async HandleId: Fix MCP Timeouts: Async HandleId Pattern
- DebounceHook: Prevent AI Agent Reasoning Loops
Complete series:
- Stop AI Agent Hallucinations: 5 Essential Techniques
- Why AI Agents Fail: 3 Failure Modes That Cost You Tokens and Time
Thanks!