When you ask an AI assistant like Kiro (AWS's AI coding assistant), Claude Code, or ChatGPT to "build me an agent," you get working code. But you don't see the architecture decisions happening behind the scenes. The agent responds to queries, but it might waste tokens in reasoning loops, hallucinate answers from incomplete data, or freeze on slow APIs. These failures are silent until production.
When you prompt AI coding assistants to build agents, they make architecture decisions silently: choosing retrieval strategies, validation approaches, and error-handling patterns. These 8 patterns give you the vocabulary to specify production-grade decisions in your prompts, preventing hallucinations and token waste before any code is generated.
This post closes two series I wrote documenting the most expensive agent failures in production: Stop AI Agent Hallucinations (5 techniques) and Why AI Agents Fail (3 failure modes). If you know these 8 patterns, you can guide AI assistants to avoid them from the start.
This isn't a step-by-step implementation guide. It's a reference for knowing what exists so you can recognize when to use each pattern based on your use case.
Working code for all 8 techniques is linked in each section.
Why This Matters
AI coding assistants generate agent code in seconds. Kiro, Claude Code, Cursor, and ChatGPT can scaffold tools, configure LLM calls, and wire up retrieval systems faster than manual coding.
But speed creates a problem: you get working code without seeing the tradeoffs.
When you prompt "build a booking agent with RAG," the assistant makes decisions:
- Which retrieval strategy? (vector similarity, graph queries, hybrid)
- How to handle large outputs? (truncate, summarize, external storage)
- What validation runs before tool execution? (none, prompts, framework hooks)
- How to handle slow APIs? (block, timeout, async patterns)
Your prompt doesn't specify these. The assistant picks defaults. Those defaults create the failure modes this post documents.
The 8 Failure Patterns
Hallucination Failures (5 patterns):
- GraphRAG - Vector RAG fabricates statistics from incomplete chunks
- Semantic Tool Selection - Too many tools, agent picks wrong ones
- Neurosymbolic Guardrails - Agent ignores business rules in prompts
- Runtime Guardrails (Steering) - Agent violates soft rules, needs correction rather than blocking
- Multi-Agent Validation - Single agent claims success when operations fail
Silent Token Waste (3 patterns):
- Memory Pointer Pattern - Large data overflows context, causes truncation
- Async HandleId Pattern - Slow APIs block agent indefinitely
- DebounceHook + Explicit States - Agent loops same tool call without progress
You don't implement all 8. You learn what they solve, then specify the ones your use case needs when prompting.
What Are These 8 Patterns?
These patterns solve the most expensive production failures: hallucinations from incomplete data (GraphRAG, Semantic Tool Selection, Guardrails, Steering, Multi-Agent), and silent token waste (Memory Pointers, Async HandleId, DebounceHook). You learn what each solves, then specify the ones your use case needs when prompting AI assistants. This prevents debugging black-box code in production.
Measured Impact from Production
| Pattern | Result | Source |
|---|---|---|
| GraphRAG | Exact counts vs fabricated approximations | RAG vs GraphRAG |
| Semantic Tool Selection | 86.4% fewer errors, 89% lower token costs | Tool Selection |
| Memory Pointers | 20M tokens reduced to 1,234 tokens | IBM Materials Science study |
| Async HandleId | 18-second block eliminated, no 424 timeouts | MCP Timeouts |
| Explicit States | 14 calls reduced to 2 (7x improvement) | Reasoning Loops |
Pattern 1: GraphRAG for Precise Queries
What Is GraphRAG?
GraphRAG replaces vector similarity with graph database queries for structured data. When your agent needs exact counts, aggregations, or relationship traversal, GraphRAG translates natural language to Cypher queries that return precise results from structured data instead of hallucinated statistics from incomplete text chunks. Use it for structured queries, keep vector RAG for semantic search.
What Breaks
Vector RAG fabricates statistics. Ask "How many hotels in Miami have pools and breakfast?" and vector similarity retrieves 3 text chunks mentioning Miami, pools, and breakfast. The LLM sees incomplete data, extrapolates from the sample, and returns "approximately 120 hotels" (fabricated from 3 chunks out of 200 hotels).
Out-of-domain queries return hallucinated answers instead of admitting no data exists.
The Fix
Replace vector retrieval with graph queries for structured data. Store hotels, amenities, and relationships in Neo4j. The LLM translates "hotels with pools and breakfast" into Cypher:
```cypher
MATCH (h:Hotel)-[:HAS_AMENITY]->(a:Amenity)
WHERE a.name IN ['pool', 'breakfast']
RETURN count(DISTINCT h)
```
Result: 133 hotels (exact count from database).
Out-of-domain query: "No hotels found in Antarctica" instead of fabricating results.
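A minimal sketch of executing that query with the official Neo4j Python driver; the connection details are placeholders, and `count_hotels` is a hypothetical tool name:

```python
from neo4j import GraphDatabase

# Placeholder connection details; point these at your own instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def count_hotels(amenities: list[str]) -> int:
    """Run the Cypher above with parameterized amenities."""
    query = (
        "MATCH (h:Hotel)-[:HAS_AMENITY]->(a:Amenity) "
        "WHERE a.name IN $amenities "
        "RETURN count(DISTINCT h) AS total"
    )
    with driver.session() as session:
        return session.run(query, amenities=amenities).single()["total"]

print(count_hotels(["pool", "breakfast"]))  # exact count from the database
```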
What to Tell Your AI Assistant
"Build a travel agent using GraphRAG. For structured
queries (hotels, amenities, availability), translate to Cypher
and execute against the graph. Only use vector RAG for unstructured descriptions. Return exact counts from graph traversal."
When to Use
- Structured data with relationships (products, inventory, locations)
- Queries requiring counts, aggregations, or multi-hop traversal
- Domains where fabricating statistics creates legal/financial risk
Full details: RAG vs GraphRAG: When Agents Hallucinate Answers
Learn more: Neo4j Cypher Documentation
Pattern 2: Semantic Tool Selection
What Is Semantic Tool Selection?
Semantic tool selection uses vector embeddings to filter tools before the LLM sees them. When your agent has 10+ tools, sending all descriptions on every call increases error rates (agent picks wrong tools) and token costs (paying for unused descriptions). Semantic filtering embeds tool descriptions offline, then at runtime matches the query to top-5 relevant tools, reducing errors by 86.4% and costs by 89%.
What Breaks
With 50 tools, two failures occur: (1) agent picks wrong tools because descriptions overlap, and (2) token costs explode from sending all 50 tool descriptions on every LLM call.
Measured impact: Error rates increase with tool count; token costs scale linearly.
The Fix
Use vector embeddings to filter tools before the LLM sees them. Embed tool descriptions offline. At runtime, embed the user query, compute similarity, pass only top-5 relevant tools to the agent.
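A minimal sketch of the filtering step, assuming sentence-transformers for the embeddings; the tool names, descriptions, and model choice are illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

TOOLS = {
    "search_hotels": "Find hotels by city, dates, and amenities",
    "book_flight": "Reserve a flight between two airports",
    "cancel_booking": "Cancel an existing hotel or flight booking",
    # ... remaining tool descriptions
}

# Offline: embed every tool description once
names = list(TOOLS)
tool_vecs = model.encode([TOOLS[n] for n in names], normalize_embeddings=True)

def select_tools(query: str, k: int = 5) -> list[str]:
    """Return the k tool names whose descriptions best match the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = tool_vecs @ q                 # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [names[i] for i in top]

# Pass only select_tools(user_query) to the agent instead of all 50 tools.
```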
Results from production:
- Errors reduced: 86.4%
- Token costs reduced: 89%
- Latency: <10ms for tool filtering
What to Tell Your AI Assistant
"Build a multi-tool agent with semantic tool selection At
runtime, embed the query, retrieve top-5 similar tools, pass only
those to the agent. Keep conversation memory, dynamically swap tools."
When to Use
- Agents with 10+ tools
- Tools with overlapping descriptions
- Cost-sensitive applications
Full details: Reduce Agent Errors and Token Costs with Semantic Tool Selection
Pattern 3: Neurosymbolic Guardrails (Block)
What Are Neurosymbolic Guardrails?
Neurosymbolic guardrails enforce business rules at the framework level, below the LLM's control. When prompts alone cannot enforce constraints (max guests, valid dates, budget limits), guardrails use pre-execution hooks to validate parameters and cancel invalid operations. Rules live in code, not prompts, so the LLM cannot bypass them. Use blocking guardrails for hard constraints that cannot be violated.
What Breaks
Prompts cannot enforce business rules. Even with clear docstrings ("max_guests must be ≤10"), the LLM passes max_guests=15 under pressure because prompts are suggestions, not constraints. The agent violates rules silently.
The Fix
Use framework hooks to validate parameters before tool execution. If validation fails, cancel the tool call and return corrective guidance. Rules live in code at the framework level, below the LLM's control.
Measured impact: Zero violations in 100-query test (vs. 12 violations with prompts alone).
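As a minimal sketch, here is the rule logic isolated as a plain function. The BeforeToolCallEvent and event.cancel_tool() names in the prompt below come from Strands Agents hooks; exactly how you wire this function into them is framework-specific:

```python
from datetime import date

def validate_booking(params: dict) -> str | None:
    """Return an error message if a hard rule is violated, else None.

    Call this from your framework's pre-execution hook (e.g. a
    BeforeToolCallEvent handler) and cancel the tool call on error.
    """
    if params.get("max_guests", 0) > 10:
        return "max_guests must be <= 10"
    check_in = params.get("check_in_date")  # expected as "YYYY-MM-DD"
    if check_in and date.fromisoformat(check_in) <= date.today():
        return "check_in_date must be after today"
    if params.get("budget", 0) <= 0:
        return "budget must be positive"
    return None
```

Because the rules live in an ordinary function at the framework level, the LLM never gets a chance to negotiate with them.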
What to Tell Your AI Assistant
"Build a booking agent with guardrails using Strands Agents hooks.
Create a BeforeToolCallEvent hook that validates:
- max_guests ≤ 10
- check_in_date > today
- budget > 0
If validation fails, cancel the tool call with event.cancel_tool()
and return an error message. Do not rely on prompts for validation."
When to Use
- Business rules that cannot be violated (compliance, legal, financial)
- Validation requiring computation (date math, inventory checks)
- Rules that change frequently
Full details: AI Agent Guardrails: Rules That LLMs Cannot Bypass
Pattern 4: Runtime Guardrails (Steer, Don't Block)
What Is Steering vs Blocking?
Steering guardrails return corrective guidance instead of blocking operations. When the agent violates a soft rule (format issues, parameter adjustments, data redaction), steering returns instructions via Guide() so the agent self-corrects and retries. This differs from blocking guardrails (Pattern 3) which stop workflows entirely. Use steering for rules where the agent can fix itself, blocking for hard constraints.
What Breaks
Hard guardrails (Pattern 3) block operations and stop workflows. For soft rules where the agent can self-correct (format issues, parameter adjustments, redacting sensitive data), blocking creates friction. The agent could fix the problem itself if given guidance.
The Fix
Use Agent Control to return corrective guidance via Guide() instead of blocking. When the agent violates a soft rule, the control plane returns instructions: "Adjust parameter X to Y and retry." The agent self-corrects and completes the task without human intervention.
Difference from Pattern 3:
- Block (Pattern 3): Hard constraints, workflow stops
- Steer (Pattern 4): Soft rules, agent self-corrects
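A minimal sketch of the steer-vs-block decision, with local Guide and Block classes standing in for the Agent Control responses (the real control plane is a separate server):

```python
from dataclasses import dataclass

@dataclass
class Guide:
    correction: str   # instruction the agent applies before retrying

@dataclass
class Block:
    reason: str       # hard stop; the workflow ends here

def check_booking(params: dict) -> Guide | Block | None:
    """Return Block for hard rules, Guide for soft rules, None to allow."""
    # Hard rule (Pattern 3): compliance limit, never negotiable
    if params.get("max_guests", 0) > 10:
        return Block("max_guests exceeds the occupancy limit")
    # Soft rule (Pattern 4): the agent can reformat and retry on its own
    if "/" in params.get("check_in_date", ""):
        return Guide("Reformat check_in_date as YYYY-MM-DD and retry.")
    return None
```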
What to Tell Your AI Assistant
"Build a booking agent with Agent Control for soft rules. Connect
to Agent Control server. For soft rules (parameter formatting,
date adjustments, data redaction), return Guide() with correction
instructions instead of blocking. Agent should retry with fix applied.
Use hard blocks (Pattern 3) only for compliance rules that cannot
be violated under any circumstance."
When to Use
- Rules where agent can self-correct (format, adjust parameters)
- Workflows where blocking creates poor UX
- Rules managed centrally via API/dashboard (update without redeploying)
Full details: Runtime Guardrails for AI Agents: Steer, Don't Block
Pattern 5: Multi-Agent Validation
What Is Multi-Agent Validation?
Multi-agent validation deploys specialized agents with different roles (Executor, Validator, Critic) that cross-check each other's work. Single agents optimize for appearing successful, not verifying outcomes. Multiple agents with different optimization functions catch errors the others miss. Executor performs tasks, Validator cross-checks against ground truth, Critic provides final review before returning to the user.
What Breaks
Single agents cannot self-validate. When an agent books a hotel, it claims "Success: Booked Grand Plaza Hotel" even if the API returned an error or the hotel doesn't exist in the database. The agent optimizes for appearing successful, not verifying outcomes.
The Fix
Deploy multiple agents with different roles: Executor performs tasks, Validator cross-checks against ground truth, Critic provides final review. Agents share context and hand off control autonomously when their role completes.
Measured impact: Multi-agent catches errors single agent misses (e.g., booking non-existent hotels).
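A sketch of the three-role swarm, assuming the Strands Agents Swarm API named in the prompt below; the system prompts are illustrative and exact signatures may differ across releases, so check the current Strands docs:

```python
from strands import Agent
from strands.multiagent import Swarm

executor = Agent(
    name="executor",
    system_prompt="Book hotels and search flights. Hand off to the validator when done.",
)
validator = Agent(
    name="validator",
    system_prompt="Cross-check the executor's operations against the database. "
                  "Hand back to the executor on mismatch, otherwise hand off to the critic.",
)
critic = Agent(
    name="critic",
    system_prompt="Final review. Only confirm results the validator verified.",
)

# Agents share context and hand off autonomously when their role completes.
swarm = Swarm([executor, validator, critic])
result = swarm("Book a hotel in Miami for 2 guests, June 10-12")
```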
What to Tell Your AI Assistant
"Build a multi-agent system using Strands Swarm with 3 agents:
1. Executor: Books hotels, searches flights
2. Validator: Cross-checks operations against database
3. Critic: Final review before returning to user
Agents share context via swarm.context. Use autonomous handoffs.
Agents decide when to hand off based on task completion."
When to Use
- High-stakes operations (financial, medical, legal)
- Tasks where "appears successful" differs from "actually successful"
- Complex workflows with multiple verification points
Full details: How to Stop AI Agents from Hallucinating Silently with Multi-Agent Validation
Pattern 6: Memory Pointer Pattern
What Is the Memory Pointer Pattern?
The Memory Pointer Pattern stores large data outside the LLM context and passes short references instead. When tools return 200KB+ logs or 1000-row database results, passing them directly causes silent truncation. Memory pointers store data in agent.state, return a pointer to the LLM, and provide separate tools that resolve pointers to access full data. IBM reduced 20M tokens to 1,234 tokens using this pattern.
What Breaks
Context window overflow occurs when tools return more data than the LLM can process (200KB+ logs, 1000-row database results). The agent doesn't crash. It silently truncates data, loses context, and produces incomplete answers.
Real production case (IBM Materials Science):
- Before: 20 million tokens, workflow failed
- After: 1,234 tokens, workflow succeeded
The Fix
Store large data in agent.state, pass short references to the LLM. Tools return pointers like "logs-app-server". Subsequent tools resolve pointers to access full data. LLM only sees: "Data stored as logs-app-server. Use analyze_errors(pointer)."
Data in context reduced: 214KB → 52 bytes
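A minimal, framework-free sketch with a plain dict standing in for agent.state; fetch_logs and analyze_errors are hypothetical tools:

```python
STATE: dict[str, str] = {}  # stands in for agent.state

def fetch_logs(app: str) -> str:
    raw = "ERROR timeout connecting to db\n" * 5000  # stand-in for a ~150KB payload
    pointer = f"logs-{app}"
    STATE[pointer] = raw                  # full data never enters the context
    return f"Data stored as {pointer}. Use analyze_errors('{pointer}')."

def analyze_errors(pointer: str) -> str:
    raw = STATE[pointer]                  # resolve the pointer to the full data
    errors = [line for line in raw.splitlines() if "ERROR" in line]
    return f"SUCCESS: found {len(errors)} error lines in {pointer}."
```

The LLM only ever sees the two short return strings; the full payload stays in state.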
What to Tell Your AI Assistant
"Build a log analysis agent using Memory Pointer Pattern. When
fetch_logs returns >20KB:
1. Store in agent.state with unique pointer ID
2. Return to LLM: 'Data stored as logs-{app}. Use analyze_logs(pointer).'
3. Implement analyze_logs(pointer) that resolves from agent.state
Never pass large data directly to LLM context."
When to Use
- Tools returning large outputs (logs, database queries, files)
- Workflows with multiple processing steps on same large data
- Cost-sensitive applications
Full details: AI Context Window Overflow: Memory Pointer Fix
Pattern 7: Async HandleId Pattern
What Is the Async HandleId Pattern?
The async handleId pattern prevents slow external APIs from blocking your agent. When an API takes 30+ seconds, synchronous calls freeze the entire agent. Async handleId returns a job ID immediately, letting the agent continue with other tasks. A separate check_status tool polls for results when ready. This eliminates 424 timeout errors and keeps agents responsive.
What Breaks
External APIs that take 30+ seconds block the agent indefinitely. No other tools can run. After ~7 seconds, many implementations return 424 timeout errors, freezing the workflow.
The Fix
Tools return immediately with a job ID instead of waiting. Agent stores handleId and continues. Separate check_status(job_id) tool polls for results asynchronously.
Measured impact:
- Before: 18-second API blocks agent, 424 timeout
- After: Tool returns <1 second, agent polls when ready
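A minimal sketch using a background thread to stand in for the slow external API; start_analysis and check_status are the two tools the prompt below asks for:

```python
import threading
import time
import uuid

JOBS: dict[str, dict] = {}

def _slow_api(data: list) -> str:
    time.sleep(30)                        # stand-in for a 30-second external API
    return f"processed {len(data)} records"

def start_analysis(data: list) -> str:
    """Submit the job and return a handleId in well under a second."""
    job_id = uuid.uuid4().hex[:8]
    JOBS[job_id] = {"status": "RUNNING", "result": None}

    def worker() -> None:
        JOBS[job_id] = {"status": "SUCCESS", "result": _slow_api(data)}

    threading.Thread(target=worker, daemon=True).start()
    return f"Job submitted. handleId={job_id}. Poll with check_status."

def check_status(job_id: str) -> str:
    job = JOBS.get(job_id)
    if job is None:
        return f"FAILED: unknown job {job_id}"
    if job["status"] == "RUNNING":
        return f"RUNNING: job {job_id} is not ready yet."
    return f"SUCCESS: {job['result']}"
```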
What to Tell Your AI Assistant
"Build an agent with async handleId pattern for slow APIs:
1. start_analysis(data): Submit job, return job_id immediately
2. check_status(job_id): Poll for results
Agent calls start_analysis, stores job_id, continues with other
tasks, calls check_status when ready. Do not implement blocking calls."
When to Use
- External APIs with >5 second response times
- Batch processing (video analysis, large transforms)
- Any system outside your control
Full details: Fix MCP Timeouts: Async HandleId Pattern
Pattern 8: DebounceHook + Explicit States
What Prevents Reasoning Loops?
Reasoning loops occur when ambiguous tool feedback ("more may be available") signals that retrying might help. Two fixes work together: explicit terminal states (return SUCCESS/FAILED so the LLM knows when to stop) and DebounceHook (framework hook that blocks duplicate calls). Production tests showed explicit states reduced calls from 14 to 2, while DebounceHook provides a safety net for edge cases.
What Breaks
Agents loop calling the same tool repeatedly without progress. Ambiguous feedback like "Found 3 results. More may be available" signals that retrying might help. The agent loops indefinitely.
Real production case: 847 reasoning steps at $47/minute, no answer delivered.
The Fix (Two Parts)
Part A: Explicit Terminal States
Return clear SUCCESS or FAILED states. Change "More may be available" to "SUCCESS: Found all 3 matching flights."
Part B: DebounceHook Safety Net
A framework hook tracks recent tool calls. When the same (tool_name, input) pair appears twice, it blocks the third attempt.
Measured impact (travel booking demo):
- Ambiguous feedback: 14 calls
- Explicit SUCCESS: 2 calls (7x reduction)
- DebounceHook: 12 calls (2 blocked)
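A minimal sketch of the safety net; before_tool_call is where you would plug this into your framework's pre-tool hook:

```python
from collections import deque

class DebounceHook:
    """Blocks the third attempt when the same (tool_name, input) repeats."""

    def __init__(self, window: int = 3):
        self.recent: deque = deque(maxlen=window)  # last N (tool, input) pairs

    def before_tool_call(self, tool_name: str, tool_input: str) -> str | None:
        """Return a block message, or None to let the call proceed."""
        key = (tool_name, tool_input)
        if self.recent.count(key) >= 2:
            return "BLOCKED: Duplicate detected. Use the results you already have."
        self.recent.append(key)
        return None

hook = DebounceHook()
for _ in range(3):
    print(hook.before_tool_call("search_flights", "MIA->JFK"))
# None, None, then the BLOCKED message on the third identical call
```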
What to Tell Your AI Assistant
"Build a travel agent with anti-loop protection:
1. All tools return explicit states:
- SUCCESS: [clear completion]
- FAILED: [clear error]
Never return 'more may be available'
2. Implement DebounceHook:
- Track last 3 tool calls as (tool_name, input)
- If same pair appears twice, block third attempt
- Return 'BLOCKED: Duplicate detected'
This prevents loops without manual retry limits."
When to Use
- Agents prone to retry loops (search, API aggregators)
- Cost-sensitive applications where unbounded retries are expensive
- Production systems where infinite loops create availability risk
Full details: How to Prevent AI Agent Reasoning Loops from Wasting Tokens
Example: Generic vs Informed Prompting
❌ Generic Prompt
"Build a customer support agent that searches our knowledge base
and books appointments"
What you get:
- Vector RAG (may hallucinate on structured queries)
- Synchronous booking API (may timeout)
- No validation (can book invalid times)
- Single agent (claims success even when booking fails)
Result: Works in demo, fails in production.
✅ Informed Prompt
"Build a customer support agent:
Knowledge Base:
- Use Neo4j GraphRAG for structured queries (pricing, features)
- Use vector RAG only for semantic search (descriptions)
Booking:
- Validate appointment_time > now() before booking
- Use async handleId for booking API (10+ seconds)
- Return explicit states: SUCCESS / FAILED
Validation:
- Multi-agent: Executor (search/book), Validator (cross-check),
Critic (final review)
- Use Strands Swarm for autonomous handoffs
Loop Prevention:
- DebounceHook blocks duplicate calls
- All tools return terminal states"
What you get:
- GraphRAG prevents hallucinations
- Async prevents timeouts
- Guardrails prevent invalid bookings
- Multi-agent catches false successes
- DebounceHook prevents loops
Result: Production-ready agent.
Common Mistakes
Mistake 1: Assuming Defaults Are Best Practices
Problem: "Build a production agent" assumes the assistant knows what production means.
Fix: Specify patterns: "Use GraphRAG, guardrails, async patterns."
Mistake 2: Relying Only on Prompts for Validation
Problem: "Make sure max_guests < 10" in system prompt gets ignored under pressure.
Fix: "Implement BeforeToolCallEvent hook that validates and cancels invalid calls."
Mistake 3: Not Recognizing When Patterns Apply
Problem: Agent works in demo, breaks on edge cases.
Fix: Know the 8 patterns. When you see hallucinations, timeouts, or loops, you'll recognize which pattern solves it.
My Thoughts
AI coding assistants will keep improving at generating working code. But working code and production-ready architecture remain different targets.
The gap isn't the assistant's capability. It's the prompt's specificity.
Next Steps
If You're Building a New Agent
- Identify which patterns apply (use symptom checklist)
- Specify patterns in your prompt
- Verify generated code implements them
- Test failure modes (timeouts, invalid inputs, non-existent data)
If You're Debugging an Existing Agent
- Identify the symptom (hallucinations, loops, timeouts, rule violations)
- Map symptom to pattern (see Step 1: Recognize the Symptom)
- Prompt your assistant to add the pattern: "Add DebounceHook to prevent loops"
- Verify fix with targeted tests
Learn More (Full Implementation Guides)
Each pattern has a complete guide with working code:
- GraphRAG: RAG vs GraphRAG: When Agents Hallucinate Answers
- Semantic Tool Selection: Reduce Agent Errors and Token Costs
- Neurosymbolic Guardrails: AI Agent Guardrails: Rules That LLMs Cannot Bypass
- Runtime Guardrails (Steering): Runtime Guardrails for AI Agents: Steer, Don't Block
- Multi-Agent Validation: Stop AI Agents from Hallucinating Silently
- Memory Pointers: AI Context Window Overflow: Memory Pointer Fix
- Async HandleId: Fix MCP Timeouts: Async HandleId Pattern
- DebounceHook: Prevent AI Agent Reasoning Loops
Complete series:
- Stop AI Agent Hallucinations: 5 Essential Techniques
- Why AI Agents Fail: 3 Failure Modes That Cost You Tokens and Time
Thanks!