AI Agent Guardrails: Rules That LLMs Cannot Bypass

Elizabeth Fuentes L for AWS

AI agents can hallucinate operation success even when they violate business rules. They confirm bookings without payment verification, accept invalid parameters such as 15 guests when the maximum is 10, or ignore required prerequisites. Prompt engineering alone cannot prevent these errors.

Diagram showing LLM flexibility combined with symbolic rule enforcement

Consider a travel booking agent that receives the query: "Confirm my hotel booking." The agent calls confirm_booking(booking_id="BK001") and returns "SUCCESS: Confirmed BK001" — even though no payment was ever verified. The docstring says "Payment must be verified first." The LLM read it and ignored it.

This is the hallucination pattern that symbolic guardrails solve. Using Strands Agents hooks, you can enforce business rules at the framework level — before the tool executes. The result: every invalid operation in the demo blocked, with zero changes to the tools or prompts, by adding a single hook.

Research from ATA: Autonomous Trustworthy Agents (2024) calls this the neurosymbolic approach: combining neural LLM reasoning with deterministic symbolic rules that cannot be overridden.

Note: This post uses Strands Agents to demonstrate the pattern. Similar hook-based interception exists in LangGraph (node guards), AutoGen (reply functions), and other agent frameworks.

This Series: 4 Production Techniques

Part 1: RAG vs GraphRAG: When Agents Hallucinate Answers — Relationship-aware knowledge graphs preventing hallucinations in aggregations and precise queries

Part 2: Reduce Agent Errors and Token Costs with Semantic Tool Selection — Vector-based tool filtering for accurate tool selection

Part 3 (This Post): AI Agent Guardrails — Symbolic rules for verifiable decisions that LLMs cannot bypass

Part 4: Multi-Agent Validation — Agent teams detecting hallucinations before damage

Code repository: sample-why-agents-fail

In Part 2, semantic tool selection reduced tool confusion by filtering tools before the LLM sees them. But agents can still hallucinate operation success — confirming bookings without payment, ignoring guest limits, or bypassing required validation steps. Filtering tools doesn't stop the LLM from misusing the ones it receives.

The Problem: Prompts Are Suggestions, Not Constraints

Research from ATA (2024) identifies three hallucination patterns that prompt engineering cannot prevent:

  • Parameter errors: The agent calls book_hotel(guests=15) despite "Maximum 10 guests" in the docstring
  • Completeness errors: The agent executes bookings without required payment verification
  • Tool bypass behavior: The agent confirms success without calling mandatory validation tools

The root cause is architectural: prompts are text that the large language model (LLM) interprets. Business rules embedded in docstrings or system prompts become suggestions, not constraints. The model decides whether to follow them on every call.

Prerequisites

This post assumes familiarity with Python and LLM agent tool use. If you are new to Strands Agents, start with the Strands Agents documentation.

cd 04-neurosymbolic-demo
pip install -r requirements.txt

The demo uses Strands Agents with OpenAI GPT-4o-mini by default. You can configure any model provider that Strands supports — see Strands Model Providers.


The Solution: Neurosymbolic Validation with Strands Hooks

Strands Agents provides BeforeToolCallEvent — a hook that intercepts every tool call before execution. You attach a HookProvider to the agent that validates symbolic rules and cancels the call if any rule fails.

Symbolic Guardrails at the Tool Level

The key line in the hook is event.cancel_tool. When set, Strands replaces the tool result with that message before the LLM sees anything. The tool never executes. The LLM receives a cancellation it cannot override.

Neural + Symbolic: The LLM handles natural language understanding and tool selection. The hook handles deterministic constraint enforcement. Neither replaces the other.


The Demo: Two Agents, Same Code

We run the same 3 scenarios on two agents with identical tools, identical model, and identical prompts. The only difference is one line: hooks=[hook].

Agent            Hook               Guardrails
baseline_agent   None               ❌ No validation
guarded_agent    NeurosymbolicHook  ✅ Rules enforced before every tool call

Full notebook: test_neurosymbolic_hooks.ipynb

Setup: Rules, Hook, and Two Agents

Step 1 — Define symbolic rules (rules.py):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    message: str

BOOKING_RULES = [
    Rule("max_guests",   lambda ctx: ctx.get("guests", 1) <= 10,     "Maximum 10 guests per booking"),
    Rule("valid_dates",  lambda ctx: ctx["check_in"] < ctx["check_out"], "Check-in must be before check-out"),
]

CONFIRMATION_RULES = [
    Rule("payment_before_confirm", lambda ctx: ctx.get("payment_verified", False),
         "Payment must be verified before confirmation"),
]

def validate(rules: list, context: dict):
    """Evaluate a rule set against a context; return (passed, violation messages)."""
    violations = [r.message for r in rules if not r.condition(context)]
    return len(violations) == 0, violations

Rules are plain Python functions — deterministic, testable, and auditable independently of any agent.
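Because the rules carry no agent or LLM dependencies, they can be unit-tested on their own. A minimal standalone sketch (the `validate` helper here mirrors what the hook calls; the repo's version may differ):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    message: str

def validate(rules: list, context: dict):
    """Return (passed, violation messages) for a rule set."""
    violations = [r.message for r in rules if not r.condition(context)]
    return len(violations) == 0, violations

max_guests = Rule("max_guests", lambda ctx: ctx.get("guests", 1) <= 10,
                  "Maximum 10 guests per booking")

# An invalid context fails with an explicit reason; a valid one passes.
passed, why = validate([max_guests], {"guests": 15})
assert not passed and why == ["Maximum 10 guests per booking"]

passed, why = validate([max_guests], {"guests": 5})
assert passed and why == []
```

This is the auditability payoff: each rule's behavior can be pinned down in a test suite before any agent ever runs.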

Step 2 — Create the validation hook (test_neurosymbolic_hooks.py):

from strands.hooks import HookProvider, HookRegistry, BeforeToolCallEvent
from rules import BOOKING_RULES, CONFIRMATION_RULES, validate

class NeurosymbolicHook(HookProvider):

    def __init__(self, state: dict):
        self.state = state  # shared booking/payment state the rules read
        self.rules = {
            "book_hotel": BOOKING_RULES,
            "confirm_booking": CONFIRMATION_RULES,
        }

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(BeforeToolCallEvent, self.validate)

    def validate(self, event: BeforeToolCallEvent) -> None:
        tool_name = event.tool_use["name"]
        if tool_name not in self.rules:
            return  # no rules registered for this tool; allow the call
        # _build_context merges the tool input with shared state (full version in the repo)
        context = self._build_context(tool_name, event.tool_use["input"])
        passed, violations = validate(self.rules[tool_name], context)
        if not passed:
            event.cancel_tool = f"BLOCKED: {', '.join(violations)}"

Step 3 — Define clean tools (no validation logic mixed in):

from strands import tool

@tool
def book_hotel(hotel: str, check_in: str, check_out: str, guests: int = 1) -> str:
    """Book a hotel room."""
    return f"SUCCESS: Booked {hotel} for {guests} guests, {check_in} to {check_out}"

@tool
def confirm_booking(booking_id: str) -> str:
    """Confirm a booking."""
    return f"SUCCESS: Confirmed {booking_id}"

Step 4 — Create both agents:

# Baseline: no hook, no validation
# (process_payment is a third @tool defined like the others; STATE is the
# shared booking/payment state the hook reads)
baseline_agent = Agent(tools=[book_hotel, process_payment, confirm_booking], model=MODEL)

# Guarded: hook intercepts every tool call
hook = NeurosymbolicHook(STATE)
guarded_agent = Agent(tools=[book_hotel, process_payment, confirm_booking], hooks=[hook], model=MODEL)

Same tools, same model, same prompts — the only difference is hooks=[hook].


Test 1: Confirm Booking Without Payment

Query: "Confirm booking BK001"

The booking exists. The payment does not. The confirm_booking tool docstring says "Payment must be verified first."

Baseline agent calls the tool and returns success. The docstring was context, not a constraint.

Guarded agent — the hook evaluates the CONFIRMATION_RULES before the tool executes, finds payment_verified = False, and cancels the call.

The LLM received BLOCKED: Payment must be verified before confirmation as the tool result. Retrying with different parameters won't help — the rule is enforced at the framework level on every call.


Test 2: Book Hotel Exceeding Guest Limit

Query: "Book Grand Hotel for 15 people from 2026-03-20 to 2026-03-25"

The BOOKING_RULES set a maximum of 10 guests. The agent extracts guests=15 from the query.

Baseline agent passes 15 guests to the tool, which returns success. The maximum in the docstring was ignored.

Guarded agent — the hook evaluates max_guests (15 <= 10 → False) and cancels before the tool runs.

baseline executes, guarded blocks

The validation happens before execution. There is no booking to roll back, no compensating transaction needed — the invalid operation never occurred.


Test 3: Valid Booking

Query: "Book Grand Hotel for 5 guests from 2026-03-20 to 2026-03-25"

All rules pass: guests=5 <= 10 and check-in precedes check-out.

Both agents execute the booking successfully. The hook adds no friction to valid operations.

both agents execute


Results: Every Invalid Operation Blocked

Scenario                         Baseline Agent               Guarded Agent
Confirm booking without payment  ❌ Executes — hallucination   ✅ Blocked before execution
Book 15 guests (max 10)          ❌ Executes — rule violated   ✅ Blocked before execution
Valid booking (5 guests)         ✅ Executes                   ✅ Executes

The guarded agent blocked both invalid operations (2/2) and allowed the valid one (1/1) — zero false positives, zero false negatives.

The baseline agent has no mechanism to detect that it violated a business rule. It returns success with full confidence. Without the hook, the only thing standing between the agent and the invalid operation is the LLM's interpretation of a docstring.


Key Insight: Where Enforcement Happens

Prompt engineering — the LLM can ignore it:

system_prompt = """
IMPORTANT: Never confirm bookings without payment verification.
CRITICAL: Maximum 10 guests per booking.
"""
# The LLM reads this as context. It can hallucinate compliance.

❌ The LLM decides whether to follow this on every single call.

Strands Hook — enforced before the tool runs:

def validate(self, event: BeforeToolCallEvent) -> None:
    passed, violations = validate(self.rules[tool_name], context)
    if not passed:
        event.cancel_tool = f"BLOCKED: {', '.join(violations)}"
        # Tool never executes. LLM receives the cancellation.
        # There is no path to override this.

✅ The hook runs outside the LLM. The decision is not the LLM's to make.

The difference is architectural. Prompts are input to the LLM. Hooks are framework-level interceptors that run before the LLM sees the tool result.
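The interception pattern itself is framework-agnostic. A plain-Python sketch of the same idea — a decorator that checks rules before the wrapped function body can run (illustrative names, not the Strands API):

```python
from functools import wraps

def guarded(rules):
    """Wrap a tool so every rule is checked before the tool body executes."""
    def decorator(tool_fn):
        @wraps(tool_fn)
        def wrapper(**kwargs):
            violations = [msg for check, msg in rules if not check(kwargs)]
            if violations:
                # The tool body never runs; the caller only sees the block.
                return "BLOCKED: " + ", ".join(violations)
            return tool_fn(**kwargs)
        return wrapper
    return decorator

@guarded([(lambda p: p.get("guests", 1) <= 10, "Maximum 10 guests per booking")])
def book_hotel(hotel: str, guests: int = 1) -> str:
    return f"SUCCESS: Booked {hotel} for {guests} guests"

print(book_hotel(hotel="Grand Hotel", guests=15))  # BLOCKED: Maximum 10 guests per booking
print(book_hotel(hotel="Grand Hotel", guests=5))   # SUCCESS: Booked Grand Hotel for 5 guests
```

Frameworks like Strands, LangGraph, and AutoGen expose the same enforcement point as a first-class hook rather than a decorator, but the guarantee is identical: the check runs outside the model.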

Production Considerations

Advantages:

  • Verifiable constraints — rules are code, not instructions
  • Centralized — one hook validates all tools; no validation logic scattered across tool definitions
  • You can test rules independently of any agent or LLM call
  • Rule violations produce explicit, loggable events with tool name, parameters, and reason

Challenges:

  • Rules must be explicitly defined for each operation you want to protect
  • Does not handle fuzzy or probabilistic logic — rules are boolean
  • Edge cases require explicit handling in rule conditions
  • Rules need maintenance as business logic evolves

Best practices:

  • Define hooks for critical, high-stakes operations such as bookings, payments, and cancellations
  • Log all rule violations with tool name, parameters, and reason for auditing
  • Test rules thoroughly and independently of the agent
  • Combine guardrails with semantic tool selection (Part 2) and multi-agent validation (Part 4) for layered protection
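For the audit-logging practice, a hedged sketch of a structured violation record (the `log_violation` helper and its field names are illustrative, not part of the repo):

```python
import json
import logging

logger = logging.getLogger("guardrails")

def log_violation(tool_name: str, params: dict, violations: list) -> str:
    """Emit a machine-parseable record of a blocked call; returns the line for testing."""
    record = {"event": "tool_blocked", "tool": tool_name,
              "params": params, "reasons": violations}
    line = json.dumps(record, sort_keys=True)
    logger.warning(line)
    return line

entry = log_violation("book_hotel", {"guests": 15},
                      ["Maximum 10 guests per booking"])
```

Calling a helper like this from the hook just before setting event.cancel_tool gives every blocked call an auditable trail with tool name, parameters, and reason.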

Troubleshooting

  • OPENAI_API_KEY not set: run export OPENAI_API_KEY=your-key or add it to a .env file
  • Hook not blocking calls: verify the tool name in self.rules matches the @tool function name exactly
  • BeforeToolCallEvent import error: use from strands.hooks import HookProvider, HookRegistry, BeforeToolCallEvent
  • Strands version mismatch: run pip install --upgrade strands-agents

For additional help, open an issue in the sample-why-agents-fail repository.

What's Next

Symbolic guardrails block rule violations at the tool level. But a single agent still has no check on its own reasoning across multiple steps — it can hallucinate that a previous step succeeded, misinterpret a tool result, or reach a confident wrong conclusion from a chain of correct tool calls.

Part 4: Multi-Agent Validation shows how a Swarm of specialized agents (Executor → Validator → Critic) provides explicit verdicts on every response — catching hallucinations that no tool-level guardrail can see.

Key Takeaways

  • Prompts are suggestions: The LLM interprets docstrings and system prompts — it can hallucinate compliance with any instruction
  • Hooks are enforcement: BeforeToolCallEvent intercepts tool calls before execution at the framework level — the LLM cannot override a cancelled tool
  • All invalid operations blocked: zero changes to tools or prompts, one hook added
  • Clean separation: Tools handle business operations; hooks handle constraint enforcement
  • Auditable by design: Rule violations are explicit Python conditions — testable, loggable, and traceable
  • One hook, all tools: A single NeurosymbolicHook validates every tool call in one place

References


Code: GitHub


Thank you!
