Elizabeth Fuentes L for AWS

Posted on • Originally published at builder.aws.com

Runtime Guardrails for AI Agents - Steer, Don't Block

Most agent guardrails do one thing: block. The agent hits a rule, the workflow stops, and the user has to intervene. Agent Control adds a second option, steer: the agent receives corrective guidance, self-corrects, and completes the task without human intervention.

Agent guardrails today are binary — allow or deny. When an agent violates a policy, the typical response is to block the action and surface an error. This works for hard constraints (PCI compliance, regulatory blocks), but it creates friction for rules where the agent could fix the problem itself: adjust a parameter, redact sensitive data, or reformat an output.

Agent Control is an open-source runtime control plane that introduces steer controls alongside traditional blocks. Steer controls return corrective guidance via Guide() — the agent retries with the fix applied and completes the task. Rules live on a server, not in code — update them via API or dashboard without redeploying your agent.

This post shows how Agent Control works using a booking demo built with Strands Agents. We compare two approaches on the same scenario: hooks that block vs Agent Control that steers. Hooks and Agent Control are complementary — use hooks for hard blocks, steer for corrections.


Series Overview

This is a bonus post in the series on stopping AI agent hallucinations — added after the launch of Agent Control:

Part 1: RAG vs GraphRAG: When Agents Hallucinate Answers — Relationship-aware knowledge graphs preventing hallucinations in aggregations and precise queries
Part 2: Reduce Agent Errors and Token Costs with Semantic Tool Selection — Vector-based tool filtering for accurate tool selection
Part 3: AI Agent Guardrails: Rules That LLMs Cannot Bypass — Symbolic rules the LLM cannot bypass
Bonus Part 3.2: Agent Control (this post) — Steer instead of block
Part 4: Multi-Agent Validation — Agent teams detecting hallucinations before damage.


The Problem: Blocking Stops the Flow

Strands Hooks enforce rules at the tool level. When the agent calls book_hotel(guests=15) and the maximum is 10, the hook sets cancel_tool and the agent receives a blocked message:

class MaxGuestsHook(HookProvider):
    def check(self, event: BeforeToolCallEvent) -> None:
        guests = event.tool_use["input"].get("guests", 1)
        if guests > 10:
            event.cancel_tool = f"BLOCKED: {guests} guests exceeds maximum of 10"

The agent then tells the user:

"The Grand Hotel has a maximum capacity of 10 guests. Would you like to adjust the number of guests?"

The workflow stops. The user must respond. For a booking assistant handling hundreds of requests, every blocked operation is a friction point.


The Solution: Steer Instead of Block

Agent Control is an open-source runtime control plane that evaluates agent inputs and outputs against server-managed policies. It integrates with Strands as a Plugin — the same extension point as Hooks, but with two key differences:

  1. Rules live on a server — change them via API or dashboard without touching agent code
  2. Steer controls return Guide() instead of blocking — the agent retries with corrective guidance

[Figure: Hooks (Block) vs Agent Control (Self-Correct) comparison]

How Steer Works

When the LLM generates output mentioning "15 guests", the AgentControlSteeringHandler evaluates it against server-defined controls:

  1. LLM generates: "I will book Grand Hotel for 15 guests from May 1 to May 3"
  2. Agent Control evaluates LLM output → regex matches "15 guest"
  3. Steer control fires → returns Guide("reduce to 10, inform the user")
  4. LLM retries with guidance → calls book_hotel(guests=10)
  5. Booking completes → user is informed about the adjustment

[Figure: Agent Control steer flow: User Request → LLM → Agent Control server evaluates → Self-Correct → Final Response]
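
The five steps above can be sketched in plain Python. This is a minimal simulation under stated assumptions, not the Agent Control or Strands implementation: `Guide` here is a hypothetical stand-in dataclass, `fake_llm` replaces a real model, and `evaluate_output` mimics a single regex steer control.

```python
import re
from dataclasses import dataclass
from typing import Optional

MAX_GUESTS = 10
# Same pattern as the steer control in this post: two-digit counts from 11 to 99.
OVER_LIMIT = re.compile(r"(1[1-9]|[2-9]\d)\s*guest")


@dataclass
class Guide:
    """Hypothetical stand-in for the corrective guidance a steer control returns."""
    reason: str


def evaluate_output(text: str) -> Optional[Guide]:
    """Mimic a post-LLM steer control: return guidance on a match, else None."""
    if OVER_LIMIT.search(text):
        return Guide(f"Reduce the guest count to {MAX_GUESTS} and inform the user.")
    return None


def fake_llm(request: str, guidance: Optional[Guide] = None) -> str:
    """Toy model: plans the requested guest count unless it was given guidance."""
    if guidance is not None:
        guests = MAX_GUESTS
    else:
        guests = int(re.search(r"(\d+) guests", request).group(1))
    return f"I will book Grand Hotel for {guests} guests"


def run_with_steering(request: str, max_retries: int = 2) -> tuple[str, int]:
    """Generate, evaluate, and retry with guidance until the output passes."""
    output = fake_llm(request)
    steered = 0
    while steered < max_retries:
        guide = evaluate_output(output)
        if guide is None:
            break
        steered += 1
        output = fake_llm(request, guide)
    return output, steered


final, steered = run_with_steering("Book Grand Hotel for 15 guests")
print(final, "| steered:", steered)
```

The retry happens inside the loop, so the caller sees only the corrected plan. The `max_retries` cap is a design choice of this sketch to keep a control that can never be satisfied from looping forever.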

The agent responds:

"The maximum capacity for the Grand Hotel is 10 guests, so I have adjusted the booking accordingly. Booking ID: BK002."

No user intervention. No manual retry: the agent retried on its own. The workflow completed.


Implementation: Same Query, Two Approaches

The tools are identical — clean booking functions with no validation logic:

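from itertools import count

from strands import tool

# Demo-only booking ID source (BK001, BK002, ...); a real system would get IDs from its booking backend.
_booking_ids = count(1)

@tool
def book_hotel(hotel: str, check_in: str, check_out: str, guests: int = 1) -> str:
    """Book a hotel room."""
    booking_id = f"BK{next(_booking_ids):03d}"
    return f"SUCCESS: Booking {booking_id}: {hotel}, {guests} guests, {check_in} to {check_out}"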

@tool
def process_payment(amount: float, booking_id: str) -> str:
    """Process payment for a booking."""
    return f"SUCCESS: Processed ${amount:.2f} for {booking_id}"

@tool
def confirm_booking(booking_id: str) -> str:
    """Confirm a booking."""
    return f"SUCCESS: Confirmed {booking_id}"

The tools do NOT enforce the max-guests rule. That is the guardrail layer's job.

Test 1 — Hooks (Block)

from strands import Agent
from strands.hooks import HookProvider, HookRegistry, BeforeToolCallEvent

class MaxGuestsHook(HookProvider):
    def __init__(self):
        self.blocked = 0
    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(BeforeToolCallEvent, self.check)
    def check(self, event: BeforeToolCallEvent) -> None:
        if event.tool_use["name"] != "book_hotel":
            return
        guests = event.tool_use["input"].get("guests", 1)
        if guests > 10:
            self.blocked += 1
            event.cancel_tool = f"BLOCKED: {guests} guests exceeds maximum of 10"

agent = Agent(model=MODEL, tools=[book_hotel, process_payment, confirm_booking],
              hooks=[MaxGuestsHook()])
agent("Book Grand Hotel for 15 guests from 2026-05-01 to 2026-05-03")

Result:

Hook blocked: 1 call(s)
Agent: "The Grand Hotel has a maximum capacity of 10 guests.
        Would you like to adjust?"
Outcome: BLOCKED — user must intervene
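
The blocking rule can also be exercised without running an agent. The sketch below copies the body of `check` into a plain function and feeds it a `SimpleNamespace` stand-in for the event. The stand-in and the `make_event` helper are assumptions for illustration and are not the real `BeforeToolCallEvent`.

```python
from types import SimpleNamespace

MAX_GUESTS = 10


def check_max_guests(event) -> None:
    """Same logic as MaxGuestsHook.check, for any object with tool_use and cancel_tool."""
    if event.tool_use["name"] != "book_hotel":
        return
    guests = event.tool_use["input"].get("guests", 1)
    if guests > MAX_GUESTS:
        event.cancel_tool = f"BLOCKED: {guests} guests exceeds maximum of {MAX_GUESTS}"


def make_event(name: str, **tool_input):
    """Build a stand-in event (hypothetical helper, not part of Strands)."""
    return SimpleNamespace(tool_use={"name": name, "input": tool_input}, cancel_tool=False)


too_many = make_event("book_hotel", hotel="Grand Hotel", guests=15)
within_limit = make_event("book_hotel", hotel="Grand Hotel", guests=10)
other_tool = make_event("confirm_booking", booking_id="BK001")

for event in (too_many, within_limit, other_tool):
    check_max_guests(event)

print(too_many.cancel_tool)
print(within_limit.cancel_tool)
```

Only the 15-guest call is cancelled. A booking for exactly 10 guests passes because the comparison is strictly greater than the limit, and calls to other tools are ignored.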

Test 2 — Agent Control (Steer)

Controls are defined on the Agent Control server — not in code:

# setup_controls.py — run once
control = {
    "name": "steer-max-guests",
    "definition": {
        "scope": {"step_types": ["llm"], "stages": ["post"]},
        "selector": {"path": "output"},
        "evaluator": {"name": "regex", "config": {"pattern": r"(1[1-9]|[2-9]\d)\s*guest"}},
        "action": {
            "decision": "steer",
            "message": "Guest count exceeds maximum of 10",
            "steering_context": {
                "message": "Reduce the guest count to 10, retry the booking, "
                           "and inform the user that the maximum capacity is 10."
            }
        }
    }
}
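
To see which outputs this control would flag, the pattern can be tried locally with Python's `re` module. This checks only the regular expression. How the Agent Control server applies it is assumed here to be an unanchored, case-sensitive search.

```python
import re

# Pattern copied from the steer-max-guests control.
PATTERN = re.compile(r"(1[1-9]|[2-9]\d)\s*guest")

samples = [
    "I will book Grand Hotel for 15 guests",
    "Booking for 10 guests confirmed",
    "A table for 9 guests",
    "Party of 25guests",
    "Conference block for 100 guests",
]

# Map each sample to whether the pattern finds a match anywhere in it.
flagged = {text: bool(PATTERN.search(text)) for text in samples}
for text, hit in flagged.items():
    print(f"{hit!s:5}  {text}")
```

Counts from 11 to 99 are flagged, while 10 and below pass. The 100-guest sample is not flagged either, because the pattern only covers two-digit numbers, so a production control would likely need a broader pattern or a check on the tool input itself.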

The agent uses AgentControlPlugin + AgentControlSteeringHandler — both as Strands Plugins:

from strands import Agent
from strands.hooks import BeforeToolCallEvent, AfterToolCallEvent
from agent_control.integrations.strands import AgentControlPlugin, AgentControlSteeringHandler
import agent_control

agent_control.init(agent_name="booking-guardrails-demo")

plugin = AgentControlPlugin(agent_name="booking-guardrails-demo",
                            event_control_list=[BeforeToolCallEvent, AfterToolCallEvent])
steering = AgentControlSteeringHandler(agent_name="booking-guardrails-demo")

agent = Agent(model=MODEL, tools=[book_hotel, process_payment, confirm_booking],
              plugins=[plugin, steering])
agent("Book Grand Hotel for 15 guests from 2026-05-01 to 2026-05-03")

Result:

Steered: 1 time(s)
Agent: "The maximum capacity for the Grand Hotel is 10 guests,
        so I have adjusted the booking accordingly. Booking ID: BK002."
Outcome: SELF-CORRECTED — booking completed

Results

Same query. Same tools. Same model. Only the guardrail changes.

| Approach | Time | Outcome |
|---|---|---|
| Test 1: Hooks (cancel_tool) | ~4s | BLOCKED: agent asks user to adjust |
| Test 2: Agent Control (steer) | ~6s | Self-corrected: booking completed with 10 guests |

Both enforce the same rule. The difference is what happens when the rule is violated — not which approach is "better." Hooks are faster and simpler (pure Python, no server). Agent Control adds latency (steer → retry) but completes the workflow without user intervention. Choose based on the rule, not the technology.


When to Use Each Approach

| Approach | Best for |
|---|---|
| Hooks (block) | Rules that MUST hard-block, with no workaround allowed (e.g., payment before confirmation, PCI compliance) |
| Agent Control (steer) | Rules where the agent CAN self-correct: adjust parameters, redact PII, fix formatting |
| Agent Control (deny) | Same as hooks but managed on a server: change rules without redeploying code |

These are complementary, not competing. Hooks are simpler (pure Python, no server, no latency overhead). Agent Control is more flexible (server-managed, steer + deny, runtime updates without redeploying). Many production systems use both:

  • Hooks for compliance rules that must never be bypassed — payment verification, regulatory blocks, PII in tool parameters
  • Agent Control (deny) for the same hard blocks but managed centrally across multiple agents — update via dashboard, no redeploy
  • Agent Control (steer) for soft rules where self-correction is preferable — capacity adjustments, PII redaction in outputs, date formatting
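
As a rough illustration of the deny case, the sketch below reuses the shape of the `steer-max-guests` control from this post and swaps the action. The control name, the card-number pattern, and the assumption that `"deny"` is accepted as a `decision` value without a `steering_context` are all illustrative. Check the Agent Control docs for the actual schema.

```python
import re

# Illustrative card-number pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = r"\b(?:\d[ -]?){13,16}\b"

deny_control = {
    "name": "deny-card-numbers-in-output",  # hypothetical control name
    "definition": {
        "scope": {"step_types": ["llm"], "stages": ["post"]},
        "selector": {"path": "output"},
        "evaluator": {"name": "regex", "config": {"pattern": CARD_PATTERN}},
        "action": {
            "decision": "deny",  # assumed counterpart of "steer"
            "message": "Card numbers must never appear in agent output",
        },
    },
}

# Local check of the pattern only; the server-side evaluation is not simulated here.
pattern = re.compile(deny_control["definition"]["evaluator"]["config"]["pattern"])
leaks_card = bool(pattern.search("Charge card 4111 1111 1111 1111 for the stay"))
normal_reply = bool(pattern.search("Booking BK002: 10 guests, 2026-05-01 to 2026-05-03"))
print(leaks_card, normal_reply)
```

The first sample matches and the second does not, so dates and booking IDs in ordinary replies would not trip this rule. A hard block suits this case because there is no acceptable corrected version of an output that leaks a card number.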

Two Ways to Define Controls

| Mode | Best for |
|---|---|
| Server (this demo) | Teams, production, dashboard management: controls live on the Agent Control server |
| Local YAML | Quick prototyping: controls defined in controls.yaml, no server needed |

See the Agent Control docs for details on both modes.


Why Strands Makes This Simple

Both Hooks and Agent Control integrate with a single line change:

# Hooks — block violations (existing Strands API):
agent = Agent(tools=[...], hooks=[MaxGuestsHook()])

# Agent Control — steer violations (plugin API):
agent = Agent(tools=[...], plugins=[AgentControlPlugin(...), AgentControlSteeringHandler(...)])

No custom orchestration. No retry logic. Strands handles the lifecycle — hooks intercept before tool calls, steering evaluates after model output, and Guide() triggers automatic retry with corrective guidance.

Strands Hooks Documentation
Strands Steering Documentation
Agent Control Plugin for Strands


Key Takeaways

  • Hooks block violations: effective, but blocking stops the workflow and requires user intervention
  • Agent Control steers violations — the agent self-corrects and completes the task
  • Steer controls return Guide() with corrective instructions — the LLM retries with the fix applied
  • Controls live on a server — update rules via API or dashboard without touching agent code
  • Both approaches enforce the same rule (max 10 guests) — the difference is what happens when the rule is violated
  • Hooks for hard blocks, Agent Control for self-correction — use both when needed

Run It Yourself

git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail/stop-ai-agent-hallucinations/05-agent-control-demo

# Start Agent Control server (see setup instructions)
# https://github.com/agentcontrol/agent-control

# Install and run
uv venv && uv pip install -r requirements.txt
uv run setup_controls.py
uv run test_hooks_vs_control.py

You can swap the model for any provider supported by Strands. See Strands Model Providers for configuration.



Thank you!

