Most AI agents don’t reliably follow directions, and that’s one of the biggest reasons they never make it from POC to production.
This is how deploying agents usually plays out: you write clear instructions in your prompt, test against every scenario you can think of, and ship it. Then the agent skips steps, drifts from your guidelines, or invents behavior you didn't anticipate. So you add more detail, more constraints, more explicit directions.
The prompt is huge by now, but you're sure you've captured all the rules. You deploy again. Same problem. Eventually, you hit a wall and give up.
I ran into this firsthand trying to create a simple AI assistant to help me write. I gave it samples of my writing style, told it to write like me, and it did start off okay. But after a few turns it drifted back into generic AI-speak. I'm talking em dashes everywhere, staccato sentences for dramatic effect, and that weird "It's not about X, it's about Y" framing that sounds profound but actually says nothing. By the end of a long session, the output usually sounds nothing like me.
This example makes the problem obvious because you can read the output and immediately tell something’s off. But the same thing happens in more serious scenarios, like compliance checks, customer support flows, or multi-step workflows where the stakes are higher.
What's actually happening is that as conversations get longer, the model pays less attention to earlier instructions.
Prompt engineering helps, but it can only take you so far. What you need is a feedback loop that catches drift and corrects it before the response ever reaches the user.
The Agent Buddy System
Instead of trying to make one agent behave perfectly, the solution was to introduce a second agent to the system. One does the work, and the other checks it. That’s what I’ve been calling the agent buddy system.
The main agent handles the task: writing, reasoning, calling tools, whatever it needs to do. The buddy sits alongside it, watching the output. If the agent skips a step, tries to misuse a tool, or drifts from the defined rules, the buddy steps in and helps get things back on track.
The idea is simple: don’t rely on the model to always follow instructions. Assume it will drift, and build something that corrects it when it does.
This is essentially using an LLM as a judge. The evaluator model inspects the output from the worker model and decides whether it meets the criteria. If it does, the response goes through. If not, it sends guidance and the agent can try again.
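Stripped of any framework, that loop looks something like this. Here `call_worker` and `call_judge` are stand-ins for real LLM calls, and the APPROVE/REJECT protocol is just one way to structure the judge's verdict; this is a sketch of the pattern, not any SDK's API.

```python
# Framework-agnostic sketch of the worker/judge feedback loop.
# call_worker and call_judge stand in for real LLM calls; the names
# and the APPROVE/REJECT protocol are illustrative, not from any SDK.

def review_loop(task, call_worker, call_judge, max_retries=3):
    """Run the worker, let the judge veto, and retry with targeted feedback."""
    draft, feedback = None, None
    for _ in range(max_retries):
        draft = call_worker(task, feedback)
        verdict = call_judge(draft)  # e.g. "APPROVE" or "REJECT: <reason>"
        if verdict.startswith("APPROVE"):
            return draft
        feedback = verdict.partition("REJECT:")[2].strip()
    return draft  # retries exhausted: let the last attempt through

# Toy stand-ins to show the control flow:
def call_worker(task, feedback):
    if feedback is None:
        return "It's not about X, it's about Y."
    return "Plain, concrete sentence."

def call_judge(draft):
    return "REJECT: pseudo-profound reframing" if "not about" in draft else "APPROVE"

print(review_loop("write an intro", call_worker, call_judge))
```

The retry cap matters: without it, a worker and judge that disagree forever will loop indefinitely.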
It turns out that two models keeping each other honest is safer than one model that just does whatever it wants.
You can build this pattern yourself, but I used the Strands Agents SDK because it already supports this kind of feedback loop through a feature called steering.
Steering lets you inject just-in-time guidance into the agent’s execution instead of front-loading everything into a massive prompt and hoping for the best.
Under the hood, Strands steering works through hooks in the agent’s lifecycle. You can intercept tool calls before they execute to run custom validations, or evaluate the model’s response after it’s generated to check things like tone, format, or adherence to the prompt.
The steering agent intercepts the call and returns one of three actions: Proceed (accept), Guide (reject with feedback for retry), or Interrupt (escalate to a human).
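In plain Python, those three outcomes are easy to model. The class names below mirror the article's, but these are sketch dataclasses and a hypothetical dispatcher, not the SDK's real types:

```python
from dataclasses import dataclass

# Minimal model of the three steering outcomes. The names mirror the
# article; these dataclasses are a sketch, not the Strands SDK's classes.
@dataclass
class Proceed:
    reason: str

@dataclass
class Guide:
    reason: str

@dataclass
class Interrupt:
    reason: str

def handle(action, send, retry, escalate):
    """Dispatch a steering decision to the matching follow-up callback."""
    if isinstance(action, Proceed):
        return send(action.reason)
    if isinstance(action, Guide):
        return retry(action.reason)
    return escalate(action.reason)
```

The point of the three-way split is that "reject" is not one thing: a fixable violation becomes a retry with feedback, while something the model shouldn't decide on its own goes to a human.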
Building a Writing Buddy
To fix my AI writing problem, I built a steering handler that checks every response against a style guide with examples of my actual writing. If the output doesn’t sound like me, the handler catches it and asks for a rewrite before I ever see it.
In Strands, this means creating a SteeringHandler and attaching it to your agent as a plugin.
For my use case, I only needed to evaluate the final output, so I used steer_after_model() to inspect each response and decide whether to accept it or send it back with feedback.
Here’s my VoiceSteeringHandler:
```python
class VoiceSteeringHandler(SteeringHandler):
    """Evaluates writing output against a style guide using an LLM judge.

    Intercepts model responses via steer_after_model and uses a separate
    steering agent to check for style violations. If a violation is found,
    it guides the agent to rewrite with targeted feedback.
    """

    def __init__(self, style_guide: str, max_retries: int = 3):
        super().__init__(context_providers=[])
        self.style_guide = style_guide
        self.max_retries = max_retries
        self.retry_count = 0

    async def steer_after_model(
        self, *, agent: "Agent", message: Message, stop_reason: StopReason, **kwargs: Any
    ):
        """Evaluate model output against the style guide."""
        print("\n[STEERING] Evaluating model output...")
        text = " ".join(
            block.get("text", "") for block in message.get("content", [])
        )

        if self.retry_count >= self.max_retries:
            self.retry_count = 0
            return Proceed(reason="Max retries reached, accepting output")

        # Use a separate steering agent as an LLM judge
        steering_agent = Agent(
            system_prompt=f"""You evaluate writing against a style guide.
Catch clear violations, not nitpicks.

STYLE GUIDE:
{self.style_guide}

REJECT for: banned words/phrases from the style guide, em dashes,
"It's not X. It's Y." reframing, obvious marketing tone, or meta-commentary.
APPROVE if: tone is developer-to-developer with no banned words/phrases/patterns.
When in doubt, APPROVE.

Respond with APPROVE or REJECT: [quote the violation].""",
            model=agent.model,
            callback_handler=None,
        )
        result = str(steering_agent(f"Evaluate this text:\n\n{text}"))

        if "REJECT:" in result.upper():
            self.retry_count += 1
            feedback = result.split("REJECT:", 1)[-1].strip()
            return Guide(
                reason=f"Fix this issue: {feedback[:300]}. "
                "Only fix the cited issue. Output only the content, nothing else."
            )

        self.retry_count = 0
        return Proceed(reason="Output approved by steering agent")
```
Then to attach it to your main agent, you use a plugin like this:

```python
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
return Agent(
    model=model,
    system_prompt=f"""You are a writing assistant that writes in a specific voice.
Follow every rule in the style guide below. Output only the requested writing.
Never add meta-commentary or questions like "Would you like me to adjust?"

STYLE GUIDE:
{style_guide}""",
    plugins=VoiceSteeringHandler(style_guide=style_guide),
)
```
When the steering agent sees the output doesn't match, the handler returns Guide with specific feedback. The agent discards its response and tries again, knowing exactly what went wrong. After max_retries attempts, it lets the response through rather than looping forever.
The evaluator prompt checks for voice match against your examples, but also flags AI vocabulary (words like "crucial," "delve," "tapestry"), structural patterns (em dashes, pseudo-profound reframing), and other tells that make text sound machine-generated. You give it paragraphs from your actual writing, and it asks "does this new text sound like these examples?" It's essentially a style linter powered by an LLM.
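Some of those tells are mechanical enough to catch without spending a judge call at all. A deterministic pre-scan along these lines (the word list and regexes are illustrative, not a complete style guide) can short-circuit the obvious cases and leave only the genuine judgment calls to the LLM:

```python
import re

# Illustrative deterministic pre-checks for common AI "tells".
# The banned-word list and patterns are examples, not a full style guide.
BANNED_WORDS = {"crucial", "delve", "tapestry", "leverage"}
REFRAME = re.compile(
    r"it'?s not (about )?\w+[.,;]? it'?s (about )?\w+", re.IGNORECASE
)

def lint_ai_tells(text: str) -> list[str]:
    """Return the violations found; an empty list means the text passes."""
    violations = []
    words = {w.strip(".,!?").lower() for w in text.split()}
    for banned in sorted(BANNED_WORDS & words):
        violations.append(f"banned word: {banned}")
    if "\u2014" in text:  # em dash
        violations.append("em dash")
    if REFRAME.search(text):
        violations.append("'It's not X, it's Y' reframing")
    return violations
```

Anything a regex can catch is cheaper, faster, and more repeatable than a model; the LLM judge earns its cost on the fuzzy question "does this sound like me?"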
That’s a judgment call, and this is where steering really shines. Instead of trying to build complicated, deterministic evaluation logic, you let a model make that call and provide targeted feedback.
Does It Work?
Yes. Here’s what I saw in my own testing before getting into larger-scale results.
I ran a small evaluation: 5 multi-turn writing sessions where a simulated user iteratively refines a piece, repeated 5 times each using Claude Sonnet 4.5. That's the kind of back-and-forth that happens in real writing workflows, and it's where drift becomes noticeable. The baseline voice adherence averaged 25% by the end of the sessions, but the steered version held at 100%.
For single-turn prompts with more capable models, both performed about the same for a small evaluation dataset, because larger models are already pretty good at following style guides on their own. The difference shows up in the longer sessions where drift compounds, or when weaker models are used.
That's a modest eval set, so take the exact numbers directionally rather than as gospel. But the pattern consistently showed unsteered sessions degraded noticeably after a few turns, while steered sessions stayed on voice throughout.
The more compelling evidence comes from Clare Liguori, Senior Principal Software Engineer at AWS, who ran a similar evaluation at a much larger scale. She tested five approaches to guiding agent behavior on a library book renewal agent across 3,000 runs.
- Simple prompt instructions reached 82.5% accuracy, meaning roughly one in five interactions failed
- Agent SOPs hit 99.8%, but at 3x the token cost
- Graph-based workflows reached 80.8%, often failing outside predefined paths
- Steering hit 100% across 600 runs while using 66% fewer input tokens than SOPs and 47% fewer output tokens than workflows
The most common failure without steering was skipping the book status check before renewing (43% of failures), followed by missing the confirmation message (40%). These are exactly the kinds of steps models deprioritize as context grows.
Things To Consider
This pattern works well, but there are a few things you should consider.
Latency
Each steering intervention adds another model call. If the handler returns Guide, the agent has to regenerate with feedback, which can mean two or three round trips for a single response. Once you add in tool calls, latency becomes a real factor.
That’s fine for background tasks or workflows where accuracy matters more than speed. But it’s the wrong tradeoff for real-time applications where users expect quick responses and the stakes are low.
Token costs
Tokens do add up, but the picture is more nuanced than you might expect.
Steering uses more tokens than simple prompt instructions because you’re sending feedback back to the agent when it strays. But compared to approaches that actually achieve high accuracy, like SOPs, steering is often more efficient.
Try the single-prompt approach first, and reach for steering only when that isn't enough.
Steering prompt quality
The quality of your steering prompts directly impacts performance.
If your handler gives vague feedback, the agent can get stuck retrying without improving. Set retry limits, make your Guide feedback specific, and if the same correction keeps firing, fix the prompt instead of increasing retries.
And remember, you're using a model to judge another model. That means they can share the same blind spots. If both the worker and the evaluator miss the same kind of mistake, steering won't catch it.
Try using two different models, and for high-stakes use cases, pair this with deterministic checks where you can.
When not to use steering
Steering assumes you have a clear definition of "correct." That works for style guides, compliance rules, and structured workflows. It doesn't work as well for creative tasks where you actually want the model to surprise you because steering will pull it back toward whatever your evaluator thinks is right. And if your criteria can be expressed as deterministic checks (regex, schema validation, rule engines), maybe skip steering. It's slower, costs more, and adds uncertainty where you don't need it.
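As a concrete contrast: if the rule is "the response must be JSON with these fields", a plain validation function needs no judge at all. The field names below are made up for illustration, loosely echoing the book renewal example:

```python
import json

# Hypothetical schema for a book-renewal response; field names are
# invented for illustration, not from any real agent.
REQUIRED_FIELDS = {"book_id", "status", "confirmation"}

def valid_renewal_response(raw: str) -> bool:
    """Deterministic check: parseable JSON containing every required field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_FIELDS <= data.keys()
```

Zero extra tokens, zero extra latency, and the same answer every time. Save the LLM judge for criteria you can't write down as code.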
Beyond Writing Assistants
Reliable agents come from the systems you build around them.
Steering applies anywhere an agent needs consistent behavior over time. Customer service agents maintaining tone across dozens of interactions, code review bots enforcing your team's conventions, or compliance workflows where skipping a step has real consequences.
The pattern is the same: evaluate the output, provide guidance, retry if needed. You just swap the evaluator criteria.
Clare Liguori’s post walks through her full evaluation of the library book renewal agent. The steering documentation covers the full API.
Some agents need a buddy to keep them on track. Steering gives you that.
