<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anthony Zender</title>
    <description>The latest articles on DEV Community by Anthony Zender (@azender1).</description>
    <link>https://dev.to/azender1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3681962%2Ff6210fe3-edb0-45ef-9a66-e8323a8ff7df.png</url>
      <title>DEV Community: Anthony Zender</title>
      <link>https://dev.to/azender1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/azender1"/>
    <language>en</language>
    <item>
      <title>The Real AI Agent Failure Mode Is Uncertain Completion</title>
      <dc:creator>Anthony Zender</dc:creator>
      <pubDate>Sat, 28 Mar 2026 14:12:46 +0000</pubDate>
      <link>https://dev.to/azender1/the-real-ai-agent-failure-mode-is-uncertain-completion-447n</link>
      <guid>https://dev.to/azender1/the-real-ai-agent-failure-mode-is-uncertain-completion-447n</guid>
<description>

&lt;p&gt;A lot of AI agent discussion focuses on the wrong failure modes.&lt;/p&gt;

&lt;p&gt;People talk about:&lt;/p&gt;

&lt;p&gt;hallucinations&lt;br&gt;
prompt injection&lt;br&gt;
tool misuse&lt;br&gt;
runaway loops&lt;br&gt;
bad reasoning&lt;/p&gt;

&lt;p&gt;Those are real.&lt;/p&gt;

&lt;p&gt;But once an agent starts calling tools that affect the outside world, a different class of failure becomes much more dangerous:&lt;/p&gt;

&lt;p&gt;uncertain completion&lt;/p&gt;

&lt;p&gt;That is the moment when the system cannot confidently answer:&lt;/p&gt;

&lt;p&gt;“Did this action already happen?”&lt;/p&gt;

&lt;p&gt;And once that question becomes ambiguous, retries get dangerous very fast.&lt;/p&gt;

&lt;p&gt;What uncertain completion actually looks like&lt;/p&gt;

&lt;p&gt;A common real-world path looks like this:&lt;/p&gt;

&lt;p&gt;agent decides to call send_payment()&lt;br&gt;
→ tool sends the payment request&lt;br&gt;
→ timeout / crash / disconnect / lost response&lt;br&gt;
→ caller does not know if it succeeded&lt;br&gt;
→ retry happens&lt;br&gt;
→ payment may be sent again&lt;/p&gt;
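
&lt;p&gt;To make that path concrete, here is a small Python simulation. FlakyPaymentAPI, ResponseLost, and send_payment_naive are hypothetical. The provider records the charge and then loses the response, so a retry loop that only sees the exception charges the customer twice.&lt;/p&gt;

```python
class ResponseLost(Exception):
    """The side effect happened, but the caller never saw the reply."""


class FlakyPaymentAPI:
    """Hypothetical provider that records the charge, then loses the first response."""

    def __init__(self):
        self.charges = []              # every charge the provider actually made
        self.drop_next_response = True

    def charge(self, account: str, amount_cents: int) -> dict:
        self.charges.append((account, amount_cents))  # side effect happens first
        if self.drop_next_response:
            self.drop_next_response = False
            raise ResponseLost("timeout after the charge was recorded")
        return {"status": "ok", "charge_no": len(self.charges)}


def send_payment_naive(api: FlakyPaymentAPI, account: str, amount_cents: int,
                       max_attempts: int = 3) -> dict:
    """Retries on failure without any record of whether an attempt completed."""
    for _ in range(max_attempts):
        try:
            return api.charge(account, amount_cents)
        except ResponseLost:
            continue  # cannot tell "attempted" from "completed", so it guesses
    raise RuntimeError("payment failed after retries")


api = FlakyPaymentAPI()
print(send_payment_naive(api, "acct-42", 5000))  # {'status': 'ok', 'charge_no': 2}
print(len(api.charges))                           # 2: charged twice
```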

&lt;p&gt;The same thing shows up with:&lt;/p&gt;

&lt;p&gt;order creation&lt;br&gt;
booking flows&lt;br&gt;
email sends&lt;br&gt;
CRM mutations&lt;br&gt;
support ticket creation&lt;br&gt;
browser / UI automation&lt;br&gt;
webhook-triggered workflows&lt;/p&gt;

&lt;p&gt;The model may have made the correct decision.&lt;/p&gt;

&lt;p&gt;The failure is that the system has no durable way to prove whether the side effect already happened.&lt;/p&gt;

&lt;p&gt;This is not mainly a prompting problem&lt;/p&gt;

&lt;p&gt;The agent is often not “being stupid.”&lt;/p&gt;

&lt;p&gt;The system is simply missing a clean execution boundary.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;p&gt;the same logical action can be attempted multiple times&lt;br&gt;
the caller cannot distinguish “attempted” from “completed”&lt;br&gt;
retries are forced to guess&lt;/p&gt;

&lt;p&gt;And “guessing” is exactly how you get:&lt;/p&gt;

&lt;p&gt;duplicate payments&lt;br&gt;
duplicate emails&lt;br&gt;
duplicate orders&lt;br&gt;
duplicate API mutations&lt;br&gt;
duplicate irreversible actions&lt;/p&gt;

&lt;p&gt;The hidden trap: “we logged the attempt”&lt;/p&gt;

&lt;p&gt;A lot of systems record that they tried to do something.&lt;/p&gt;

&lt;p&gt;That is not the same as recording that it completed safely.&lt;/p&gt;

&lt;p&gt;This is where two separate capabilities matter:&lt;/p&gt;

&lt;p&gt;State visibility&lt;/p&gt;

&lt;p&gt;Can your system durably see:&lt;/p&gt;

&lt;p&gt;what was requested&lt;br&gt;
what was claimed&lt;br&gt;
what actually completed&lt;br&gt;
what result should be returned on replay&lt;/p&gt;

&lt;p&gt;Result recovery&lt;/p&gt;

&lt;p&gt;If the side effect happened but the response was lost, can the system reconstruct what should happen next without re-executing the side effect?&lt;/p&gt;

&lt;p&gt;That second part is where many systems break.&lt;/p&gt;

&lt;p&gt;Because once the answer becomes:&lt;/p&gt;

&lt;p&gt;“we’re not sure, so retry it”&lt;/p&gt;

&lt;p&gt;you are already in dangerous territory.&lt;/p&gt;
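
&lt;p&gt;Here is a minimal sketch of state visibility and result recovery. It assumes a hypothetical receipt record with three states, and the Receipt, State, and decide_on_retry names are illustrative. A retry consults durable state first. Completed work returns its stored result. Work that was claimed but never confirmed goes to reconciliation (for example, asking the downstream system what happened) instead of being re-executed.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class State(Enum):
    REQUESTED = "requested"  # the action was asked for, nobody has started it
    CLAIMED = "claimed"      # a worker started it; completion is unknown
    COMPLETED = "completed"  # the side effect is confirmed and its result stored


@dataclass
class Receipt:
    """Hypothetical durable record for one logical operation."""
    operation_id: str
    state: State
    result: Optional[Any] = None  # what replay should return once completed


def decide_on_retry(receipt: Optional[Receipt]) -> str:
    """Decide what a retry may do, using only durable state."""
    if receipt is None or receipt.state is State.REQUESTED:
        return "execute"              # not started: claim it, then execute
    if receipt.state is State.COMPLETED:
        return "return_prior_result"  # result recovery: never re-run the side effect
    return "reconcile"                # claimed but unconfirmed: check before acting


print(decide_on_retry(None))                                                 # execute
print(decide_on_retry(Receipt("pay-1", State.CLAIMED)))                     # reconcile
print(decide_on_retry(Receipt("pay-1", State.COMPLETED, {"charge_no": 1}))) # return_prior_result
```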

&lt;p&gt;API idempotency helps — but it is not enough&lt;/p&gt;

&lt;p&gt;A common response is:&lt;/p&gt;

&lt;p&gt;“Just use idempotency keys.”&lt;/p&gt;

&lt;p&gt;That is often correct.&lt;/p&gt;

&lt;p&gt;And if the downstream API supports strong idempotency semantics, you should absolutely use them.&lt;/p&gt;
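
&lt;p&gt;Idempotency keys only help if every retry of the same logical action sends the same key. In this sketch, DedupingService and the key format are hypothetical. The downstream API deduplicates on the key. A fresh uuid4 per attempt looks like two different requests. One key chosen per logical operation turns the retry into a replay.&lt;/p&gt;

```python
import uuid


class DedupingService:
    """Hypothetical downstream API that honors idempotency keys."""

    def __init__(self):
        self.seen = {}       # idempotency key -> stored response
        self.executions = 0  # how many times the side effect really ran

    def create_order(self, idempotency_key: str, item: str) -> dict:
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]  # replay: same response, no new order
        self.executions += 1
        response = {"order_no": self.executions, "item": item}
        self.seen[idempotency_key] = response
        return response


# Unstable key: a new uuid per attempt, so the service sees two different requests.
svc = DedupingService()
for _ in range(2):
    svc.create_order(str(uuid.uuid4()), "widget")
print(svc.executions)  # 2

# Stable key: chosen once for the logical operation and reused on every retry.
svc = DedupingService()
order_key = "order:cart-123"  # hypothetical: derived from the cart, not the attempt
for _ in range(2):
    svc.create_order(order_key, "widget")
print(svc.executions)  # 1
```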

&lt;p&gt;But that still leaves hard cases:&lt;/p&gt;

&lt;p&gt;the downstream API does not support idempotency&lt;br&gt;
the key is not stable across retries&lt;br&gt;
the first call may have succeeded but the caller cannot prove it&lt;br&gt;
the side effect is happening in a browser / UI / desktop automation context&lt;br&gt;
the external system gives weak or ambiguous feedback&lt;/p&gt;

&lt;p&gt;In those cases, the problem is no longer just API-level idempotency.&lt;/p&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;p&gt;execution-layer safety&lt;/p&gt;

&lt;p&gt;The important split: intent vs execution&lt;/p&gt;

&lt;p&gt;One of the cleanest ways to think about this is:&lt;/p&gt;

&lt;p&gt;the agent should not directly own irreversible side effects&lt;/p&gt;

&lt;p&gt;Instead, there should be a separation between:&lt;/p&gt;

&lt;p&gt;Agent intent&lt;/p&gt;

&lt;p&gt;“I think we should do X”&lt;/p&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;p&gt;Execution&lt;/p&gt;

&lt;p&gt;“X is now allowed to happen exactly once”&lt;/p&gt;

&lt;p&gt;That is a very important boundary.&lt;/p&gt;

&lt;p&gt;Because once the system separates:&lt;/p&gt;

&lt;p&gt;decision&lt;br&gt;
validation&lt;br&gt;
execution&lt;br&gt;
receipt / replay&lt;/p&gt;

&lt;p&gt;…then retries stop being so dangerous.&lt;/p&gt;

&lt;p&gt;A better pattern: proposal → guard → execute&lt;/p&gt;

&lt;p&gt;A safer structure looks more like this:&lt;/p&gt;

&lt;p&gt;agent proposes action&lt;br&gt;
→ deterministic layer validates action&lt;br&gt;
→ execution guard checks durable receipt&lt;br&gt;
→ if already completed: return prior result&lt;br&gt;
→ else: execute once and persist receipt&lt;/p&gt;
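
&lt;p&gt;Here is a minimal in-memory sketch of proposal → guard → execute. ActionProposal, ALLOWED_TOOLS, validate, and ExecutionGuard are hypothetical names. A plain dict stands in for the durable receipt store. A real store has to survive restarts, or the guard forgets completed work.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ActionProposal:
    """Agent intent: data describing what should happen. Nothing executes here."""
    operation_id: str                      # stable identity of the logical action
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)


ALLOWED_TOOLS = {"send_email", "create_ticket"}  # hypothetical policy


def validate(p: ActionProposal) -> None:
    """Deterministic checks that run outside the model."""
    if p.tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {p.tool!r} is not allowed")


class ExecutionGuard:
    """Owns side effects; the agent never calls tools directly."""

    def __init__(self, tools: Dict[str, Callable[..., Any]]):
        self.tools = tools
        self.receipts: Dict[str, Any] = {}  # in-memory stand-in for durable receipts

    def run(self, p: ActionProposal) -> Any:
        validate(p)
        if p.operation_id in self.receipts:
            return self.receipts[p.operation_id]  # already completed: prior result
        result = self.tools[p.tool](**p.args)     # execute once
        self.receipts[p.operation_id] = result    # persist the receipt
        return result


sent = []


def send_email(to: str, subject: str) -> dict:
    sent.append((to, subject))  # the real side effect
    return {"message_no": len(sent)}


guard = ExecutionGuard({"send_email": send_email})
proposal = ActionProposal("welcome:user-7", "send_email", {"to": "a@example.com", "subject": "Hi"})
first = guard.run(proposal)
retry = guard.run(proposal)  # the same logical action arrives again
print(first == retry, len(sent))  # True 1
```

&lt;p&gt;The agent only produces ActionProposal values and never holds a reference to send_email. This sketch is not safe under concurrency. It also writes the receipt after executing, so a crash between the two still leaves uncertain completion. Recording a durable claim before executing at least makes that window visible.&lt;/p&gt;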

&lt;p&gt;This is a very different mental model from:&lt;/p&gt;

&lt;p&gt;agent decides&lt;br&gt;
→ immediately call side-effecting tool&lt;/p&gt;

&lt;p&gt;That second pattern is where a lot of production agent systems get into trouble.&lt;/p&gt;

&lt;p&gt;The more irreversible the action, the thicker the boundary&lt;/p&gt;

&lt;p&gt;Not all tools should be treated equally.&lt;/p&gt;

&lt;p&gt;A useful mental model is:&lt;/p&gt;

&lt;p&gt;Safe tools&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;p&gt;search&lt;br&gt;
read_file&lt;br&gt;
summarize&lt;br&gt;
fetch_status&lt;/p&gt;

&lt;p&gt;These are usually fine to retry.&lt;/p&gt;

&lt;p&gt;Side-effecting tools&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;p&gt;send_email&lt;br&gt;
create_order&lt;br&gt;
create_ticket&lt;br&gt;
update_CRM&lt;/p&gt;

&lt;p&gt;These need an execution boundary.&lt;/p&gt;

&lt;p&gt;Irreversible / high-risk tools&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;p&gt;payment&lt;br&gt;
delete&lt;br&gt;
trade execution&lt;br&gt;
account mutation&lt;/p&gt;

&lt;p&gt;These need the strongest boundary:&lt;/p&gt;

&lt;p&gt;deterministic identity&lt;br&gt;
durable receipts&lt;br&gt;
replay-safe semantics&lt;br&gt;
often confirmation / policy checks&lt;/p&gt;
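
&lt;p&gt;The tiers can be made explicit in a tool registry. That way, the boundary is chosen per tool instead of per call site. This is a hypothetical sketch. The tier assignments mirror the examples above, and the Policy fields are illustrative rather than any standard. Unknown tools default to the strictest tier.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    SAFE = 1          # reads: retry freely
    SIDE_EFFECT = 2   # writes: need an execution boundary
    IRREVERSIBLE = 3  # money, deletes, trades: strongest boundary


@dataclass(frozen=True)
class Policy:
    retry_freely: bool
    needs_guard: bool         # deterministic identity plus durable receipt
    needs_confirmation: bool  # extra policy or human check before executing


POLICIES = {
    Risk.SAFE: Policy(retry_freely=True, needs_guard=False, needs_confirmation=False),
    Risk.SIDE_EFFECT: Policy(retry_freely=False, needs_guard=True, needs_confirmation=False),
    Risk.IRREVERSIBLE: Policy(retry_freely=False, needs_guard=True, needs_confirmation=True),
}

TOOL_RISK = {
    "search": Risk.SAFE,
    "read_file": Risk.SAFE,
    "send_email": Risk.SIDE_EFFECT,
    "create_order": Risk.SIDE_EFFECT,
    "payment": Risk.IRREVERSIBLE,
    "delete": Risk.IRREVERSIBLE,
}


def policy_for(tool: str) -> Policy:
    # Unknown tools fall back to the strictest tier, never the loosest.
    return POLICIES[TOOL_RISK.get(tool, Risk.IRREVERSIBLE)]


print(policy_for("search").retry_freely)         # True
print(policy_for("payment").needs_confirmation)  # True
print(policy_for("unknown_tool").needs_guard)    # True
```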

&lt;p&gt;The principle is simple:&lt;/p&gt;

&lt;p&gt;the more irreversible the action, the thicker the execution boundary should be&lt;/p&gt;

&lt;p&gt;What systems actually need&lt;/p&gt;

&lt;p&gt;In practice, most systems need some combination of:&lt;/p&gt;

&lt;p&gt;stable request / operation identity&lt;br&gt;
durable receipt storage&lt;br&gt;
replay-safe execution semantics&lt;br&gt;
result recovery&lt;br&gt;
explicit separation between “propose” and “execute”&lt;/p&gt;

&lt;p&gt;That can be implemented in many ways.&lt;/p&gt;

&lt;p&gt;But the important thing is the architectural boundary itself.&lt;/p&gt;

&lt;p&gt;Because once a system can confidently answer:&lt;/p&gt;

&lt;p&gt;“yes, this already happened”&lt;/p&gt;

&lt;p&gt;then retries become much safer.&lt;/p&gt;

&lt;p&gt;Why this keeps showing up in agent systems&lt;/p&gt;

&lt;p&gt;Traditional systems already had this problem.&lt;/p&gt;

&lt;p&gt;Agents just make it more visible.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because agents are:&lt;/p&gt;

&lt;p&gt;retry-heavy&lt;br&gt;
tool-using&lt;br&gt;
asynchronous&lt;br&gt;
failure-prone&lt;br&gt;
often layered on top of APIs that were never designed for autonomous replay&lt;/p&gt;

&lt;p&gt;So the moment an agent starts touching:&lt;/p&gt;

&lt;p&gt;payments&lt;br&gt;
orders&lt;br&gt;
emails&lt;br&gt;
browser actions&lt;br&gt;
external systems&lt;/p&gt;

&lt;p&gt;…uncertain completion becomes one of the most important production problems in the stack.&lt;/p&gt;

&lt;p&gt;Closing thought&lt;/p&gt;

&lt;p&gt;The scariest agent failure is often not:&lt;/p&gt;

&lt;p&gt;“the model made the wrong choice”&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;“the model made the right choice twice”&lt;/p&gt;

&lt;p&gt;And the reason that happens is usually not a failure of intelligence.&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;missing execution boundaries under uncertain completion&lt;/p&gt;

&lt;p&gt;Related&lt;/p&gt;

&lt;p&gt;I wrote an earlier piece on the execution-side pattern here:&lt;/p&gt;

&lt;p&gt;The Execution Guard Pattern for AI Agents&lt;br&gt;
&lt;a href="https://dev.to/azender1/the-execution-guard-pattern-for-ai-agents-23m9"&gt;https://dev.to/azender1/the-execution-guard-pattern-for-ai-agents-23m9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And I’m also building a Python reference implementation around this idea:&lt;/p&gt;

&lt;p&gt;GitHub&lt;br&gt;
&lt;a href="https://github.com/azender1/SafeAgent" rel="noopener noreferrer"&gt;https://github.com/azender1/SafeAgent&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>backend</category>
      <category>architecture</category>
      <category>python</category>
    </item>
    <item>
      <title>The Execution Guard Pattern for AI Agents</title>
      <dc:creator>Anthony Zender</dc:creator>
      <pubDate>Sat, 28 Mar 2026 02:04:36 +0000</pubDate>
      <link>https://dev.to/azender1/the-execution-guard-pattern-for-ai-agents-23m9</link>
      <guid>https://dev.to/azender1/the-execution-guard-pattern-for-ai-agents-23m9</guid>
      <description>&lt;p&gt;AI agents don’t just think — they execute real-world actions.&lt;/p&gt;

&lt;p&gt;Payments. Trades. Emails. API calls.&lt;/p&gt;

&lt;p&gt;And under retries, timeouts, or crashes…&lt;/p&gt;

&lt;p&gt;they can execute the same action twice.&lt;/p&gt;

&lt;p&gt;Not because the model was wrong —&lt;br&gt;
because the system has no memory of execution.&lt;/p&gt;

&lt;p&gt;The hidden failure mode&lt;/p&gt;

&lt;p&gt;A typical failure path looks like this:&lt;/p&gt;

&lt;p&gt;agent decides to call tool&lt;br&gt;
→ tool executes side effect&lt;br&gt;
→ response is lost (timeout / crash / disconnect)&lt;br&gt;
→ system retries&lt;br&gt;
→ side effect executes again&lt;/p&gt;

&lt;p&gt;Now you have:&lt;/p&gt;

&lt;p&gt;duplicate payments&lt;br&gt;
duplicate trades&lt;br&gt;
duplicate emails&lt;br&gt;
duplicate API mutations&lt;/p&gt;

&lt;p&gt;Not because the decision was wrong —&lt;br&gt;
because the execution layer has no durable receipt.&lt;/p&gt;

&lt;p&gt;Retries are correct — and still dangerous&lt;/p&gt;

&lt;p&gt;Retries are necessary for reliability.&lt;/p&gt;

&lt;p&gt;But retries + irreversible side effects without a guard = replay risk.&lt;/p&gt;

&lt;p&gt;The system cannot confidently answer:&lt;/p&gt;

&lt;p&gt;“Did this action already happen?”&lt;/p&gt;

&lt;p&gt;So it does the only thing it can:&lt;/p&gt;

&lt;p&gt;→ tries again&lt;/p&gt;

&lt;p&gt;That’s fine for reads.&lt;/p&gt;

&lt;p&gt;It’s dangerous for writes.&lt;/p&gt;

&lt;p&gt;The Execution Guard Pattern&lt;/p&gt;

&lt;p&gt;The fix is not prompt engineering.&lt;/p&gt;

&lt;p&gt;It’s an execution boundary around side effects.&lt;/p&gt;

&lt;p&gt;Pattern:&lt;br&gt;
decision&lt;br&gt;
→ deterministic request_id&lt;br&gt;
→ execution guard&lt;br&gt;
   → if receipt exists → return prior result&lt;br&gt;
   → else → execute once → store receipt&lt;/p&gt;

&lt;p&gt;Instead of asking the model to “be careful,”&lt;br&gt;
the system itself becomes replay-safe.&lt;/p&gt;

&lt;p&gt;The four required properties&lt;/p&gt;

&lt;p&gt;For this pattern to work, you need four things:&lt;/p&gt;

&lt;p&gt;1) Deterministic request identity&lt;/p&gt;

&lt;p&gt;Every logical action must map to the same request_id across retries.&lt;/p&gt;

&lt;p&gt;If the same payment, email, trade, or tool call is retried, it must resolve to the same identity.&lt;/p&gt;
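
&lt;p&gt;Here is a minimal sketch of deterministic identity. It assumes a logical action can be described by a scope (such as a hypothetical workflow run ID), a tool name, and its arguments. make_request_id is an illustrative name. Hashing a canonical JSON encoding (sorted keys, fixed separators) gives the same request_id on every retry, which a random UUID per attempt would not.&lt;/p&gt;

```python
import hashlib
import json


def make_request_id(scope: str, tool: str, args: dict) -> str:
    """Derive a stable ID for one logical action from what it is, not when it ran."""
    canonical = json.dumps(
        {"scope": scope, "tool": tool, "args": args},
        sort_keys=True,         # key order must not change the ID
        separators=(",", ":"),  # fixed formatting, no incidental whitespace
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = make_request_id("run-81", "send_payment", {"to": "acct-42", "amount_cents": 5000})
b = make_request_id("run-81", "send_payment", {"amount_cents": 5000, "to": "acct-42"})
print(a == b)  # True: a retry with reordered args maps to the same request_id
```

&lt;p&gt;The scope must include whatever distinguishes two legitimately separate actions, such as two intended payments of the same amount. Otherwise, they will collapse into one.&lt;/p&gt;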

&lt;p&gt;2) Durable receipt storage&lt;/p&gt;

&lt;p&gt;You need a place to persist what happened.&lt;/p&gt;

&lt;p&gt;Postgres works well for this because it gives you:&lt;/p&gt;

&lt;p&gt;durable writes&lt;br&gt;
transactional boundaries&lt;br&gt;
strong uniqueness guarantees&lt;br&gt;
queryable auditability&lt;/p&gt;

&lt;p&gt;Without durable receipts, retries are guesswork.&lt;/p&gt;

&lt;p&gt;3) Atomic claim → execute → complete boundary&lt;/p&gt;

&lt;p&gt;The system needs a clear execution boundary:&lt;/p&gt;

&lt;p&gt;claim the operation&lt;br&gt;
execute the side effect once&lt;br&gt;
persist the result / receipt&lt;/p&gt;

&lt;p&gt;That boundary is what prevents:&lt;/p&gt;

&lt;p&gt;concurrent replays&lt;br&gt;
duplicate workers&lt;br&gt;
race-condition duplicates&lt;br&gt;
“two consumers did the same thing” bugs&lt;/p&gt;
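
&lt;p&gt;The claim step is what stops two workers from both executing. This minimal sketch uses Python’s built-in sqlite3 as a stand-in for Postgres so it runs anywhere, and the table and column names are hypothetical. The primary key on request_id makes the claim atomic. Exactly one INSERT wins, and every other claimant sees that the operation is already owned. In Postgres, the equivalent is INSERT ... ON CONFLICT (request_id) DO NOTHING.&lt;/p&gt;

```python
import sqlite3

# Hypothetical receipt table; sqlite3 stands in for Postgres so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE IF NOT EXISTS receipts (
    request_id TEXT PRIMARY KEY,  -- uniqueness is what makes the claim atomic
    status     TEXT NOT NULL,     -- 'claimed', later 'completed'
    result     TEXT               -- JSON result, written on completion
)
"""


def try_claim(conn: sqlite3.Connection, request_id: str) -> bool:
    """Return True only for the single caller that wins the claim."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO receipts (request_id, status) VALUES (?, 'claimed')",
        (request_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means someone else already owns this operation


conn = sqlite3.connect(":memory:")  # demo only; real receipts must be durable
conn.execute(SCHEMA)
print(try_claim(conn, "req-abc"))  # True: this worker may execute
print(try_claim(conn, "req-abc"))  # False: a duplicate worker must not execute
```

&lt;p&gt;A worker that wins the claim and then crashes leaves a row stuck in 'claimed'. That row is the durable signal of uncertain completion. It should trigger reconciliation, not a blind retry.&lt;/p&gt;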

&lt;p&gt;4) Replay returns the prior result&lt;/p&gt;

&lt;p&gt;If the same logical action comes in again,&lt;br&gt;
you should not execute it again.&lt;/p&gt;

&lt;p&gt;You should return the prior result.&lt;/p&gt;
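
&lt;p&gt;This minimal sketch puts claim, execute, and complete together. The guard stores the result with the receipt and returns it on replay. It again uses sqlite3 in place of Postgres, and guarded_execute, UncertainCompletion, and the receipts table are hypothetical names. A claimed row with no result means uncertain completion. This sketch surfaces that case as an error for reconciliation rather than re-executing.&lt;/p&gt;

```python
import json
import sqlite3


class UncertainCompletion(Exception):
    """Claimed but not completed (in flight, or the worker crashed): reconcile first."""


def guarded_execute(conn: sqlite3.Connection, request_id: str, side_effect):
    # Replay path: a completed receipt returns the stored result without re-executing.
    row = conn.execute(
        "SELECT status, result FROM receipts WHERE request_id = ?", (request_id,)
    ).fetchone()
    if row is not None:
        status, result = row
        if status == "completed":
            return json.loads(result)
        raise UncertainCompletion(request_id)

    # Claim: the primary key lets exactly one caller insert this request_id.
    cur = conn.execute(
        "INSERT OR IGNORE INTO receipts (request_id, status) VALUES (?, 'claimed')",
        (request_id,),
    )
    conn.commit()
    if cur.rowcount != 1:
        raise UncertainCompletion(request_id)  # another worker claimed it first

    result = side_effect()  # execute the side effect once
    conn.execute(
        "UPDATE receipts SET status = 'completed', result = ? WHERE request_id = ?",
        (json.dumps(result), request_id),
    )
    conn.commit()  # the receipt is now persisted
    return result


conn = sqlite3.connect(":memory:")  # demo only; real receipts must be durable
conn.execute(
    "CREATE TABLE receipts (request_id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT)"
)
calls = []


def send_welcome_email() -> dict:
    calls.append("sent")  # the real side effect
    return {"message_no": len(calls)}


first = guarded_execute(conn, "req-abc", send_welcome_email)
again = guarded_execute(conn, "req-abc", send_welcome_email)  # retry or redelivery
print(first == again, len(calls))  # True 1
```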

&lt;p&gt;That turns:&lt;/p&gt;

&lt;p&gt;retries&lt;br&gt;
redelivery&lt;br&gt;
replay&lt;br&gt;
uncertain completion&lt;/p&gt;

&lt;p&gt;into:&lt;/p&gt;

&lt;p&gt;safe re-entry instead of duplicate side effects&lt;/p&gt;

&lt;p&gt;What this is NOT&lt;/p&gt;

&lt;p&gt;This is not:&lt;/p&gt;

&lt;p&gt;moderation&lt;br&gt;
prompt safety&lt;br&gt;
RBAC&lt;br&gt;
approval workflows&lt;br&gt;
hallucination prevention&lt;/p&gt;

&lt;p&gt;It solves one thing:&lt;/p&gt;

&lt;p&gt;“Did this irreversible action already happen?”&lt;/p&gt;

&lt;p&gt;That question shows up everywhere once agents or automations start calling real tools.&lt;/p&gt;

&lt;p&gt;Where this matters most&lt;/p&gt;

&lt;p&gt;This pattern matters anywhere your system causes real-world side effects:&lt;/p&gt;

&lt;p&gt;webhook handlers&lt;br&gt;
billing / payment flows&lt;br&gt;
async workers / queues&lt;br&gt;
workflow / automation systems&lt;br&gt;
AI agent tool calls&lt;br&gt;
external API mutations&lt;br&gt;
order / booking / ticket creation&lt;br&gt;
notifications and email sends&lt;/p&gt;

&lt;p&gt;In other words:&lt;/p&gt;

&lt;p&gt;anything that should happen once, even if the system retries&lt;/p&gt;

&lt;p&gt;Why this keeps showing up&lt;/p&gt;

&lt;p&gt;Modern systems are:&lt;/p&gt;

&lt;p&gt;distributed&lt;br&gt;
async&lt;br&gt;
retry-heavy&lt;br&gt;
failure-prone&lt;br&gt;
full of uncertain completion&lt;/p&gt;

&lt;p&gt;So “exactly once” does not happen naturally.&lt;/p&gt;

&lt;p&gt;You have to build it explicitly.&lt;/p&gt;

&lt;p&gt;And once you add:&lt;/p&gt;

&lt;p&gt;AI agents&lt;br&gt;
autonomous workflows&lt;br&gt;
tool-calling systems&lt;/p&gt;

&lt;p&gt;…the need for an execution boundary gets even sharper.&lt;/p&gt;

&lt;p&gt;Because now a model can repeatedly decide to invoke something that has real-world consequences.&lt;/p&gt;

&lt;p&gt;A practical implementation direction&lt;/p&gt;

&lt;p&gt;In many systems, this can be implemented with:&lt;/p&gt;

&lt;p&gt;a Postgres-backed receipt table&lt;br&gt;
a stable operation / request ID&lt;br&gt;
a guard layer around side-effecting functions&lt;/p&gt;

&lt;p&gt;That turns:&lt;/p&gt;

&lt;p&gt;unsafe retries&lt;/p&gt;

&lt;p&gt;into:&lt;/p&gt;

&lt;p&gt;safe replays&lt;/p&gt;

&lt;p&gt;This doesn’t require rewriting your whole system.&lt;/p&gt;

&lt;p&gt;It usually means identifying the small set of functions that can cause irreversible side effects and wrapping them with a durable execution boundary.&lt;/p&gt;

&lt;p&gt;That’s where the leverage is.&lt;/p&gt;

&lt;p&gt;Closing thought&lt;/p&gt;

&lt;p&gt;If an AI agent can call tools,&lt;br&gt;
it needs more than reasoning.&lt;/p&gt;

&lt;p&gt;It needs execution memory.&lt;/p&gt;

&lt;p&gt;Otherwise:&lt;/p&gt;

&lt;p&gt;retries will eventually execute something twice.&lt;/p&gt;

&lt;p&gt;Execution Risk Audit&lt;/p&gt;

&lt;p&gt;I’m currently looking at systems where retries, webhooks, workers, workflows, or AI agents can replay irreversible actions.&lt;/p&gt;

&lt;p&gt;If your system has paths where you can’t confidently answer:&lt;/p&gt;

&lt;p&gt;“Did this action already happen?”&lt;/p&gt;

&lt;p&gt;that’s exactly the kind of problem I’m focused on.&lt;/p&gt;

&lt;p&gt;I’m especially interested in:&lt;/p&gt;

&lt;p&gt;duplicate webhook execution&lt;br&gt;
retry-safe billing flows&lt;br&gt;
workflow steps with uncertain completion&lt;br&gt;
AI agents calling side-effecting tools&lt;/p&gt;

</description>
      <category>ai</category>
      <category>backend</category>
      <category>python</category>
      <category>postgres</category>
    </item>
  </channel>
</rss>
