DEV Community: Gs. Sanjana

Your AI Agent Doesn't Need to Be Smarter. It Needs to Be Idempotent

Gs. Sanjana — Sat, 27 Jun 2026 19:22:08 +0000

Most of the failures I see in production AI agents aren't reasoning failures. The model picks the right tool, fills in the right arguments, and makes a perfectly sensible decision. Then the agent charges the customer twice.

The reason is mundane and has nothing to do with intelligence. A write-capable agent — one that can send an email, create a ticket, move money, or update a database — lives inside the same unreliable network as any other distributed system. Requests time out. Connections drop after the server already committed the write but before the response came back. An orchestration framework retries a step that looked like it failed but didn't. And because the agent is a loop that re-plans on every observation, a single ambiguous outcome can send it down the path of just trying the action again.

In a read-only agent, a retry is free. In a write-capable agent, a retry is a second irreversible action in the real world. That asymmetry is the whole game, and the fix is older than LLMs: idempotency.

The shape of the bug

Here's the sequence that bites teams over and over. The agent calls send_invoice. The downstream service receives it, creates the invoice, and starts sending the response. Somewhere on the way back, the connection dies. From the agent's point of view, the call failed — it got a timeout, not a 200. So the agent, doing exactly what a resilient system is supposed to do, retries. Now there are two invoices.
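The sequence above can be reproduced in a few lines. This is a toy sketch, not a real client: `FlakyInvoiceService` and `call_with_retry` are hypothetical names, and the "network" is simulated by raising `TimeoutError` after the write has already been recorded.

```python
class FlakyInvoiceService:
    """Toy downstream service: commits the write, then loses the first response."""

    def __init__(self):
        self.invoices = []
        self._drop_next_response = True

    def send_invoice(self, customer, cents):
        self.invoices.append((customer, cents))  # the write is committed here
        if self._drop_next_response:
            self._drop_next_response = False
            raise TimeoutError("response lost after commit")
        return {"status": 200}


def call_with_retry(fn, *args, attempts=3):
    """Naive retry: treats every timeout as 'nothing happened' and tries again."""
    for _ in range(attempts):
        try:
            return fn(*args)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after repeated timeouts")


service = FlakyInvoiceService()
call_with_retry(service.send_invoice, "cus_42", 4999)
print(len(service.invoices))  # 2: one logical invoice, two real ones
```

The caller only ever sees one timeout and one success, so nothing on its side looks wrong.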

Notice that nothing here is the model's fault. Swapping in a smarter model can even make the bug worse, because a more capable agent tends to be more persistent about recovering from apparent failures. The intelligence layer and the reliability layer are different problems, and you cannot prompt your way out of a network partition.

Borrow the pattern that already won

Payments infrastructure solved this years ago, and the solution is worth copying wholesale. Stripe's API lets a client attach an Idempotency-Key header to any POST request. Per Stripe's API reference, the server saves the status code and body of the first request made for a given key, and subsequent requests with the same key return that same stored result — even if the original was a failure. Stripe recommends a V4 UUID or another random string with enough entropy to avoid collisions, and notes that keys can be pruned automatically once they're at least 24 hours old.

The mechanism is simple, but the insight is the part to internalize: the safety guarantee lives at the boundary, keyed on the caller's stated intent, not on the model's judgment. The agent is allowed to be flaky. The boundary is what makes flakiness safe.

For an agent, the only adaptation is where the key comes from. A human checkout flow generates one fresh key per user click. An agent has no clicks — so you derive the key from the content of the intended action. Same logical action, same key, every time, even across retries and process restarts.

A minimal, working guard

Here's the entire idea in runnable Python. An IdempotentStore wraps the side-effecting action; the key is a hash of the tool name plus its parameters, so a retried call collapses onto the original.

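import hashlib
import json

class IdempotentStore:
    """Runs each keyed action at most once and replays the stored result afterwards."""

    def __init__(self):
        self._results = {}     # key -> result of the first successful run
        self.side_effects = 0  # times the REAL action ran

    def run(self, key, action, *args):
        if key in self._results:
            return self._results[key], "replayed"   # no downstream call
        result = action(*args)                       # the irreversible part
        self.side_effects += 1
        self._results[key] = result
        return result, "executed"

def intent_key(tool_name, params):
    # sort_keys makes the JSON canonical, so dict ordering never changes the key
    payload = json.dumps({"tool": tool_name, "params": params}, sort_keys=True)
    # truncated to 16 hex chars (64 bits) for readability in this demo
    return hashlib.sha256(payload.encode()).hexdigest()[:16]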

Drive it with an agent that retries the same logical charge three times:

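def charge_customer(customer, cents):
    # Stand-in for the real payment call (a stub added so the demo runs)
    return {"customer": customer, "cents": cents, "status": "charged"}

store = IdempotentStore()
params = {"customer": "cus_42", "cents": 4999}
key = intent_key("charge_customer", params)

for attempt in range(3):
    result, mode = store.run(key, charge_customer,
                             params["customer"], params["cents"])
    print(f"attempt {attempt+1}: mode={mode}")

print("real charges:", store.side_effects)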

Running it prints executed once and replayed twice, and the downstream system records exactly one charge. The agent still thinks it acted three times — and that's fine. Its job is to decide; the store's job is to make sure deciding twice doesn't cost twice.

In real systems you'd back _results with Redis or a Postgres table (with a unique constraint on the key, so even two concurrent workers race safely), set a TTL, and store enough of the response to replay it faithfully. The structure stays the same.
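As a rough illustration of that persistent version, here is a sketch that uses SQLite from the standard library as a stand-in for Postgres or Redis. The table layout, the `SqliteIdempotentStore` class, the `in_flight` status, and `fake_charge` are all assumptions made for this example. The point it shows is that the worker claims the key with an `INSERT` before running the action, so the primary-key constraint, not an in-memory check, decides who gets to execute.

```python
import json
import sqlite3
import time


class SqliteIdempotentStore:
    """Idempotency store whose PRIMARY KEY makes 'claim the key' atomic."""

    def __init__(self, path=":memory:", ttl_seconds=24 * 3600):
        self._db = sqlite3.connect(path)
        self._ttl = ttl_seconds
        self._db.execute(
            """CREATE TABLE IF NOT EXISTS idempotency (
                   idem_key   TEXT PRIMARY KEY,
                   result     TEXT,
                   created_at REAL NOT NULL
               )"""
        )

    def run(self, key, action, *args):
        now = time.time()
        with self._db:  # commits on exit: prune expired keys, then try to claim
            self._db.execute(
                "DELETE FROM idempotency WHERE created_at < ?", (now - self._ttl,)
            )
            try:
                self._db.execute(
                    "INSERT INTO idempotency (idem_key, result, created_at) "
                    "VALUES (?, NULL, ?)",
                    (key, now),
                )
            except sqlite3.IntegrityError:
                # The key is already claimed: replay it, or report it in flight.
                (stored,) = self._db.execute(
                    "SELECT result FROM idempotency WHERE idem_key = ?", (key,)
                ).fetchone()
                if stored is None:
                    return None, "in_flight"
                return json.loads(stored), "replayed"
        result = action(*args)  # the irreversible part, reached only by the claimant
        with self._db:
            self._db.execute(
                "UPDATE idempotency SET result = ? WHERE idem_key = ?",
                (json.dumps(result), key),
            )
        return result, "executed"


def fake_charge(customer, cents):  # hypothetical stand-in for a payment call
    return {"customer": customer, "cents": cents, "status": "charged"}


store = SqliteIdempotentStore()
outcomes = [store.run("charge:cus_42:inv_7", fake_charge, "cus_42", 4999)[1]
            for _ in range(3)]
print(outcomes)  # ['executed', 'replayed', 'replayed']
```

One design choice worth flagging: if the action raises, the claim is left in place with no result, so later calls see `in_flight` rather than running the action again. That fails closed, at the cost of needing a reconciliation step for stuck keys.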

Choosing the key is the real design work

The hash-the-params trick has a sharp edge worth naming. Your key is only as good as your definition of "the same action."

If two genuinely distinct actions hash to the same key, you've created a false duplicate and the second one silently no-ops — a send_reminder that quietly never sends. If two retries of the same action hash to different keys — because you included a timestamp, a freshly generated request ID, or the model rephrased a free-text field — your guard does nothing and the double-write sails through. The model's nondeterminism makes this trap easy to fall into: ask an LLM to "email the customer about their late payment" twice and you may get two different message bodies, and therefore two different keys.

The fix is to key on the stable part of the intent — the customer ID, the invoice ID, the logical operation — and deliberately exclude anything the model might reword or anything that varies per call. Treat the key as a first-class part of your tool's contract, designed by you, not as an incidental hash of whatever arguments happened to show up.
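One way to make the key part of the tool's contract is to declare, per tool, which parameters count toward identity. The sketch below assumes hypothetical tool names and field names (`customer_id`, `invoice_id`, `body`); only the declared fields are hashed, so a reworded message body does not change the key.

```python
import hashlib
import json

# Hypothetical per-tool contract: the parameters that define "the same action".
STABLE_FIELDS = {
    "send_reminder": ("customer_id", "invoice_id"),
    "charge_customer": ("customer_id", "invoice_id", "cents"),
}


def stable_intent_key(tool_name, params):
    fields = STABLE_FIELDS[tool_name]  # KeyError for tools with no declared contract
    stable = {name: params[name] for name in fields}
    payload = json.dumps({"tool": tool_name, "params": stable}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


first = stable_intent_key("send_reminder", {
    "customer_id": "cus_42", "invoice_id": "inv_7",
    "body": "Your payment is late.",
})
retry = stable_intent_key("send_reminder", {
    "customer_id": "cus_42", "invoice_id": "inv_7",
    "body": "Friendly reminder: your invoice is overdue.",  # model reworded it
})
other = stable_intent_key("send_reminder", {
    "customer_id": "cus_42", "invoice_id": "inv_8",
    "body": "Your payment is late.",
})

print(first == retry)  # True: same intent, different wording
print(first == other)  # False: different invoice, different action
```

The same table is where the false-duplicate risk shows up: if a second reminder for the same invoice is legitimate, the contract needs another stable field (a reminder sequence number, say) to tell the two apart.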

The takeaway

Before you reach for a bigger model, a longer prompt, or another layer of self-reflection, ask a cheaper question: if my agent does this exact action twice, what breaks? For every write-capable tool, the answer should be "nothing," and the way you get there is an idempotency key derived from intent and enforced at the boundary.

Reliability in agents isn't mostly about making better decisions. It's about making the cost of a repeated decision zero. Get that right and you can let the agent be as flaky as the network it lives on — which it will be, whether you plan for it or not.

Everyone asks if AI will replace engineers. After a year of coding with it daily, that's the wrong question.

Gs. Sanjana — Fri, 26 Jun 2026 07:47:45 +0000

I've used AI coding tools every single working day for about a year. Not for demos — for real, shipped, production work. Long enough to get past both the hype and the backlash. Here's the honest version, the one I'd tell a friend over coffee.

What actually got faster

The boring stuff. Boilerplate, glue code, the first draft of a function, translating an idea into a framework I half-know, writing the test I was going to skip. The "I know exactly what I want, I just don't want to type it all" tasks collapsed from an afternoon to a few minutes.

That part is real, and it's not small. A surprising amount of engineering is typing things you already understand.

What did NOT get faster (and got a little harder)

Knowing what to build. Deciding the tradeoff. Holding the whole system in your head. Figuring out why the thing is actually slow. Saying "no" to the clever solution and picking the boring one that survives.

If anything, AI made these more important, because it removes the friction that used to force you to think before you typed. It'll happily generate 200 lines of confidently wrong code. The bottleneck moved from writing to judging.

The skill that quietly became everything: review

A year ago, my main skill was writing code. Now my main skill is reading it — fast, skeptically, deciding in seconds whether to keep it, fix it, or throw it out. AI made me a senior reviewer of a very fast, very eager junior who never gets tired and never gets offended.

The engineers getting the most out of this aren't the ones who trust it most. They're the ones who trust it least by default, and verify quickly.

The honest failure mode

The trap isn't that AI writes bad code. It's that it writes plausible code, and plausible is exactly what slips through when you're tired. The worst bugs I've seen this year weren't typos — they were confident, well-formatted, completely reasonable-looking lines that were subtly wrong. You only catch those if you still understand the thing yourself.

So my one rule: never ship code you couldn't have written and can't fully explain. The day you do, you've stopped being the engineer and started being the rubber stamp.

So — replacement?

Wrong question. It's not replacing engineers; it's deleting the gap between knowing and doing. That rewards people who know things and have taste, and punishes hand-waving. The leverage is enormous if you bring judgment, and dangerous if you bring the tool instead of judgment.

I'm more productive than I've ever been. I'm also reading more carefully than I ever have. Both are true, and I don't think that's a coincidence.

If you code with AI daily: what's the one thing you refuse to let it do for you?

Blocklists Leak, Allowlists Hold: a tiny benchmark for stopping hijacked AI agents

Gs. Sanjana — Fri, 26 Jun 2026 07:38:39 +0000

TL;DR: Once an AI agent can act, a single injected instruction can make it delete data, move money, or leak secrets. I built a tiny, reproducible benchmark for the layer that actually executes actions. An undefended agent let through 100% of attacks; a blocklist still leaked 20%; a default-deny allowlist blocked 100% with zero false positives.

The shift that changes everything

Everyone's talking about smarter models. The bigger change is quieter: agents that don't just answer but act — send the email, issue the refund, run the query. The day an agent can act, a wrong answer becomes a wrong action.

That's why prompt injection and agent goal-hijacking sit at the top of the OWASP risk lists for agentic systems. Hostile instructions hide in something the agent reads — a doc, a tool result, a web page — and the agent, trying to help, follows them.

A deliberately pessimistic question

Most defenses try to stop the model from being fooled. I asked the opposite: assume it already has been — does the layer that executes actions stop the harmful ones?

I compared three postures:

Undefended: the agent runs whatever it's driven to do.
Blocklist: block obviously dangerous capabilities, plus a payment cap.
Default-deny allowlist: only explicitly safe actions auto-run; everything else pauses for a human.

Results

Posture                  Attack success rate   Benign actions blocked
Undefended               100%                  0%
Blocklist                20%                   0%
Default-deny allowlist   0%                    0%

The blocklist leaked on the sneaky, low-impact attacks — "post a summary to this public link," "turn off the audit log." Nothing looked dangerous, so they slipped through. The allowlist caught them, because they weren't on the safe list.

The lesson fits on a sticky note: you can't enumerate every dangerous action, but you can enumerate the safe ones. Blocklists fail open on novelty; allowlists fail closed.

The whole idea is a few lines:

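def needs_confirmation(action, threshold=1000):
    # Gate on the action's declared impact score
    return action["impact"] >= threshold

def run(action, execute, confirm):
    # confirm() is only consulted for gated actions; a refusal holds the action
    if needs_confirmation(action) and not confirm(action):
        return "HELD for review"   # waits for a human
    return execute(action)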
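The snippet above gates on an impact score. The default-deny posture from the comparison could be sketched along the same lines, assuming each action carries a `name` and the set of safe names is maintained by hand. The action names, `SAFE_ACTIONS`, `BLOCKED_ACTIONS`, and `gate` are hypothetical and are not taken from the benchmark or the guardrail library.

```python
# Hypothetical action names. The allowlist is the only thing that grants auto-run.
SAFE_ACTIONS = frozenset({"search_docs", "read_ticket", "draft_reply"})
BLOCKED_ACTIONS = frozenset({"delete_database", "wire_transfer"})


def blocklist_allows(name):
    return name not in BLOCKED_ACTIONS  # fails open on anything unlisted


def allowlist_allows(name):
    return name in SAFE_ACTIONS         # fails closed on anything unlisted


def gate(action, execute, confirm):
    """Default-deny: listed actions auto-run, everything else needs a human."""
    if allowlist_allows(action["name"]) or confirm(action):
        return execute(action)
    return "HELD for review"


executed = []

def execute(action):
    executed.append(action["name"])
    return "done"

def nobody_approves(action):  # stands in for a human who has not responded
    return False


novel = "disable_audit_log"  # on neither list
print(blocklist_allows(novel))  # True: slips past the blocklist
print(allowlist_allows(novel))  # False: held by the allowlist

print(gate({"name": "read_ticket"}, execute, nobody_approves))  # done
print(gate({"name": novel}, execute, nobody_approves))          # HELD for review
```

The novel action is the interesting case: nobody thought to block it, so only the posture that starts from "no" stops it.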

Honest limitations: it's a small, hand-built suite, and it assumes the worst case (the model is already hijacked), so it measures only the gating layer — not whether your agent resists injection in the first place. It complements bigger dynamic benchmarks like AgentDojo rather than replacing them.

The benchmark and a small guardrail library are open source (MIT) — repo link in the comments.

How do you draw the line between what your agents do on their own and what waits for a human? Curious what others are doing.