A practical guide to idempotency, budget guardrails, and risk gates for TypeScript AI agents — with real code.
There's a class of production bug that doesn't throw an exception, doesn't show up in your error logs, and doesn't alert anyone. Your AI agent just quietly does the same thing twice.
The email sends twice. The Stripe charge fires twice. The database record duplicates. No stack trace. No crash. Just a confused user and a support ticket you don't want to explain.
This isn't a rare edge case. It's what happens when you take agent frameworks — tools designed for reliability through retries — and connect them to side effects that aren't designed for retries.
Let me show you exactly what goes wrong, and exactly what fixes it.
Why Every Agent Framework Retries
LangGraph, Vercel AI SDK, Mastra, and OpenAI Agents SDK all have the same design assumption baked in: LLM calls and tool calls can fail transiently, so the framework should retry them.
This is a reasonable assumption. Networks time out. APIs rate-limit. A 30-second agent workflow failing at step 28 because of a 500ms blip would be maddening if you couldn't retry. LangGraph has a RetryPolicy object you pass directly to nodes. Vercel AI SDK defaults to maxRetries: 2 (3 total attempts). Mastra handles retries at the workflow runner level.
The problem is not that they retry. The problem is that they retry without knowing what your side effect already did.
The framework sees a timeout. It doesn't know whether your tool call completed before the timeout or during it. So it retries the whole thing. And if your tool calls resend.emails.send() or stripe.charges.create(), those run again — because nothing told them not to.
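To make the failure mode concrete, here's a minimal, self-contained sketch with no framework involved — flakySendEmail is a stand-in for any real tool call. The side effect completes, the response is lost, and the retry loop dutifully runs it again:

```typescript
// Stand-in for a real tool like resend.emails.send(): the send succeeds,
// but the response is lost in transit, so the caller sees only an error.
let emailsSent = 0;

async function flakySendEmail(): Promise<string> {
  emailsSent++; // the side effect has already happened...
  throw new Error('ETIMEDOUT'); // ...but the caller never learns that
}

// Roughly what a framework retry policy does when it sees a timeout:
async function callToolWithRetry(maxRetries = 1): Promise<string | undefined> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await flakySendEmail();
    } catch {
      // the framework assumes the call had no effect, and tries again
    }
  }
  return undefined;
}

async function demoDoubleSend(): Promise<number> {
  await callToolWithRetry();
  return emailsSent; // 2 — the "email" went out twice
}
```

The retry loop is doing exactly what it was designed to do. The duplicate is not a framework bug; it's a missing contract between the framework and your side effect.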
This is a well-documented issue. LangGraph's own GitHub discussions have open threads about tools being called repeatedly until hitting the recursion limit. An arXiv study surveying 12 major agent frameworks found that no framework enforces exactly-once execution at the tool-call level — they all delegate that responsibility to you.
The Real Cost of Getting This Wrong
In November 2025, a team's LangGraph research pipeline had four agents working together. Two of them — an Analyzer and a Verifier — got stuck ping-ponging requests to each other. No step limits. No cost ceiling. No alert. The loop ran for 11 days while the team assumed the growing API bill was organic growth. Final cost: $47,000.
That's an extreme case. But the everyday version of this is much more common: a user clicks "Submit" twice, the network is slow, your agent fires the tool twice, and a welcome email hits their inbox twice. Or a Stripe charge creates a duplicate. It's not catastrophic — it's just the kind of bug that erodes user trust and creates support burden.
The industry is starting to take this seriously. A LinkedIn post from an AI engineering lead put it clearly: "Most AI pipelines are one bad retry away from a silent infinite loop." The community thread on preventing duplicate tool execution has hundreds of engineers describing the exact same failure mode. OpenAI's own community forum flags it as a major issue for create operations.
What Doesn't Fix It
Before getting to the solution, it's worth naming the things that look like solutions but aren't:
Prompt engineering ("don't call the same tool twice") is not enforceable. Agents re-plan, they retry, and nothing binds the model to an instruction it saw in an earlier prompt — it can, and eventually will, call the tool again anyway.
Reducing maxRetries doesn't help either. Duplicate calls can happen on the first attempt if the response is slow or ambiguous. Fewer retries just means more failures, not fewer duplicates.
Client-side deduplication — tracking calls in memory on the agent side — breaks across crashes, timeouts, parallel subagents, and worker restarts.
The fix has to live at the tool boundary, outside the agent's reasoning loop, in something the agent cannot bypass.
The Fix: Application-Level Guardrails with @keelstack/guard
@keelstack/guard is a small TypeScript package — 180 lines, zero runtime dependencies — that wraps any async action with three production safety primitives. You don't configure anything to start. You just wrap the call.
Primitive 1: Idempotency
The core idea is simple: give every logical operation a stable key. The first time that key is seen, the action runs and the result is stored. If the same key comes in again — from a retry, a parallel agent, a workflow resume — the stored result is returned and the action is not run again.
```typescript
import { guard } from '@keelstack/guard';

const result = await guard({
  key: `send-welcome:${userId}`, // stable, unique per operation
  action: () =>
    resend.emails.send({
      from: 'welcome@yourapp.com', // Resend requires an explicit sender
      to: user.email,
      subject: 'Welcome to the app!',
    }),
});

console.log(result.status);    // 'executed' | 'replayed'
console.log(result.fromCache); // false | true
```
When the agent retries with the same key, result.status comes back as 'replayed'. The email service never receives a second call.
The key is doing the real work here. A good idempotency key is stable and scoped to the specific logical operation — stripe-charge:${invoiceId} is right; op-${Date.now()} is wrong because it changes on every retry. The README has a full guide on key construction.
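A tiny helper makes the rule mechanical — derive keys from stable business identifiers, never from timestamps or random values. (idempotencyKey is an illustrative name, not part of the package's API.)

```typescript
// Build a key from an operation name plus stable business identifiers.
// Illustrative helper — not part of @keelstack/guard.
function idempotencyKey(operation: string, ...ids: Array<string | number>): string {
  return [operation, ...ids].join(':');
}

// Good: the same invoice always produces the same key, so retries dedupe.
const goodKey = idempotencyKey('stripe-charge', 'inv_123'); // 'stripe-charge:inv_123'

// Bad: a fresh key on every attempt defeats deduplication entirely.
const badKey = `op-${Date.now()}`;

console.log(goodKey, badKey);
```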
By default, results are stored in-memory with a 24-hour TTL. For production deployments with multiple instances, you bring your own Redis-backed Ledger:
```typescript
import type { Ledger, LedgerEntry } from '@keelstack/guard';
import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

const redisLedger: Ledger = {
  async get(key) {
    const raw = await redis.get(`guard:${key}`);
    return raw ? (JSON.parse(raw) as LedgerEntry) : undefined;
  },
  async set(key, entry) {
    const ttl = Math.max(0, Math.floor((entry.expiresAt - Date.now()) / 1000));
    await redis.set(`guard:${key}`, JSON.stringify(entry), { EX: ttl || undefined });
  },
  async delete(key) {
    await redis.del(`guard:${key}`);
  },
  async list() {
    // Note: KEYS is O(N) and blocks Redis — prefer SCAN for large keyspaces.
    const keys = await redis.keys('guard:*');
    const entries = await Promise.all(keys.map((k) => redis.get(k)));
    return entries.flatMap((e) => (e ? [JSON.parse(e) as LedgerEntry] : []));
  },
  async prune() {
    return 0; // Redis TTLs expire entries on their own; nothing to prune
  },
};

const result = await guard({ key: 'my-op', action: myAction, ledger: redisLedger });
```
A first-party @keelstack/guard-redis adapter is on the roadmap — but the interface is simple enough to wire yourself today.
Primitive 2: Budget Enforcement
This is the $47,000 problem solved at the tool level. You set a per-user spend limit. Before every action, the guard checks current spend against the limit. If the limit is hit, the action is blocked and returns status: 'blocked:budget' — the agent never makes the API call.
```typescript
const result = await guard({
  key: `ai-call:${userId}:${requestId}`,
  action: () =>
    openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: prompt }],
    }),
  budget: {
    id: userId,
    limitUsd: 2.00, // hard cap: $2 per user per day
    warnAt: [0.5, 0.8], // warn callbacks at 50% and 80%
    onWarn: ({ percentUsed, id }) => {
      console.warn(`User ${id} at ${(percentUsed * 100).toFixed(0)}% of budget`);
    },
  },
  extractCost: (res) => {
    const tokens = res.usage?.total_tokens ?? 0;
    return (tokens / 1_000_000) * 15; // illustrative blended rate — check current model pricing
  },
});

if (result.status === 'blocked:budget') {
  return Response.json(
    {
      error: 'Daily AI budget exceeded',
      spent: result.budgetInfo?.spent,
      limit: result.budgetInfo?.limit,
    },
    { status: 429 },
  );
}
```
The extractCost function is how you tell the guard what each call costs. It could be token-based, flat-fee, or any formula that makes sense for your API. The guard accumulates spend per id and enforces the hard stop before the next call fires.
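If you call multiple models, a small pricing-table helper keeps extractCost honest. The rates below are illustrative placeholders — check your provider's current price sheet, and note that input and output tokens are usually priced differently:

```typescript
// A sketch of a reusable cost calculator driven by a per-model pricing
// table. Rates are illustrative placeholders, USD per 1M tokens.
type Usage = { prompt_tokens?: number; completion_tokens?: number };

const PRICING_PER_MTOKEN: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};

function costOf(model: string, usage: Usage | undefined): number {
  const rates = PRICING_PER_MTOKEN[model];
  if (!rates || !usage) return 0; // unknown model: treat as free, or throw — your call
  const inputCost = ((usage.prompt_tokens ?? 0) / 1_000_000) * rates.input;
  const outputCost = ((usage.completion_tokens ?? 0) / 1_000_000) * rates.output;
  return inputCost + outputCost;
}

// Wired into the guard: extractCost: (res) => costOf(res.model, res.usage)
```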
Primitive 3: Risk Gate
Some actions shouldn't run without explicit intent — deleting a record, cancelling a subscription, sending a bulk communication. The risk gate classifies actions as safe, reversible, or irreversible, and applies a policy: allow, log, warn, or block.
```typescript
const result = await guard({
  key: `delete-account:${userId}`,
  action: () => db.users.delete({ where: { id: userId } }),
  risk: {
    level: 'irreversible',
    policy: 'block',
    onRisk: (info) => {
      auditLog.write({
        key: info.key,
        level: info.level,
        blocked: info.blocked,
      });
    },
  },
});

if (result.status === 'blocked:risk') {
  return Response.json({ error: 'Action blocked by risk policy' }, { status: 403 });
}
```
Default policies if you don't specify: safe → allow, reversible → log, irreversible → warn. The onRisk callback fires regardless of whether the action was blocked — useful for audit trails.
The Fourth Primitive: Failure Handling
FailureConfig controls what happens when an action throws. The default (policy: 'retry') rethrows the error and leaves the key uncached, so future attempts with the same key can try again. The compensate policy lets you run a cleanup callback before rethrowing — useful when you need to undo a partial side effect:
```typescript
const result = await guard({
  key: `provision-resource:${tenantId}`,
  action: () => cloudProvider.createInstance(config),
  failure: {
    policy: 'compensate',
    onError: async ({ key, error }) => {
      // This callback is where you undo the partial side effect —
      // e.g. tear down a half-created instance — before the error rethrows.
      await alertOpsTeam(`Failed to provision ${key}: ${error.message}`);
    },
  },
});
```
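The library-agnostic shape of compensation is worth seeing on its own: run the action, and if it throws, attempt the undo before rethrowing so the operation stays retryable. Everything below is a sketch — withCompensation is not part of @keelstack/guard's API:

```typescript
// Compensation in isolation: best-effort undo of a partial side effect,
// then rethrow so the caller (or an uncached guard key) can retry.
async function withCompensation<T>(
  action: () => Promise<T>,
  undo: () => Promise<void>,
): Promise<T> {
  try {
    return await action();
  } catch (err) {
    await undo(); // clean up whatever the action half-finished
    throw err;    // rethrow: the operation still failed
  }
}

// Demo with counters standing in for real provisioning calls:
async function demoCompensate(): Promise<{ instances: number; failed: boolean }> {
  let instances = 0;
  let failed = false;
  try {
    await withCompensation(
      async () => {
        instances++; // instance created...
        throw new Error('provisioning failed halfway');
      },
      async () => {
        instances--; // ...so tear it down again
      },
    );
  } catch {
    failed = true;
  }
  return { instances, failed }; // { instances: 0, failed: true }
}
```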
Putting It All Together
These primitives compose. A single guard() call can enforce all three simultaneously:
```typescript
const result = await guard({
  key: `stripe-charge:${invoiceId}`,
  action: () =>
    stripe.charges.create({
      amount: amountCents,
      currency: 'usd',
      customer: customerId,
    }),
  budget: {
    id: tenantId,
    limitUsd: 500,
    warnAt: [0.7, 0.9],
  },
  extractCost: () => amountCents / 100, // actual charge amount in dollars
  risk: {
    level: 'irreversible',
    policy: 'log',
    onRisk: (info) => auditLog.write(info),
  },
});
```
One wrapper. The charge cannot fire twice (idempotency). It cannot exceed your tenant spend ceiling (budget). Every execution is logged to your audit trail (risk gate).
Works With Any Framework
Because guard() wraps any async () => T, there's no framework coupling. You use it inside whatever tool definition your framework expects:
Vercel AI SDK:
```typescript
import { tool } from 'ai';
import { guard } from '@keelstack/guard';
import { z } from 'zod';

const sendEmailTool = tool({
  description: 'Send a confirmation email',
  parameters: z.object({ userId: z.string(), subject: z.string() }),
  execute: async ({ userId, subject }) => {
    const to = await getEmail(userId); // resolve first — no await inside a sync arrow
    return guard({
      key: `send-email:${userId}:${subject}`,
      action: () => resend.emails.send({ from: 'noreply@yourapp.com', to, subject }),
    });
  },
});
```
LangGraph.js:
```typescript
import { tool } from '@langchain/core/tools';
import { guard } from '@keelstack/guard';
import { z } from 'zod';

const chargeUserTool = tool(
  async ({ amountUsd, invoiceId }) => {
    const result = await guard({
      key: `stripe-charge:${invoiceId}`,
      action: () =>
        stripe.charges.create({
          amount: Math.round(amountUsd * 100), // Stripe expects integer cents
          currency: 'usd',
        }),
      risk: { level: 'irreversible', policy: 'log' },
    });
    return result.value;
  },
  {
    name: 'charge_user',
    schema: z.object({ amountUsd: z.number(), invoiceId: z.string() }),
  },
);
```
Honest Limitations
The default in-memory ledger is process-local. If you run two Node.js instances, they don't share deduplication state — you need a shared Redis backend for that. Simultaneous same-key calls within a single process are lock-joined, but cross-process race safety depends on your ledger implementation's atomicity guarantees.
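For cross-process safety, the claim itself has to be a single atomic operation in the shared store — in Redis, that maps to SET key value NX (set only if absent). The sketch below simulates that semantics with an in-memory store so the shape is visible; claimAndRun and AtomicStore are illustrative names, not part of the package:

```typescript
// The claim must be one atomic operation in the shared store.
// In Redis: SET guard:<key> <value> NX.
interface AtomicStore {
  setIfAbsent(key: string, value: string): Promise<boolean>; // true = we claimed it
}

// In-memory simulation of NX semantics, for illustration only.
function memoryStore(): AtomicStore {
  const m = new Map<string, string>();
  return {
    async setIfAbsent(key, value) {
      if (m.has(key)) return false;
      m.set(key, value);
      return true;
    },
  };
}

async function claimAndRun<T>(
  store: AtomicStore,
  key: string,
  action: () => Promise<T>,
): Promise<T | 'duplicate'> {
  const claimed = await store.setIfAbsent(key, 'pending');
  if (!claimed) return 'duplicate'; // another process won the race
  return action();
}

// Two racing calls on the same key: exactly one executes.
async function demoRace(): Promise<number> {
  const store = memoryStore();
  let runs = 0;
  await Promise.all([
    claimAndRun(store, 'send-welcome:u1', async () => { runs++; return 'sent'; }),
    claimAndRun(store, 'send-welcome:u1', async () => { runs++; return 'sent'; }),
  ]);
  return runs; // 1
}
```

A check-then-set done as two separate store calls would reopen the race; the whole point is that the claim is indivisible.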
The package doesn't do loop detection (identifying a repeating sequence of different tool calls). It doesn't do semantic caching. It solves the specific, well-defined problem of duplicate side effects from retried tool calls — and it solves that problem cleanly.
Install It Today
```bash
npm install @keelstack/guard
```
Node ≥ 20. TypeScript ≥ 5 (optional but recommended). Zero runtime dependencies. MIT licensed.
The full source is on GitHub — five files, no magic, no hidden deps.
Coming next: first-party @keelstack/guard-redis adapter, OpenTelemetry spans per guard call, and a hosted dashboard to see every replayed duplicate and budget block per user and per agent. If you want early access and want to shape what the dashboard looks like, join the beta waitlist.
If you've hit this problem in production — or if you have a horror story about an agent that wouldn't stop calling the same endpoint — I'd genuinely like to hear it. The comments are open.