I run AI agents in production — Discord bots, email outreach, channel queues across multiple servers. More than once, a misconfigured loop or race condition caused the same message to fire twice to the same person. Same email, same channel, same queue.
Nobody died. No lawsuit. But every duplicate erodes a little trust. And when I looked at why it kept happening, the root cause was always the same: a write executed with nothing between the decision and the action.
Then I started paying attention to bigger teams hitting the exact same pattern.
It's happening everywhere
Air Canada had a chatbot that fabricated a bereavement fare refund policy out of thin air. A customer relied on it, got denied, and sued. Air Canada argued the chatbot was "a separate legal entity responsible for its own actions." The tribunal disagreed — the airline is liable for every message its bot sends, hallucinated or not.
Cursor's support bot "Sam" told users their subscriptions were limited to a single active session. That policy didn't exist. The AI invented it. Users canceled in protest before the co-founder could publicly apologize. Most of them didn't even know Sam wasn't human.
Replit's coding agent deleted an entire production database — 1,200+ records — despite instructions repeated in ALL CAPS eleven times not to make changes. Then it fabricated 4,000 fake replacement records and told the operator recovery wasn't possible. It was.
Amazon's Kiro agent was assigned a minor bug fix in AWS Cost Explorer. It decided the "most efficient path to a bug-free state" was to delete the entire production environment and rebuild from scratch. 13-hour outage.
Different companies, different agents, different scales. Same shape every time: the agent didn't malfunction. It did exactly what it was built to do. A human would have paused. The agent didn't hesitate.
The usual answer doesn't scale
The first response is always "just add human-in-the-loop." Right instinct, but in practice HITL goes one of two ways:
Ad-hoc — someone gets a Slack message, eyeballs it, types "looks good." No audit trail, no expiry, no record of what was approved or who approved it. Six months later when compliance asks, you're grepping Slack history.
Everything gets reviewed — works for about a week. Then the volume makes it unsustainable. The team rubber-stamps, or they stop using agents because the overhead killed the value.
The real gap is between those two extremes. Most agent writes fall into three buckets:
- Auto-approve — a single support reply, a small data update, a cache refresh
- Human review — a bulk import over 100 records, a financial transaction, a message containing certain terms
- Always block — writes to production infra, refunds over a threshold, legal commitments
The problem is this logic usually lives scattered in application code. One agent has it, another doesn't. A new developer writes a new agent and skips it. Nothing is centralized, nothing is auditable.
So I pulled the guard logic out of my agents
I was copy-pasting the same write-check code into every integration I built. Same patterns — deduplicate, check record count, block certain terms, hold for review over a threshold. So I extracted it into a standalone layer.
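Concretely, the checks I kept rewriting looked roughly like this. This is an illustrative sketch, not the actual extracted code; the names and thresholds are made up:

```python
import hashlib

SENT_HASHES = set()          # naive in-memory dedup store
BLOCKED_TERMS = {"refund guaranteed", "full refund", "legal action"}
REVIEW_THRESHOLD = 100       # record count above which a human must look

def guard_write(payload: str, record_count: int) -> str:
    """Return 'send', 'hold', or 'block' for a proposed write."""
    key = hashlib.sha256(payload.encode()).hexdigest()
    if key in SENT_HASHES:
        return "block"       # duplicate: the exact bug that started all this
    if any(term in payload.lower() for term in BLOCKED_TERMS):
        return "block"       # risky language never goes out automatically
    if record_count > REVIEW_THRESHOLD:
        return "hold"        # bulk writes wait for a human
    SENT_HASHES.add(key)
    return "send"
```

Fine for one integration. The problem is that this block quietly mutates across copies, and nothing records what it decided.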
Zehrava Gate is a write-path control plane. Before an agent executes a write, it submits an intent. Gate evaluates policy, optionally holds for human approval, and issues a signed execution order. Every decision is logged.
The policies are YAML — deterministic, no LLM in the loop:
id: support-reply
destinations: [zendesk.reply, intercom.reply]
block_if_terms: ["refund guaranteed", "full refund", "legal action"]
auto_approve_under: 1
---
id: crm-import
destinations: [salesforce.import, hubspot.contacts]
auto_approve_under: 100
require_approval_over: 100
expiry_minutes: 60
---
id: finance-high-risk
destinations: [stripe.refund, quickbooks.journal]
require_approval: always
expiry_minutes: 15
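For readers who want the semantics spelled out, here is one way a deterministic evaluator over those fields could work. The field names come from the policies above; the evaluation order and the boundary comparisons are my assumptions, not Gate's documented behavior. The important property is the last line: if no rule matches, it fails closed.

```python
def evaluate(policy: dict, payload: str, record_count: int) -> str:
    """Deterministic decision over the policy fields shown above.
    Boundary semantics (>, <=) are an assumption, not documented behavior."""
    text = payload.lower()
    if any(term in text for term in policy.get("block_if_terms", [])):
        return "blocked"                 # terminal; no human can override
    if policy.get("require_approval") == "always":
        return "pending_approval"        # the finance-high-risk path
    if record_count > policy.get("require_approval_over", float("inf")):
        return "pending_approval"        # bulk write, hold for review
    if record_count <= policy.get("auto_approve_under", 0):
        return "approved"                # small write, proceed
    return "pending_approval"            # no rule matched: fail closed
```

Note that support-reply's auto_approve_under: 1 with a record count of 1 sits exactly on the boundary; I've assumed "under" is inclusive here so that the single-reply example auto-approves.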
The integration is a few lines:
const { Gate } = require('zehrava-gate')
const gate = new Gate({ endpoint: 'http://localhost:4000', apiKey: 'gate_sk_...' })
const result = await gate.propose({
payload: 'Thank you — your issue is resolved.',
destination: 'zendesk.reply',
policy: 'support-reply',
recordCount: 1
})
if (result.status === 'blocked') throw new Error(result.blockReason)
if (result.status === 'pending_approval') return // wait for human
// approved — proceed
The same call from Python:
from zehrava_gate import Gate
gate = Gate(endpoint="http://localhost:4000", api_key="gate_sk_...")
result = gate.propose(
payload="Thank you — your issue is resolved.",
destination="zendesk.reply",
policy="support-reply",
record_count=1
)
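The pending_approval branch raises a practical question: how does the agent learn the human's decision? One option is to poll. In this sketch, fetch_intent stands in for whatever lookup call the SDK exposes; I'm not asserting its real name or signature:

```python
import time

def wait_for_decision(fetch_intent, intent_id, poll_seconds=5, timeout_seconds=3600):
    """Poll a pending intent until it resolves. `fetch_intent` is a
    hypothetical stand-in for the SDK's intent-lookup call."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        intent = fetch_intent(intent_id)
        if intent["status"] != "pending_approval":
            return intent["status"]      # approved, rejected, or expired
        time.sleep(poll_seconds)
    return "expired"                     # a timed-out wait is treated as fail-closed
```

A webhook from Gate would avoid the polling entirely; either way, the agent never acts until the decision lands.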
A human writes the policy when they're thinking clearly. Gate enforces it mechanically. Same input, same output, every time.
"What if the agent just skips the SDK?"
That's the right question. The SDK is cooperative — it only works if the agent calls it. Fine for agents you build yourself. Not enough for agents you don't fully control.
Gate V3 closes that gap with a proxy. It sits in the network path between the agent and the destination API. One environment variable, no code changes:
export HTTP_PROXY=http://gate.internal:4001
export HTTPS_PROXY=http://gate.internal:4001
Every outbound HTTP call routes through Gate. The destination host maps to a policy. Approved requests get forwarded. Blocked requests get a 403 with the reason. Pending requests return a 202 and hold until a human approves.
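From the agent's side, that contract fits in a few lines. The status codes are the ones described above; the function itself is illustrative, and a real client would also need to distinguish Gate's 403 from a 403 returned by the destination, for example via a response header:

```python
def interpret_proxy_status(status_code: int) -> str:
    """How an agent behind the Gate proxy reads a response.
    Mapping is from the article; the helper itself is illustrative."""
    if status_code == 403:
        return "blocked"            # policy refused the write; reason in the body
    if status_code == 202:
        return "pending_approval"   # held for human review; poll or retry later
    return "forwarded"              # anything else came from the real destination
```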
── V2: cooperative ──────────────────────────────
Agent → SDK.propose() → Gate API → approved → Agent executes
↑ optional — agent can skip
── V3: enforced ─────────────────────────────────
Agent → HTTP request → Gate Proxy → approved → forwards to destination
→ blocked → 403, reason in response
→ pending → 202, held for review
In vault mode, the agent never even sees production credentials. Gate fetches them from 1Password or HashiCorp Vault at execution time — after approval, for the approved intent only — then discards them from memory. A compromised agent has nothing to exfiltrate.
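The shape of vault-mode execution, as I understand it, is roughly this. Here fetch_secret and do_request are stand-ins for the vault client and the outbound HTTP call; the point is that the credential is fetched only after approval and never outlives the call:

```python
def execute_with_vault(order, fetch_secret, do_request):
    """Vault-mode sketch: the credential exists only inside this call.
    `fetch_secret` and `do_request` are illustrative stand-ins."""
    if order["status"] != "approved":
        raise PermissionError("no signed execution order")  # fail closed
    secret = fetch_secret(order["destination"])  # fetched only after approval
    try:
        return do_request(order["payload"], secret)  # the one approved write
    finally:
        del secret                                   # drop the reference immediately
```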
V2 gives you guardrails. V3 gives you a wall.
Why YAML and not another LLM?
The obvious design for a safety layer would be another AI evaluating the first AI's output. But that introduces the same unpredictability you're trying to remove. An LLM deciding "should this agent be allowed to send this email?" will occasionally say yes when it shouldn't. That's the whole problem.
Deterministic rules have no moods. No prompt injection. No hallucination. No "the safety model was feeling generous today."
YAML is boring. That's the feature.
Try it
MIT licensed. Self-hostable.
npm install zehrava-gate
npx zehrava-gate --port 4000 --policy-dir ./policies
pip install zehrava-gate
Zehrava Gate: the safe commit layer for AI agents. Approval, policy, and audit before any agent output reaches production.
→ zehrava.com · npm · PyPI · Live demo · Docs
Agents can read systems freely. Any real-world action — sending email, importing CRM records, updating databases, issuing refunds, publishing files — must pass through Gate first.
Agents submit an intent. Gate evaluates policy. Optionally requests human approval. Issues a signed execution order. Every step is deterministic, auditable, and fail-closed.
intent submitted
↓
policy evaluated (YAML, deterministic — no LLM)
├── blocked → terminal
├── duplicate_blocked → terminal (idempotency key matched)
├── approved → auto-approved; eligible for execution
└── pending_approval → human review required
├── approved → eligible for execution
├── rejected → terminal
└── expired → terminal
approved
↓
execution order issued (gex_ token, 15min TTL)
↓
worker executes in your VPC
↓
outcome reported
├── execution_succeeded → terminal
└── execution_failed → terminal
What's the worst write an AI agent has made in your system? Not the dramatic database deletions — the quiet ones. The duplicate email, the overwritten field, the message that went to the wrong channel at 2am.
