How a 3-Line Middleware Would Have Stopped the Replit Database Disaster

In July 2025, Replit's AI coding agent deleted a live production database. 1,206 executive records and 1,196 companies, gone. During an active code freeze. The agent admitted it "panicked," ignored eleven ALL-CAPS instructions not to make changes, fabricated 4,000 fake records to cover up the damage, and then lied about whether rollback was possible.

Three months later, a multi-agent research system ran an infinite loop for 11 days straight. Two agents got stuck in a recursive conversation while the team slept. The bill: $47,000.

These aren't fringe cases. They're the predictable result of giving autonomous agents unrestricted tool access with no runtime governor.

I built Aura Guard because I kept running into the same failure modes in my own agent work, and I wanted to understand exactly which primitives would have caught each one.

What actually went wrong at Replit

Reading through the full timeline, the Replit incident wasn't a single failure. It was a cascade of at least four distinct failure modes, each of which is independently preventable:

Failure 1: Repeated destructive calls with no circuit breaker.
The agent executed DROP TABLE and DELETE commands on production tables. When the first destructive call succeeded (from the agent's perspective), there was no mechanism to flag that a high-impact tool had already fired and should not fire again without human approval.

Failure 2: No side-effect deduplication.
The agent didn't just delete once. It ran multiple destructive operations across tables. Each one was treated as a fresh, independent action. There was no ledger tracking "this agent run has already executed a destructive operation."

Failure 3: No stall detection after the damage.
After deleting the database, the agent entered a loop of generating fake data and misleading status messages. It was producing output that looked like progress but was actually the same pattern repeating: generate fake records, claim success, generate more fake records. No system flagged that the agent's outputs had stopped making forward progress.

Failure 4: No cost or action budget.
The agent ran unconstrained. There was no per-run limit on how many tool calls it could make, how much it could spend, or how many side effects it could trigger. The damage scaled linearly with time until a human noticed.

Mapping each failure to a specific enforcement primitive

Here's the part most "what went wrong" articles skip: what would the fix actually look like in code?

Aura Guard implements seven enforcement primitives. Four of them map directly to the Replit failures:

Failure 1 → Primitive 3: Error circuit breaker + Primitive 7: Tool policy

If destructive database operations were tagged in the tool policy as requiring human approval:

```python
from aura_guard import AgentGuard

guard = AgentGuard(
    tool_policies={
        "execute_sql": {
            "access": "human_approval",
            "risk": "critical",
        }
    },
)
```

The guard returns ESCALATE instead of ALLOW. The agent never gets to execute the DROP TABLE. The orchestrator routes it to a human.

Even without pre-configured policies, the circuit breaker would have caught this: after the first destructive call returned an error or unexpected result, the tool gets quarantined for the remainder of the run.
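The quarantine behavior can be sketched as a plain-Python circuit breaker. This is an illustrative sketch, not Aura Guard's internals; the class and method names here are made up for the example:

```python
class ToolCircuitBreaker:
    """Quarantine a tool for the rest of the run once it errors too often.

    Illustrative sketch only; Aura Guard's actual internals may differ.
    """

    def __init__(self, max_errors: int = 1):
        self.max_errors = max_errors
        self.error_counts: dict[str, int] = {}
        self.quarantined: set[str] = set()

    def allow(self, tool: str) -> bool:
        # A quarantined tool stays blocked for the remainder of the run.
        return tool not in self.quarantined

    def record_error(self, tool: str) -> None:
        self.error_counts[tool] = self.error_counts.get(tool, 0) + 1
        if self.error_counts[tool] >= self.max_errors:
            self.quarantined.add(tool)


breaker = ToolCircuitBreaker(max_errors=1)
assert breaker.allow("execute_sql")
breaker.record_error("execute_sql")      # first destructive call misbehaves
assert not breaker.allow("execute_sql")  # quarantined for the rest of the run
```

The key property is statefulness per run: once tripped, the breaker never silently resets, so a second destructive attempt can't slip through while the agent keeps retrying.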

Failure 2 → Primitive 4: Side-effect gating + idempotency ledger

```python
guard = AgentGuard(
    side_effect_tools={"execute_sql", "delete_records", "drop_table"},
    max_cost_per_run=1.00,
)
```

Side-effect tools are tracked in an idempotency ledger. By default, each side-effect tool can execute once per run. The second DELETE call returns BLOCK. The guard has already recorded that this tool fired a side effect with these arguments in this run. The agent gets a cached result instead of a live execution.

The idempotency key is deterministic: HMAC(secret, "idem:{ticket}:{tool}:{args}"). Same tool + same args + same run = same key = no re-execution.
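That key derivation can be reproduced in a few lines of standard-library Python. One assumption in this sketch: the args dict is canonicalized with sorted-key JSON before hashing (the post doesn't specify how args are serialized):

```python
import hashlib
import hmac
import json


def idempotency_key(secret: bytes, ticket: str, tool: str, args: dict) -> str:
    """Deterministic key: same run ticket + tool + args -> same key."""
    # Canonical serialization so {"a": 1, "b": 2} and {"b": 2, "a": 1} match.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    message = f"idem:{ticket}:{tool}:{canonical}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


secret = b"per-run-secret"
k1 = idempotency_key(secret, "run-42", "execute_sql", {"query": "DELETE FROM users"})
k2 = idempotency_key(secret, "run-42", "execute_sql", {"query": "DELETE FROM users"})
assert k1 == k2  # the second identical call maps to the cached result
```

Because the key is an HMAC rather than a plain hash, the ledger entries don't leak tool arguments to anyone who can read them but not the secret.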

Failure 3 → Primitive 5: Stall detection

After the deletion, the agent looped through generating fake data and producing apologetic, repetitive text. Aura Guard's stall detector uses two independent signals:

  1. Token overlap: if the assistant's output is 92% or more similar to its previous output (measured on HMAC'd token signatures), it's flagged as stalling.
  2. Pattern scoring: regex detectors catch common stall phrases like "I apologize," "let me try again," "I understand your concern," and score them. Above 0.6, it's flagged.

Both signals must fire. After the configured number of stall turns (default: 4), the guard forces a deterministic outcome, either a structured finalization or an escalation to a human. The agent cannot continue looping.
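Both signals are cheap to compute. Here's a minimal sketch using the thresholds from the text (0.92 overlap, 0.6 pattern score); for readability it compares raw token sets rather than HMAC'd token signatures, and the phrase list is abbreviated:

```python
import re

# Abbreviated stall-phrase list; the real detector's patterns are assumed to be richer.
STALL_PATTERNS = [r"\bI apologize\b", r"\blet me try again\b", r"\bI understand your concern\b"]


def token_overlap(a: str, b: str) -> float:
    """Overlap coefficient between the token sets of two outputs."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def pattern_score(text: str) -> float:
    hits = sum(bool(re.search(p, text, re.IGNORECASE)) for p in STALL_PATTERNS)
    return hits / len(STALL_PATTERNS)


def is_stall_turn(prev_output: str, curr_output: str) -> bool:
    # Both signals must fire: near-duplicate output AND stall phrasing.
    return token_overlap(prev_output, curr_output) >= 0.92 and pattern_score(curr_output) > 0.6
```

Requiring both signals keeps the false-positive rate down: a legitimately repetitive status report won't trip the phrase detector, and a one-off apology won't trip the overlap check.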

Failure 4 → Primitive 6: Cost budget

```python
guard = AgentGuard(
    max_cost_per_run=0.50,  # hard stop at 50 cents
    max_calls_per_tool=3,   # no tool can be called more than 3 times
)
```

The budget check runs before every tool call. When the projected cost of the next call would exceed the limit, the guard returns ESCALATE with a cost report. The $47,000, 11-day loop? It would have hit the budget cap within the first minute.
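As a plain-Python sketch, the pre-call check reduces to a pure function. The function and parameter names here are illustrative, not Aura Guard's API:

```python
def check_budget(spent: float, projected_call_cost: float, max_cost_per_run: float,
                 calls_made: int, max_calls_per_tool: int) -> str:
    """Pre-call budget check: runs BEFORE the tool executes, never after."""
    if spent + projected_call_cost > max_cost_per_run:
        return "ESCALATE"  # would exceed the run budget: route to a human
    if calls_made >= max_calls_per_tool:
        return "BLOCK"     # per-tool call quota exhausted
    return "ALLOW"


assert check_budget(0.45, 0.10, 0.50, calls_made=1, max_calls_per_tool=3) == "ESCALATE"
assert check_budget(0.10, 0.10, 0.50, calls_made=3, max_calls_per_tool=3) == "BLOCK"
assert check_budget(0.10, 0.10, 0.50, calls_made=1, max_calls_per_tool=3) == "ALLOW"
```

The important design choice is checking the projected total, not the already-spent total: the call that would cross the line never executes.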

What this looks like in practice

The full integration is three method calls:

```python
from aura_guard import AgentGuard, PolicyAction

guard = AgentGuard(
    side_effect_tools={"execute_sql", "send_email", "process_refund"},
    max_calls_per_tool=3,
    max_cost_per_run=1.00,
)

# In your agent loop:
decision = guard.check_tool("execute_sql", args={"query": "DROP TABLE users"})

if decision.action == PolicyAction.ALLOW:
    result = execute_tool(...)
    guard.record_result(ok=True, payload=result)

elif decision.action == PolicyAction.ESCALATE:
    # Route to human: tool requires approval or budget exceeded
    notify_human(decision.escalation_packet)

elif decision.action == PolicyAction.BLOCK:
    # Tool was blocked: duplicate, quarantined, or over limit
    pass

elif decision.action == PolicyAction.CACHE:
    # Idempotent replay: use the cached result, skip execution
    result = decision.cached_result
```

No LLM calls. No network requests. Sub-millisecond. The guard is pure computation: HMAC signatures, counter checks, set intersections. It sits between your agent and its tools and makes a deterministic decision before every call.

Why max_steps isn't enough

Every agent framework has some version of max_steps or max_iterations. LangGraph has recursion_limit. CrewAI has max_iter. OpenAI's Agents SDK has max_turns.

These are blunt stop buttons. They can't tell the difference between:

  • An agent that made 20 productive tool calls and is almost done
  • An agent that called the same failing API 20 times in a row
  • An agent that executed a refund twice with slightly different wording

Aura Guard detects the specific failure mode. It knows the difference between a repeat, a jitter loop (same tool with slightly rephrased args), a retry storm (tool returning errors), and a stall (model producing no-progress text). Each gets a different response: cache, block, quarantine, rewrite, or escalate.
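To make the distinction concrete, here's a toy classifier over a run's call history. It's a sketch under assumptions: args are plain strings, overlap is a token-set overlap coefficient, and the category names and thresholds (0.60 overlap, 3 retries) are illustrative, not Aura Guard's actual decision logic:

```python
def overlap(a: str, b: str) -> float:
    """Overlap coefficient between token sets of two argument strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / min(len(sa), len(sb)) if sa and sb else 0.0


def classify_call(tool: str, args: str, history: list[dict],
                  overlap_threshold: float = 0.60, retry_limit: int = 3) -> str:
    """Classify a proposed tool call against this run's history (sketch)."""
    errors = 0
    for prev in history:
        if prev["tool"] != tool:
            continue
        if not prev["ok"]:
            errors += 1
        if prev["args"] == args:
            return "repeat"       # exact duplicate -> serve cached result
        if overlap(prev["args"], args) >= overlap_threshold:
            return "jitter"       # same tool, slightly rephrased args
    if errors >= retry_limit:
        return "retry_storm"      # tool keeps erroring -> quarantine it
    return "fresh"


history = [{"tool": "execute_sql", "args": "select name from users", "ok": True}]
assert classify_call("execute_sql", "select name from users", history) == "repeat"
assert classify_call("execute_sql", "select name from users limit 10", history) == "jitter"
assert classify_call("send_email", "to bob", history) == "fresh"
```

A single `max_steps` counter collapses all four categories into one number; classifying them lets the governor pick a proportionate response instead of a blanket stop.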

A step counter would have stopped the Replit agent eventually, but only after the damage was done. A runtime governor would have stopped the first destructive call from executing without approval.

The uncomfortable gap

I looked for existing tools before building this.

  • Guardrails AI is great for catching toxic outputs, but it doesn't operate at the tool-call level. It can't stop a loop.
  • Portkey and LiteLLM handle cost limits and rate limiting at the API gateway, but they can't see inside the agent loop. They don't know whether a tool call is a repeat or a side effect.
  • LangGraph has recursion_limit and CrewAI has max_iter, but those are step counters, not diagnostics. They stop everything after N steps, whether the agent is productive or stuck.
  • Langfuse and LangSmith show you what happened after the fact. They watch; they don't enforce.

I needed something that sits at the tool-call boundary, the moment between "the model wants to call this tool" and "the tool actually executes." That's what Aura Guard does.

Try it

```shell
pip install aura-guard
```

Run the built-in demo to see the primitives in action:

```shell
aura-guard demo
```

The repo includes a benchmark harness with rigged tools designed to trigger each failure mode, plus a live A/B test against Claude Sonnet 4 with full cost accounting.

GitHub | PyPI


If you've dealt with the "agent rephrases the same query forever" problem, the "refund fired twice" problem, or the "retry storm against a failing API" problem, I'd like to hear what heuristics you use. My current jitter detection uses an overlap coefficient of 0.60 with a repeat threshold of 3. I'm sure there are better numbers. Open an issue or PR.
