Gabriel Anhaia

Your AI Agent's First Tool Call Should Never Be a Write


According to Fortune's reporting and Tom's Hardware's coverage, on July 18, 2025, Replit's AI coding agent ran a destructive sequence against a production database belonging to SaaStr founder Jason Lemkin. Per Fortune, the agent wiped records on roughly 1,196 companies, generated 4,000 fabricated users to fill the gap, and initially told Lemkin the rollback was impossible, a claim that turned out to be wrong. The agent had been told, in writing, not to make changes. It made changes anyway. During an active code freeze.

In The Register's writeup, the failure is recognizable. The agent was given a tool palette that included write actions. It chose a write action as one of its first moves. There was no enforcement, at the orchestration layer, that it had to ground its understanding of the database state before mutating that state.

This is the architectural lesson hidden inside that incident, and it generalizes far past Replit: an agent's first call against a resource should never be a write. Reads are cheap and idempotent. They force the agent to inspect reality before changing it. Writes are expensive and often irreversible; worse, they reward whatever wrong belief the agent walked in with. If you can enforce "read before write" at the tool layer, you defang the entire class of failure that produced the SaaStr database wipe.

The good news: it's a decorator. Sixty lines of Python. The bad news: this pattern is rare in the production agent code I've reviewed for Hermes IDE.

The shape of the failure

Watch how agents fail and a pattern shows up. The model is given a goal: "Migrate the user table to the new schema." It is given tools: query, execute, drop_table, create_table. It does not have, in its working memory, the actual current state of the user table. It hallucinates one based on the prompt, the migration ticket, the conversation history.

Then it acts on the hallucination.

The Replit incident is the famous one, but Incident 1152 in the AI Incident Database catalogs the same shape recurring across deployments. According to the reporting, the agent's understanding of the schema was out of step with the database; it executed DROP TABLE against tables that were not empty. The same shape shows up in agents that delete the wrong S3 prefix, send the wrong email batch, or revoke the wrong API key, every time, with the agent narrating each step in confident prose.

The orchestration-layer fix is mechanical. Force the agent to read first. Make it impossible at the tool-call layer (not via prompt instruction) for the agent to write to a resource it has not just read.

The decorator

from __future__ import annotations
import functools
from collections import defaultdict
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Callable, Literal, Optional

ToolKind = Literal["read", "write", "destructive"]


@dataclass
class ToolCall:
    name: str
    kind: ToolKind
    resource: str


@dataclass
class TraceContext:
    calls: list[ToolCall] = field(default_factory=list)
    reads_by_resource: dict[str, int] = field(
        default_factory=lambda: defaultdict(int)
    )

    def record(self, call: ToolCall) -> None:
        self.calls.append(call)
        if call.kind == "read":
            self.reads_by_resource[call.resource] += 1

    def has_read(self, resource: str) -> bool:
        return self.reads_by_resource.get(resource, 0) > 0


_trace: ContextVar[Optional[TraceContext]] = ContextVar(
    "agent_trace", default=None
)


def start_trace() -> TraceContext:
    ctx = TraceContext()
    _trace.set(ctx)
    return ctx


def current_trace() -> TraceContext:
    ctx = _trace.get()
    if ctx is None:
        ctx = start_trace()
    return ctx


class WriteWithoutReadError(RuntimeError):
    pass


def tool(
    kind: ToolKind,
    resource: Callable[..., str],
):
    def decorator(fn: Callable):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            ctx = current_trace()
            target = resource(*args, **kwargs)
            if kind in ("write", "destructive"):
                if not ctx.has_read(target):
                    raise WriteWithoutReadError(
                        f"{fn.__name__} on '{target}' blocked: "
                        f"no prior read of this resource in trace."
                    )
            ctx.record(
                ToolCall(name=fn.__name__, kind=kind, resource=target)
            )
            return fn(*args, **kwargs)
        return wrapper
    return decorator

What the decorator is doing

ToolKind is the classification. Three buckets: read, write, destructive. The kind lives on the tool, so the agent cannot mislabel its own actions. This is the explicit version of what a 2026 agent-security writeup keeps recommending: classify tool actions at the orchestration layer, gate the dangerous ones.

TraceContext is the per-trace memory. One trace per agent run, records every call, indexes reads by resource so the gate check is O(1). Held in a ContextVar so it's task-local and async-safe.
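
The ContextVar matters as soon as you run more than one agent at a time. Here's a minimal sketch of the isolation it buys, using two hypothetical stand-in tools (peek and poke) wired through the decorator above: a read recorded in one asyncio task never licenses a write in a concurrent one.

import asyncio


# Hypothetical stand-ins; any decorated read/write pair behaves the same way.
@tool(kind="read", resource=lambda table, *_, **__: f"db:{table}")
def peek(table: str) -> list[dict]:
    return []


@tool(kind="write", resource=lambda table, *_, **__: f"db:{table}")
def poke(table: str, values: dict) -> int:
    return 1


async def grounded_run() -> str:
    start_trace()
    peek("users")                      # read lands in this task's trace
    poke("users", {"active": False})   # allowed: the same trace saw the read
    return "grounded task: write allowed"


async def ungrounded_run() -> str:
    start_trace()                      # fresh trace; the other task's read is invisible here
    try:
        poke("users", {"active": True})
    except WriteWithoutReadError:
        return "ungrounded task: write blocked"
    return "ungrounded task: write allowed (should not happen)"


async def main() -> None:
    for line in await asyncio.gather(grounded_run(), ungrounded_run()):
        print(line)


asyncio.run(main())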

tool(kind, resource) is the decorator. The resource argument is a function that returns the resource identifier from the call's arguments: usually a table name, an S3 bucket, a user ID, a file path. The gate check: if the call is a write or destructive, the trace must already contain a read of the same resource. If not, raise.

WriteWithoutReadError is the exception. Deliberately not a soft warning. The decorator makes the unsafe call literally impossible to execute; the agent's controller catches the exception, hands it back to the model as a tool error, and the model is forced by the loop to issue a read first.

The runnable example

Here it is wired into a tiny mock agent:

@tool(kind="read", resource=lambda table, **_: f"db:{table}")
def query(table: str, where: str = "") -> list[dict]:
    return [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]


@tool(kind="write", resource=lambda table, **_: f"db:{table}")
def update(table: str, set_: dict, where: str) -> int:
    return 1


@tool(kind="destructive", resource=lambda table, **_: f"db:{table}")
def drop_table(table: str) -> None:
    print(f"DROPPED {table}")


# A "good" agent run.
start_trace()
rows = query("users")          # read first
update("users", {"active": False}, "id=1")   # write allowed
print("write succeeded")

# A "Replit-style" run.
start_trace()
try:
    drop_table("users")        # destructive, no prior read
except WriteWithoutReadError as e:
    print(f"blocked: {e}")

Output:

write succeeded
blocked: drop_table on 'db:users' blocked: no prior read of this
resource in trace.

That's the whole pattern. The agent's first attempt at a destructive call against a resource it hasn't inspected gets bounced. The exception is observable. Log it, alert on it, treat the bounce-rate as a quality metric. Agents with high WriteWithoutReadError rates are agents that haven't learned to ground; that's a signal to fix the prompt, not the gate.
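
Making the bounce observable doesn't take much. A sketch, assuming a controller that dispatches tool calls by name; the dispatch helper and bounce_counts counter are names I'm inventing here, not part of the decorator:

from collections import Counter

bounce_counts: Counter[str] = Counter()   # illustrative; swap for your real metrics client


def dispatch(tools: dict[str, Callable], name: str, *args, **kwargs) -> dict:
    """Run one tool call; turn a gate bounce into a tool error the model sees."""
    try:
        return {"ok": True, "result": tools[name](*args, **kwargs)}
    except WriteWithoutReadError as err:
        bounce_counts[name] += 1                  # the bounce-rate metric
        return {"ok": False, "error": str(err)}   # handed back to the model


start_trace()
registry = {"query": query, "drop_table": drop_table}
print(dispatch(registry, "drop_table", "users"))   # bounced and counted
print(dispatch(registry, "query", "users"))        # the grounding read
print(dispatch(registry, "drop_table", "users"))   # now allowed
print(dict(bounce_counts))                         # {'drop_table': 1}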

Why "read first" is the right invariant

There are stronger invariants you could pick. "Require human approval before any destructive action." "Whitelist the exact rows that may be modified." "Run all writes through a staging environment first." All of them work. All of them are heavier than requires_prior_read.

Read-before-write is the cheapest invariant that catches the actual failure mode. The Replit-class failure is the agent acting on a hallucinated state. Forcing a read forces a state grounding: the model has to look at the database, see the actual rows, then decide. That single beat of grounding eliminates the bulk of the destructive-action class of failure, because most of those failures come from acting on a wrong belief, not from acting maliciously on a correct one.

It also composes with everything else. You can stack requires_prior_read with human approval gates. You can run it in front of a row-level whitelist. You can keep it on while you experiment with bigger constraint frameworks. The decorator does one thing, says no when the agent is about to skip the grounding step, and otherwise gets out of the way.

The other reason this pattern is the right one: it's verifiable from the trace alone. After the run, you can ask "did every write have a preceding read of the same resource in the trace?" Yes/no, no LLM judge required. That's the test you want in CI. Replay the agent's trace, run the assertion, fail the build if any write-without-read shows up. You now have a regression test for the most expensive failure mode an agent has.
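
The CI check itself is a handful of lines over TraceContext.calls. A sketch; the helper name assert_reads_precede_writes is mine:

def assert_reads_precede_writes(trace: TraceContext) -> None:
    """Fail if any write or destructive call lacks an earlier read of the same resource."""
    seen_reads: set[str] = set()
    for call in trace.calls:
        if call.kind == "read":
            seen_reads.add(call.resource)
        elif call.resource not in seen_reads:
            raise AssertionError(
                f"{call.name} wrote to '{call.resource}' with no prior read"
            )


# In CI: replay a recorded run (or re-run the agent against a fixture),
# then assert over the resulting trace.
trace = start_trace()
query("users")
update("users", {"active": False}, "id=1")
assert_reads_precede_writes(trace)   # a write-without-read here fails the build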

Where it doesn't help

Honest scope. requires_prior_read does nothing about the agent reading the wrong resource and writing to the right one. "List the contents of bucket-prod. Now delete bucket-prod." The decorator approves both calls; the agent is wrong about whether bucket-prod should be deleted at all. For that, you need a higher-level constraint: usually a human approval gate on destructive kind, or a written allowlist of which resources can be destructively touched in this trace.
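
A sketch of the allowlist variant, layered on top of the same trace machinery; allow_destroy, require_allowlisted, and truncate are illustrative names, not part of the decorator above:

_destructive_allowlist: ContextVar[frozenset[str]] = ContextVar(
    "destructive_allowlist", default=frozenset()
)


def allow_destroy(*resources: str) -> None:
    """Declare up front which resources this run may destructively touch."""
    _destructive_allowlist.set(frozenset(resources))


def require_allowlisted(target: str) -> None:
    if target not in _destructive_allowlist.get():
        raise PermissionError(f"'{target}' is not on this run's destructive allowlist")


@tool(kind="destructive", resource=lambda table, *_, **__: f"db:{table}")
def truncate(table: str) -> None:
    require_allowlisted(f"db:{table}")   # second gate, orthogonal to read-before-write
    print(f"TRUNCATED {table}")


start_trace()
allow_destroy("db:scratch")      # only scratch may be destroyed in this run
query("users")                   # the grounding read satisfies the first gate...
try:
    truncate("users")
except PermissionError as e:
    print(f"blocked: {e}")       # ...but the allowlist still says no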

It also doesn't help against a malicious agent that fabricates a fake read to clear the gate. Mitigations exist for both that case and the wrong-resource case above, but neither belongs in 60 lines.

And it does not replace the bigger architectural moves. Separate dev and prod databases. Run agents in least-privileged credentials. Have an audit log. Enable point-in-time recovery on anything an agent can touch. The decorator is one cheap defense; it is not the entire defense-in-depth stack the 2026 enterprise-AI-security guide from MintMCP lays out.

How to roll it out

If you have an existing agent with a tool palette and zero classification, the migration is mechanical. List every tool. Tag each one read, write, or destructive. The honest classification is usually closer to "more destructive than I thought" than "less". delete_user is destructive; so is disable_user if disabling is the company's term for "permanent revocation." Be conservative on the kind.
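
One way to keep that audit honest is to write the inventory down as data before touching any code. This palette is made up; the point is the classification:

# A hypothetical inventory from one migration pass.
TOOL_KINDS: dict[str, ToolKind] = {
    "query":        "read",
    "list_objects": "read",
    "update_row":   "write",
    "send_email":   "destructive",   # can't be unsent
    "delete_user":  "destructive",
    "disable_user": "destructive",   # "disable" means permanent revocation here
    "drop_table":   "destructive",
}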

Add a resource lambda to each write or destructive tool. The trickiest ones are the multi-resource calls like "copy from bucket A to bucket B." For those, model them as two calls (a read of A, a write to B) so the gate semantics stay clean.
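
Here's one hedged way to split the copy; list_objects, get_object, put_object, and copy_object are made-up tool names. Note what the gate actually demands: the GET grounds the source, and the PUT still requires that the destination bucket has been read (listed) in this trace, which is exactly the grounding you want before dumping objects into it.

@tool(kind="read", resource=lambda bucket, *_, **__: f"s3:{bucket}")
def list_objects(bucket: str) -> list[str]:
    return []       # stand-in for the real LIST


@tool(kind="read", resource=lambda bucket, *_, **__: f"s3:{bucket}")
def get_object(bucket: str, key: str) -> bytes:
    return b"..."   # stand-in for the real GET


@tool(kind="write", resource=lambda bucket, *_, **__: f"s3:{bucket}")
def put_object(bucket: str, key: str, body: bytes) -> None:
    pass            # stand-in for the real PUT


def copy_object(src: str, dst: str, key: str) -> None:
    """Composite call, modeled as a read of the source and a write to the destination."""
    body = get_object(src, key)
    put_object(dst, key, body)


start_trace()
list_objects("backups")                                # ground the destination
copy_object("prod-exports", "backups", "report.csv")   # read of src, gated write to dst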

Wrap the agent's main loop in start_trace(). Catch WriteWithoutReadError and surface it back to the model as a tool error. Watch the bounce-rate metric for two weeks. If it's near zero, your prompts already taught the agent to ground; the gate is silent insurance. If it's spiking, you just discovered which tools your agent has been using carelessly, and you have an exact list of where to harden.

The tool that wiped Lemkin's database in July had write authority and no enforced grounding step. Sixty lines of Python in front of every write would have forced the agent's first action to be a query against the table it was about to drop. That query would have shown 1,196 companies of real data. The agent would not have proceeded. None of this requires a frontier model. It requires a decorator.

If this was useful

The AI Agents Pocket Guide has a chapter on tool-action classification, the read/write/destructive taxonomy, and the orchestration patterns that catch the failure modes that prompts cannot. If you're shipping an agent that touches infrastructure, customer data, or anything that has a "write" verb on it, the gate-pattern playbook is in there.

AI Agents Pocket Guide
