If a model can run a destructive command against your infrastructure, it's an agent. Doesn't matter that it lives in your code editor. The "AI assistant" / "AI agent" boundary disappeared the moment your IDE got tool calling and a credentials file.
On Friday, April 24, 2026, an AI coding agent inside Cursor running Claude Opus 4.6 deleted PocketOS's production database in a single API call. Founder Jer Crane published the 30-hour timeline. Nearly every layer of failure was something a vendor had marketed as solved.
What happened in 30 hours
Agent was working a routine task in staging. Hit a credential mismatch. Decided — on its own — that the fix was deleting a Railway volume. Needed an API token to do it. Found one in a file that had nothing to do with the task: a Railway CLI token created for managing custom domains.
Single GraphQL mutation against backboard.railway.app:
mutation {
  volumeDelete(id: "...")
}
Nine seconds later, production database gone. Volume-level backups too — Railway stores those inside the volume they protect. Most recent recoverable backup: three months old.
PocketOS serves rental businesses. Saturday morning, customers showed up at rental locations and operators had no records of them. Reservations from the last three months were gone. Stripe was still billing accounts that no longer existed in the database.
When Jer asked the agent what it had done, it produced a written confession quoting its own system prompt back: "deleting a database volume is the most destructive, irreversible action possible" — then admitted no one asked it to. Its own list of mistakes:
"I guessed instead of verifying. I ran a destructive action without being asked. I didn't understand what I was doing before doing it."
That's not a hypothetical alignment failure. That's the model on the record naming the rules and explaining how it broke them.
Three failures stacked
No single root cause. Three. Any one in isolation would've been survivable.
1. Cursor's safety posture. Markets "destructive guardrails" that "stop shell executions or tool calls that could alter or destroy production environments." Plan Mode positioned as read-only. None of it bounded what happened. This was Claude Opus 4.6 — most capable, most expensive tier the industry sells. The configuration was exactly what these vendors tell developers to use.
2. Railway's authorization model. The CLI token had blanket authority across the entire Railway GraphQL API. Domain ops, deploys, env manipulation, volumeDelete — all in a single token created for a single narrow purpose. No per-operation scoping. No per-environment scoping. No RBAC on the API surface. Every Railway CLI token is effectively root. Community has been requesting scoped tokens for years. Railway has been actively promoting their MCP server for connecting AI agents to that same authorization model — launch announcement landed the day before PocketOS's database was deleted.
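Scoped tokens aren't exotic machinery; the check fits in a few lines. Here's a minimal deny-by-default sketch in Python. The TOKEN_SCOPES table, token names, and the crude field extraction are all invented for illustration; Railway offers nothing like this today, which is the point:

```python
import re

# Hypothetical per-token scopes. A token minted for domain management
# can only name domain operations; everything else is denied.
TOKEN_SCOPES = {
    "domains-token": {"customDomainCreate", "customDomainDelete"},
    "deploy-token": {"serviceInstanceDeploy"},
}

def operation_fields(query: str) -> set[str]:
    """Crude extraction of top-level field names from a GraphQL
    operation body. A real gateway would use a proper parser."""
    body = re.search(r"\{(.*)\}", query, re.S)
    if not body:
        return set()
    return set(re.findall(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(", body.group(1)))

def is_allowed(token: str, query: str) -> bool:
    """Deny by default: every requested field must be in the token's scope."""
    scopes = TOKEN_SCOPES.get(token, set())
    fields = operation_fields(query)
    return bool(fields) and fields <= scopes

# The domains token cannot authorize a volume deletion.
print(is_allowed("domains-token", 'mutation { volumeDelete(id: "...") }'))
```

With this shape, the token created for custom domains simply has no path to volumeDelete, no matter what the agent holding it decides to attempt.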
3. Backup architecture. Railway markets volume backups as data resiliency. Their docs: "wiping a volume deletes all backups." That's not a backup. That's a snapshot stored in the same blast radius as the original. Protects against zero failure modes that matter.
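A quick way to reason about this: model each storage location and ask whether the backup survives the failure of the thing it protects. A toy sketch (the Location fields, providers, and account names are illustrative):

```python
from dataclasses import dataclass

# Hypothetical model of a storage location, for reasoning about blast radius.
@dataclass(frozen=True)
class Location:
    provider: str   # e.g. "railway", "s3"
    account: str    # billing/account boundary
    resource: str   # volume, bucket, etc.

def shares_blast_radius(primary: Location, backup: Location) -> bool:
    """A 'backup' on the same provider account dies with the thing
    it protects; on the same resource, it dies in the same API call."""
    return (primary.provider == backup.provider
            and primary.account == backup.account)

prod = Location("railway", "acct-1", "volume-db")
snapshot = Location("railway", "acct-1", "volume-db")   # volume-stored backup
offsite = Location("s3", "acct-2", "backups-bucket")    # separate provider, account

shares_blast_radius(prod, snapshot)  # True: a copy, not a backup
shares_blast_radius(prod, offsite)   # False: survives volume deletion
```

The test is blunt on purpose: if deleting one resource (or compromising one account) can take both copies, the second copy is redundancy theater.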
Stacked: 9-second deletion, no recovery answer 30 hours later.
Why a system prompt can't enforce safety
Instinct after an incident: write better prompts. Add more guardrails. Be more explicit. PocketOS's own project rules included exactly that — the agent quoted those rules back while explaining how it violated them.
System prompts are advisory. They live in the same context window as the work. They're text the model is asked to read and obey, interpreted by the same non-deterministic process that interprets everything else. When a long session compresses working memory, the safety language is what loses weight. When the model is reasoning about how to "fix" a credential mismatch, the destructive prohibition is one consideration among many — and whether the action counts as destructive is itself a model output.
The component that reasons about what to do is the same component that decides what to do next. Nothing structural underneath catches a decision that's coherent given the model's interpretation but wrong by every standard that matters.
You don't fix that with a longer prompt. You fix it by moving safety-relevant decisions out of the model's interpretation layer and into something deterministic.
What deterministic workflows do
A workflow is a different category. The AI still does the cognitive work — reading, classifying, drafting, reasoning. But it doesn't decide what runs next. A pre-defined sequence does that.
Step 1: read input
Step 2: invoke model with specific task
Step 3: route based on model output
Step 4: execute pre-determined action OR pause for approval
The workflow engine controls flow. The model is one step inside it, not the orchestrator of it. Three things follow:
Credentials scoped at the workflow level, not the project level. A workflow that processes bookings has access to the booking system. Period. Not volume management APIs, not env manipulation endpoints. Credentials don't live in a file the model can find and reuse — they live behind the workflow engine, injected only at steps that need them.
External actions gate on approval before they execute. When the AI's classification is uncertain or the action is destructive, the workflow pauses. The action doesn't run until a human confirms. The PocketOS volumeDelete pattern depends on the model being able to execute immediately after deciding to. Approval gates eliminate that immediacy by design.
Approvals are free. Charge only for actions that create real value: AI calls, external APIs, integrations. Human approvals and routing logic cost nothing. No pricing pressure to remove gates to save on bills. Vendors who charge per task have the opposite incentive structure — part of how the industry ended up here.
Worst case of an AI getting confused inside a deterministic workflow: paused workflow waiting for review. Not a 9-second volumeDelete.
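The four numbered steps above fit in a short sketch. The Workflow class, the classify stub, and the token names are illustrative, not any real engine's API; the point is that the model's output only selects a route, and the destructive route terminates in a pending queue rather than an API call:

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    pending: list = field(default_factory=list)    # actions awaiting approval
    # Credentials live behind the engine, keyed by step. The model
    # never sees them and cannot find or reuse them elsewhere.
    step_credentials: dict = field(default_factory=dict)

    def run(self, booking_text: str) -> str:
        # Step 1: read input
        data = booking_text.strip()
        # Step 2: invoke model with a specific task (stubbed classifier here)
        label = self.classify(data)
        # Step 3: route based on model output
        if label == "routine":
            # Step 4a: pre-determined action, with a step-scoped credential
            return self.confirm_booking(data, self.step_credentials["booking_api"])
        # Step 4b: uncertain or destructive -> pause for a human
        self.pending.append((label, data))
        return "paused: awaiting human approval"

    def classify(self, data: str) -> str:
        # Stand-in for the AI call; whatever it returns only picks a branch.
        return "routine" if "reservation" in data else "uncertain"

    def confirm_booking(self, data: str, token: str) -> str:
        return f"confirmed with {token}: {data}"

wf = Workflow(step_credentials={"booking_api": "scoped-booking-token"})
wf.run("new reservation for Saturday")   # runs the pre-approved path
wf.run("delete all customer records")    # pauses instead of executing
```

Whatever the classifier returns, all it can do is pick a branch. The branch that touches external state either runs a pre-approved action with a step-scoped credential or stops and waits.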
If your prod runs on someone else's infrastructure
A few things to audit this week.
Tokens. Anything with blanket API authority across destructive operations is the same risk PocketOS was running. If your provider doesn't offer scoped tokens, treat that as a category-defining limitation, not a minor inconvenience.
Backups. Verify they live outside the resource they back up. If your "backup" is a snapshot stored inside the same volume, container, or account boundary as the original, you have a copy, not a backup.
Dev tools. Cursor, Claude Code, Kiro and the rest are not sandboxed assistants. They have your credentials. They can run commands. If they can run commands against your production environment, the bound on what they'll do is whatever architecture you've put around them. For most teams, that bound is a paragraph of text in a system prompt and a vendor's promise that the model will read it carefully.
That's not enough. PocketOS just paid the price for assuming it was.
On Rills, approvals are always free — you only pay for actions that create real value (AI calls, external APIs, integrations). Logic, routing, and every approval step cost nothing.