Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

An AI Deleted His Database in 9 Seconds. He Blames the Vendors. He Skipped 30 Years of Practices.

Stunned, a SaaS founder watched an AI agent wipe his production database in 9 seconds. Backups included. He posted it on X, 6.5 million views, and every tech outlet picked it up within 24 hours. The defendants named: Cursor, Railway, Anthropic. His vendors. Their marketing. The "systemic failures" of the industry.

Except the root cause has nothing to do with Cursor or Railway. He handed his prod to the equivalent of a senior dev he'd just hired, and gave him full power. No serious team would do that with a human, even a brilliant one. He did it with his AI.

Everything else follows from that one decision.

TL;DR: the 9 seconds were the bill. The order sat upstream for six months, in plain sight, written in code reviewable by anyone who bothered. The press is fighting over who handed over the bill. We're going to look at who placed the order.

The Incident in 100 Words

Friday, April 25, 2026. Cursor running Claude Opus 4.6 on a PocketOS staging environment. Credential mismatch detected. The agent decided to "fix" it by deleting the Railway volume. It found an API token sitting in a file unrelated to the task, with blanket scope. One curl call to the volumeDelete mutation. 9 seconds. Railway backups stored on the same volume? Wiped too. Most recent usable backup: 3 months old.

Jer Crane's X post hit 6.5 million views. Massive coverage. Railway's CEO restored the data 48 hours later from internal disaster backups. No moral here, just facts.

Crane blamed Cursor and Railway. Let's look at what he did, upstream.

An AI Agent Is a Senior Dev. We Don't Give Senior Devs Full Power Either.

Confession first, before I get on my high horse.

I have my own infra dashboard. A daily cron pulls a report on every server I run. Disk space, memory, saturation, weird processes. The usual. A few weeks ago I added an LLM in the loop to "make it smarter". You know, summarize the report, flag anomalies, propose fixes. The future.

Last week I opened the cron script for an unrelated reason and saw something funny. Hardcoded values. Several of them. The LLM had, at some point, "improved" the script by replacing dynamic checks with literal numbers. Free disk threshold? Hardcoded. Memory ceiling? Hardcoded. The "smart" cron was running on baked-in assumptions from the day the agent touched it.

I could blame the model. Easy enough. The only person at fault, though, is me, the one who didn't review the diff. I had every excuse to skip the review (lazy Friday, busy week, the cron was small). None of them holds up.

Now the actual point.

No serious SaaS team gives full prod power to a freshly hired senior dev. Not out of distrust, just experience. Seniors make mistakes like everyone else, except theirs have a bigger blast radius. That's exactly why we've spent the last 30 years building limiting practices: scoped tokens, MFA, code review, env separation, restore drills. The practices are old. The threat model is old. What's new is that we've forgotten to apply them, because we confused "capable model" with "trusted human with full power".

A capable AI agent is the equivalent of a senior. Capability doesn't change the rule, it reinforces it. The bigger the blast radius, the more the standard guardrails matter. Coverage that says "these precautions are new because of AI" is wrong. They're old. We just forgot why we built them.

Caveat: I'm not saying the AI agent is identical to a human (it lacks the business context, the personal account on the line, the fear of getting fired). The prod-grade rule holds for both anyway: no full power, solo. The pillars below are basically a working contract between the developer and the agent at the infra level, the same way prompt contracts formalize it at the prompt level.

Your AI agent is a senior. Same rules apply. From here on, that part is settled.

[INFOGRAPHIC: TITLE "The 5+2 Pillar Defense" + subtitle "30 years of practice, in seven layers". Metaphor: a cartoon AI agent character on the left (cute/determined robot, antennas, round eyes) trying to reach a big "PROD" safe on the right, with seven numbered doors/barriers between the two like a horizontal obstacle course. Style: 90's Hanna-Barbera/Nickelodeon cartoon, thick black outlines, halftone dots, bouncy shapes. Palette: blueprint blue #1B4D8B, cream #F5E6C8, alarm red #D8504D, deep navy #0E2A47, gold lock #E5B83C. Content: 7 doors labeled left to right "1. SCOPED TOKENS" / "2. OUT-OF-BAND CHECK" / "3. VAULT & ENV SPLIT" / "4. OFF-SITE BACKUPS" / "5. RESTORE DRILLS" / "+A. AUDIT & ALERT" / "+B. NETWORK FENCE". The "PROD" safe on the right with a big gold padlock. Highlight: doors 1 and 2 surrounded by a gold glow and sparkle stars (that's where most incidents stop). Dotted "agent path" arrow starting from the robot, bumping into door 1, going around, bumping into door 2, etc. Caption: sticky note bottom-left, "any layer alone can fail / all of them together = your only insurance". Footer: © rentierdigital.xyz. NOT flat corporate vector, NOT minimalist tech startup aesthetic.]

Pillar 1: Scoped Tokens, Not Master Keys

No senior dev on a normal team holds an API token that can volumeDelete prod and that anyone can pick up by reading a random file in the repo. He has a token scoped to his task, or he files a PR another human approves.

The PocketOS token that could manage domains and delete the prod volume should not have existed, regardless of who used it. Most modern providers (Vercel, Cloudflare, GitHub fine-grained PATs, AWS IAM scoped roles, Stripe restricted keys) let you scope finely, for free. Stripe restricted keys have been a de-facto standard since 2018. Not new.

Railway didn't allow that level of scoping at the time of the incident. Crane has a legitimate complaint there. The general rule still holds: if your provider doesn't let you scope, you change provider, or you wrap (credentials proxy, aggressive token rotation, ephemeral tokens via short-lived sigs). The rule is "no token in your environment should be able to do more than the current task". The fix isn't always elegant. It's always cheap compared to the alternative.
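
A sketch of the "wrap" option, assuming an AWS-style setup where a scoped role already exists; the role ARN, session name, and the final agent command are placeholders, but the STS call itself is standard. The long-lived admin key stays in the vault, and the agent's session only ever sees a 15-minute credential limited to the task.

# Mint a short-lived, task-scoped credential; the admin key never enters the agent's environment.
CREDS=$(aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/dns-update-only \
  --role-session-name "agent-task-$(date +%s)" \
  --duration-seconds 900 \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text)

read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< "$CREDS"
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN

# Launch the agent inside this environment only; whatever it finds expires in 15 minutes.
exec your-agent-command   # placeholder for however you start your agent session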

This is the same principle as why I argue CLIs beat MCP servers for AI agents: the smaller the surface area you expose to the agent, the smaller the blast radius when something goes sideways. Token scoping is the same idea, applied to credentials instead of API surface.

Caveat: yes it takes 10 extra minutes of scoping. Yes some provider APIs are badly designed. Not an excuse for storing a blanket token in the repo.

The token doesn't ask permission. You give it none.

Pillar 2: Destructive Operations Need Out-of-Band Confirmation

No senior types DROP DATABASE production without confirmation. Either it's a command that asks you to retype the name, or it's a button with MFA, or it's an approval from another human. GitHub asks you to retype the repo name to delete it. Stripe asks for the email to close an account. AWS demands "permanently delete" plus the exact text for an S3 bucket. This has been table stakes for 15+ years.

The key word in "out-of-band confirmation" is out-of-band. The confirmation has to come from OUTSIDE the agent's context. If the agent can self-approve (because the button is in the same session, the same prompt, the same tool), it's not a confirmation, it's autosuggestion. Human equivalent: you don't confirm a DROP DATABASE to yourself; your teammate or your MFA does.
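
One way to make that mechanical, sketched for a Linux host under the assumption that the approval file lives on a path the agent has no write access to (the filenames and the 5-minute window are illustrative): a wrapper that refuses any destructive call unless a human recently approved that exact resource through a separate, MFA-gated channel.

#!/usr/bin/env bash
# destructive-op.sh <resource> -- runs nothing unless a fresh out-of-band approval names the resource.
set -euo pipefail

RESOURCE="$1"
APPROVAL_FILE="/run/approvals/destructive.token"   # written by a human via an MFA-gated channel, never by the agent
MAX_AGE=300                                        # approval is only valid for 5 minutes

[[ -f "$APPROVAL_FILE" ]] || { echo "No out-of-band approval found. Refusing." >&2; exit 1; }

age=$(( $(date +%s) - $(stat -c %Y "$APPROVAL_FILE") ))
(( age <= MAX_AGE )) || { echo "Approval expired (${age}s old). Refusing." >&2; exit 1; }

# GitHub-style check: the approval must name the exact resource being destroyed.
grep -qx "$RESOURCE" "$APPROVAL_FILE" || { echo "Approval does not match '$RESOURCE'. Refusing." >&2; exit 1; }

echo "Approved out-of-band. Proceeding with destructive operation on $RESOURCE."
# ...the actual API call goes here, behind the gate

The agent can invoke the wrapper all day; it cannot manufacture the approval, because the only path to it runs through a human outside its context.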

After the incident, the PocketOS agent confessed in textbook fashion. It had violated every principle it was given, guessed instead of verifying, run a destructive action without being asked. Touching, but useless. The system prompt told it not to do destructive things. The agent did them anyway, then apologized eloquently. That's the whole point: prompt-level rules are a polite request, not a guardrail. The only thing that stops a destructive op is a mechanical check the agent cannot bypass by being convinced of its own correctness.

Caveat: out-of-band creates friction. That's the goal. Friction on destructive ops is a feature, not a bug. Anyone who tells you otherwise has not yet had the bad day.

Eloquent apologies don't roll back transactions.

Pillar 3: Production Credentials Don't Live on the Dev Machine

No senior in a serious team has prod creds floating on their dev laptop in clear text. They get injected at runtime from a vault (Doppler, Infisical, native Vercel/Railway secrets), staging and prod have different credentials by design, and the repo has pre-commit hooks that scan for leaked secrets before a .env can ever land in a commit. Bare minimum.
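
A sketch of that bare minimum with off-the-shelf tools, assuming the Doppler (or Infisical) and gitleaks CLIs are installed; any equivalent works, and the exact gitleaks command name varies by version.

# Runtime: the vault CLI injects secrets into the process environment; nothing lands on disk.
doppler run -- node server.js          # or: infisical run -- node server.js

# .git/hooks/pre-commit: block any staged change that looks like it contains a secret.
gitleaks protect --staged || {
  echo "Potential secret in staged changes. Commit blocked." >&2
  exit 1
}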

If Crane had had strict credential separation between staging and prod, the "manage domains" token would NEVER have been able to authenticate a call against the production volume. The architecture bug that allowed the incident is older than the agent: a single token had access to both environments. The agent was just the heat-seeker that found it.

It's the same reason you don't reuse your homelab SSH key on prod, or stash a long-lived GitHub PAT in your CI when a fine-grained one exists. Trivial when said out loud. Yet every week a SaaS ships with staging and prod sharing a DATABASE_URL because "it was simpler at the start".

Your AI agent scans your files, finds what's there, uses it. So you don't leave around what can break everything. The vault is not a magic shield (an agent that can read from the vault can be misled into reading the wrong thing), but it forces explicit consent every time a secret leaves storage. Wrap your vault with scoping too: the current task only reads the secrets it actually needs, not the whole drawer.

Caveat: a vault adds 30 minutes of setup the first time. Then it works. Forever.

Pillar 4: Backups Live Somewhere Else

The modern rule: 3 copies, stored at 2 different providers minimum, with at least 1 immutable and off-site. A "snapshot" stored in the same volume as the source data is not a backup, it's technical wishful thinking with a fancier name.

A whole generation of PaaS uses the word "backup" loosely. Railway documents in plain English that wiping a volume deletes all its backups. Founders signing up in 2 minutes for their MVP don't read the infra doc. They check the "enable backups" box in the dashboard and assume the cavalry is on standby.

Concrete cheap recipe for a solo SaaS:

#!/usr/bin/env bash
set -euo pipefail
# Nightly dump, compressed, shipped off-site to a bucket on a different provider.
TS=$(date +%Y%m%d-%H%M%S)
pg_dump "$DATABASE_URL" | gzip > "/tmp/db-$TS.sql.gz"
aws s3 cp "/tmp/db-$TS.sql.gz" s3://my-offsite-bucket/daily/ \
  --endpoint-url="$BACKBLAZE_B2_ENDPOINT"
rm "/tmp/db-$TS.sql.gz"

50 lines of bash plus a cron, an immutable bucket on a different provider (B2, R2, or S3 with object lock), and a rolling retention of 7 daily / 4 weekly / 12 monthly. A Saturday afternoon of work, then nothing. No serious team would accept that all production backups sit on the same provider as production, let alone in the same volume.
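
The automation behind "then nothing" is one cron entry, and on an S3-style bucket the immutability is a one-time setting. The bucket name and 30-day retention below are illustrative; object lock has to be enabled when the bucket is created, and support varies by provider.

# One-time: default retention so that even a leaked key can't purge history early.
aws s3api put-object-lock-configuration \
  --bucket my-offsite-bucket \
  --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":30}}}'

# crontab entry: run the dump script every night at 03:00, keep the log.
0 3 * * * /usr/local/bin/backup-db.sh >> /var/log/backup-db.log 2>&1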

Caveat: making your own backups takes 2 hours of setup and 0 hours of monthly maintenance. Truly. The number of founders who tell themselves "I'll set this up next sprint" and then take 18 months to do it is, statistically, all of them.

A backup on the same provider as production is a screenshot. Live with it, or move it.

Pillar 5: An Untested Backup Is Not a Backup

All the backups in the world are worth nothing if you've never tested the restore. Quarterly drill: spin up an empty environment, run the restore script against it, verify the data comes back, measure how long it takes (RTO) and how much you'd lose in the worst case (RPO).

If it doesn't work, you want to know NOW, not the day you actually need it.
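
A minimal version of the drill, assuming the off-site dumps from Pillar 4, Docker on the machine running it, and an illustrative bucket path and table name.

#!/usr/bin/env bash
set -euo pipefail
START=$(date +%s)

# Grab the most recent off-site dump.
LATEST=$(aws s3 ls s3://my-offsite-bucket/daily/ --endpoint-url "$BACKBLAZE_B2_ENDPOINT" | sort | tail -n 1 | awk '{print $4}')
aws s3 cp "s3://my-offsite-bucket/daily/$LATEST" /tmp/restore-drill.sql.gz --endpoint-url "$BACKBLAZE_B2_ENDPOINT"

# Restore into a throwaway Postgres, not into anything real.
docker run -d --rm --name restore-drill -e POSTGRES_PASSWORD=drill -p 55432:5432 postgres:16
sleep 10   # crude wait; a pg_isready loop is nicer
gunzip -c /tmp/restore-drill.sql.gz | psql "postgresql://postgres:drill@localhost:55432/postgres"

# Sanity check: the data actually came back (table name illustrative).
psql "postgresql://postgres:drill@localhost:55432/postgres" -c "SELECT count(*) FROM users;"

echo "Restore drill done in $(( $(date +%s) - START ))s. That number is your real RTO."
docker stop restore-drill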

PocketOS discovered at the worst possible moment that its real restore window was 3 months. Not a Railway flaw. A drill that was never performed. No senior in a serious team would settle for "I clicked enable backups in the dashboard". They'd restore at least once just to time it.

Caveat: yes a complete drill once per quarter is a day of work. It's also your insurance you still exist next Monday. Pick one.

Two Bonus Pillars If You're Serious

Bonus 1: Audit log and alerting on destructive ops

Every DELETE / DROP / rm -rf in prod fires an immutable log and a Slack/email/SMS notification. PocketOS lost 30 hours before they understood the scope, because nobody got paged at the moment of the destructive call. 9 seconds with no alert is an observability gap, not agent malice.

Most PaaS provide this natively (CloudTrail on AWS, audit log on Vercel, logs on Railway). All you have to do is wire the webhook. Sub-30 lines of YAML, a free PagerDuty seat, done.
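
The webhook wiring really is that small. A sketch assuming a Slack incoming webhook URL in the environment and a hypothetical helper you call right before any destructive operation:

# notify-destructive.sh -- page a human the moment a destructive op is requested, not 30 hours later.
notify() {
  curl -s -X POST "$SLACK_WEBHOOK_URL" \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \":rotating_light: Destructive op requested: $1 by $(whoami) on $(hostname)\"}"
}

notify "volumeDelete staging-db"   # call this (or its PagerDuty equivalent) before the op runs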

Bonus 2: Blast radius limit by network design

The dev machine (and the agent running on it) cannot reach prod directly. Bastion, VPN with scope, or nothing. The network is the last line of defense.

If your agent can reach prod from your laptop, the scoping done by Pillars 1-3 is your ONLY protection. Defense in depth means adding a network layer too. This is the meta pillar, the one that backstops the other 5 if done well. Belt, suspenders, and a static rope.
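
One concrete shape for that fence, sketched with illustrative host names, addresses, and a ufw firewall: prod accepts SSH only from the bastion, and the dev laptop (and any agent on it) can only get there via ProxyJump, never directly.

# ~/.ssh/config on the dev machine: the only route to prod goes through the bastion.
#   Host prod-db
#     HostName 10.0.2.15
#     User deploy
#     ProxyJump bastion.example.com

# On the prod host: deny everything inbound except SSH from the bastion's address.
ufw default deny incoming
ufw allow from 203.0.113.10 to any port 22 proto tcp
ufw enable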

PocketOS Won't Be the Last

Just the public incidents from the last 12 months. PocketOS this week. Replit's AI agent deleted a production database in July 2025, backups included for good measure. An OpenClaw agent "speedran" deleting the inbox of Meta's AI safety director (yes, that sentence is real and yes, it was a rookie config error). Add AWS Kiro, ChatGPT 5.3 Codex erasing a hard drive after a typo, Cursor ignoring an explicit "do not run anything" in December 2025. Twelve months. A pattern.

You can count on 5 more in the next 6 months. Whoever you are reading this, one of them is statistically you.

If you apply the 5+2 pillars, the PocketOS scenario becomes structurally impossible. The agent doesn't find a blanket token because there isn't one. If by miracle it finds one, it can't use it on prod because the env is isolated. If by double miracle it gets there, the destructive op asks for an out-of-band confirmation it cannot self-approve. If by triple miracle it bypasses that, your immutable off-site backup is untouched, and your last quarterly drill tells you you're back up in 4 hours, not 3 months.

The question is no longer "is AI ready for production". It's "is your production ready for anything that isn't you alone". If the answer is no today, it was already no before Cursor existed. You just found out faster.

Blaming Cursor, Railway, Anthropic, or the Pope gets you nowhere. Crane forgot to blame the guy who stored a blanket token in the repo, ran staging and prod on the same credentials, and turned on backups by clicking a checkbox without ever testing a restore. That guy is him.

The 5 pillars in this article aren't an answer to AI. They're an answer to an older question: what happens when one operator has full power on prod. We've known the answer for 30 years. We just forgot, because the new operator types fast and speaks English.

The real question isn't whether AI is ready for your production. It's whether your production is ready for anything that isn't you, alone.

Audit your resilience this weekend. Before an AI makes the bad decision for you.

You ship it, you own it.

Sources

  • Jer Crane's original X post on the PocketOS incident: https://x.com/lifeof_jer/status/1915720800000000000
  • The Register, Cursor-Opus agent snuffs out startup's production database (April 27, 2026)
  • Tom's Hardware, Claude-powered AI coding agent deletes entire company database in 9 seconds (April 28, 2026)
  • Fast Company, An AI agent deleted a software company's entire database (April 28, 2026)
  • NeuralTrust, A Security Post-Mortem of the 9-Second AI Database Deletion
  • PC Gamer, Here we go again: AI deletes entire company database (April 28, 2026)
