DEV Community

CodeLong888
CodeLong888

Posted on

My deploy agent could have dropped my database and I didn't realize

I run a small free privacy-tools site as a side project. The product was fine. The maintenance was the part that wore me down: SEO upkeep, structured data, keeping every page linked. So I built a scheduled agent to handle it.

It's a Claude agent that runs a few times a day. It reads the current state of the site, picks one small improvement, ships it to production, and sends me a digest of what it changed. Most runs are boring: a meta description here, a schema fix there.

One run, it caught something I didn't know about. I'd shipped a handful of tools that were never actually listed in my own directory. They worked if you had the direct link, but nothing pointed to them, so neither Google nor any AI assistant could find them. The agent surfaced them, rebuilt the catalog into machine-readable structured data, and fixed it on its own.

I was pleased with that, so I posted about it. The pushback was immediate, and most of it was fair.

The objections that actually landed:

The digest is the agent describing what it thinks it did. It is not a receipt. If something ships three times a day, you want the real diff and deploy log, captured outside the agent, not the agent's own summary of its own work. An LLM grading its own homework is exactly what you shouldn't trust.

A scheduled, autonomous agent turns time into an attack surface. Nobody has to attack you live. They leave something in whatever the agent reads next, and the schedule does the rest while you sleep.

Nothing stops an LLM from deciding that DROP TABLE is a reasonable improvement. It has no instinct that deleting data is bad. It can talk itself into a destructive change as easily as a helpful one.

Blast radius. This is survivable if you run small isolated services where one bad change is contained. It is a different story on a single app where one deploy touches everything.

So I stopped feeling clever and went to look at what my agent could actually do. Not what I had told it to do in its instructions, what its permissions physically allowed. The answer was bad. It could run arbitrary commands, deploy and delete the whole service, run destructive database operations, and force-push over my git history. My entire safety story was one line in a prompt that said "additive changes only." That is not a guardrail. If the agent had ever decided dropping a table was an improvement, nothing in the system would have stopped it.

That is the lesson I actually paid for: convention is not a guardrail. "I told it not to" is not the same as "it cannot."

What I changed:

I scoped the permissions so it cannot, instead of trusting it not to. A tight allow-list (deploy only) plus a hard deny-list: no destructive database commands, no delete, no force-push, no rm, no arbitrary execution. Even if the agent convinces itself, the command fails.

I stopped treating the digest as the source of truth. The real record is the diff and the deploy log, captured independently of the agent.

I put a literal blocklist on the load-bearing paths: auth, billing, anything that is not additive content.

I now treat anything the agent fetches from an external site as data, never as instructions. That closes the leave-something-it-reads-and-wait angle.

The last piece, still open: the token it deploys with is a full-access login. Scoping it to the minimum is the final wall, so a slipped command hits a permission error instead of my infrastructure.

I still think letting an agent ship to production is worth it. But the safety has to live in what it physically cannot do, not in what you politely asked it to avoid. The most useful part of the whole thing was not my agent catching a bug. It was a pile of strangers catching my agent.

Top comments (0)