An Agent Deleted My Production Database: What My Logs Say That the Viral HN Post Leaves Out
The right solution to stop an agent from destroying production is giving it less autonomy, not more guardrails. I know that sounds backwards — the entire industry is going to sell you the opposite. Let me walk through my own logs and explain why that distinction matters more than it seems.
The Hacker News post blew up this week. Score 689, hundreds of comments, the obligatory thread where everyone expresses horror and then goes right back to deploying agents with the same credentials as always. I read it twice. It's a solid account of the accident. It's a terrible root cause analysis.
Because the problem it describes isn't new to me. I have logs. I opened them.
A few weeks back I wrote about CrabTrap, the LLM-as-a-judge proxy I put in front of my production agent, and about async agents and what debugging doesn't tell you. In both cases I left something unresolved: what happens when the judge fails, when the proxy lets through something it shouldn't. Today I want to pull on that thread.
What HN Says and What It Leaves Out
The viral post has a classic structure: agent with broad permissions, ambiguous task, poorly bounded context, irreversible action. The author concludes they needed "better guardrails." The comments agree. Everyone closes the thread with a list of tools.
My thesis is the opposite: the problem isn't that the guardrails failed. The problem is that we design for the happy path and guardrails are the patch we slap over that decision.
A guardrail is a reactive mechanism. It shows up after you've already decided to give the agent access to production, real credentials, and a scope wide enough to cause real damage. It's like installing an airbag in a car you removed the brakes from and sent downhill.
What my logs show is more uncomfortable than that.
What Almost Happened: Three Destructive Operations From My Own Logs
I opened CrabTrap's logs from the last 30 days. Filtered for operations with destructive verbs: DELETE, DROP, TRUNCATE, rm -rf, reset variants. Found 23 calls that hit the proxy with some destructive intent. Of those 23:
- 17 were blocked by the LLM judge before executing.
- 4 passed the judge but failed due to permission restrictions in Railway (the DB user didn't have DDL access).
- 2 made it all the way through and executed something they shouldn't have.
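If you want to run the same audit on your own logs, the filter itself is trivial. A minimal sketch in Python (the log format and the verb list here are illustrative, not CrabTrap's actual schema):

```python
import re

# Illustrative verb list; tune it to your own stack
DESTRUCTIVE = re.compile(
    r"(DELETE\s+FROM|DROP\s+(TABLE|DATABASE)|TRUNCATE|rm\s+-rf|RESET)",
    re.IGNORECASE,
)

def destructive_calls(log_lines):
    """Return the log lines whose payload matches a destructive verb."""
    return [line for line in log_lines if DESTRUCTIVE.search(line)]

logs = [
    '2025-06-01 proxy query="SELECT * FROM users"',
    '2025-06-02 proxy query="DELETE FROM sessions"',
    '2025-06-03 proxy cmd="rm -rf /tmp/agent-cache"',
]
print(len(destructive_calls(logs)))  # → 2
```

Thirty seconds of grepping, and you know whether this post applies to you.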
Those two cases are the ones that matter. Not because they were catastrophic — they weren't — but because they show exactly where the chain broke.
Case 1: DELETE with no WHERE clause
-- What the agent wanted to execute
-- Context: "clean up the test records from the staging environment"
DELETE FROM sessions;
-- What it should have executed
DELETE FROM sessions WHERE environment = 'staging' AND created_at < NOW() - INTERVAL '7 days';
The proxy let DELETE FROM sessions through because the judge evaluated the intent as valid (clean up staging sessions) without validating that the query had no WHERE clause. The agent was right about what it wanted to do. The implementation was a disaster.
The result? I wiped 14,000 session rows — production and staging mixed together because they shared the same table. Not critical — sessions are regenerable — but if that table had been orders or payments, this would be a very different conversation.
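A cheap structural check would have caught this regardless of what the judge thought of the intent. A sketch (regex-based for brevity; a real implementation should use a proper SQL parser):

```python
import re

def lacks_where(query: str) -> bool:
    """True if a DELETE or UPDATE statement carries no WHERE clause.
    Regex sketch only; production code should parse the SQL properly."""
    q = query.strip().rstrip(";")
    if not re.match(r"\s*(DELETE|UPDATE)\b", q, re.IGNORECASE):
        return False  # not a statement this check cares about
    return re.search(r"\bWHERE\b", q, re.IGNORECASE) is None

print(lacks_where("DELETE FROM sessions"))                        # → True
print(lacks_where("DELETE FROM sessions WHERE env = 'staging'"))  # → False
print(lacks_where("SELECT * FROM sessions"))                      # → False
```

The point is that this check is deterministic. The judge evaluates intent; this evaluates shape. You want both.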
Case 2: The Silent Cascade
-- What the agent executed in response to "delete test user id=9981"
DELETE FROM users WHERE id = 9981;
-- What it didn't know (and I'd never documented well):
-- users has ON DELETE CASCADE on:
-- → orders (→ order_items → inventory_movements)
-- → documents
-- → audit_logs
-- Total: 847 rows across 5 tables
The proxy had no way to know that CASCADE existed. I hadn't documented it in the context I was passing to the agent. The judge approved the operation because it was semantically correct. The schema did the rest.
This is what the HN post doesn't say: guardrails operate on intent, not on the side effects of your schema. And schema side effects are invisible to any LLM that doesn't have the full ERD in context — which, at any real scale, is impossible.
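The cascade surface is at least discoverable before the fact. Assuming a Postgres database (the catalog names below are Postgres-specific), one query lists every foreign key that cascades on delete:

```sql
-- List every ON DELETE CASCADE foreign key (Postgres system catalogs)
SELECT conrelid::regclass  AS child_table,
       confrelid::regclass AS parent_table,
       conname             AS constraint_name
FROM pg_constraint
WHERE contype = 'f'       -- foreign-key constraints only
  AND confdeltype = 'c';  -- 'c' means ON DELETE CASCADE
```

Feed that output into the context you hand the judge, and the "invisible side effects" class of failure shrinks considerably.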
Why Framework Guardrails Are Theater
I reviewed the guardrails offered by the three most-used agent frameworks today. Not going to name them — I'm not here to do free marketing — but the pattern is identical across all of them:
# Typical "guardrail" pattern in agent frameworks
# (representative pseudocode, not from any specific framework)
BLOCKED_OPERATIONS = ["DROP TABLE", "TRUNCATE", "DELETE FROM users"]

def validate_query(query: str) -> bool:
    # String matching. That's it. That's the whole thing.
    for blocked in BLOCKED_OPERATIONS:
        if blocked.upper() in query.upper():
            return False
    return True

# The problem: all of these pass cleanly
validate_query("DELETE FROM public.users WHERE id = 1")  # → True (schema-qualified)
validate_query("DROP  TABLE sessions")                   # → True (extra spaces)
validate_query("EXEC sp_executesql @q")                  # → True (dynamic SQL)
String matching on SQL queries. In 2025. With agents generating dynamic SQL from natural language context.
When I built CrabTrap, I replaced that pattern with semantic evaluation — the proxy sends the query plus context to the model and asks whether the operation is destructive in that specific context. It's better. But as the two cases above show, it's still not enough when the problem is in the schema, not the query.
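For reference, the semantic-evaluation pattern itself is small. This is a sketch of the shape, not CrabTrap's actual code; the prompt wording and the ALLOW/BLOCK verdict format are assumptions:

```python
def build_judge_prompt(query: str, task_context: str) -> str:
    """Assemble what the LLM judge sees. Illustrative only; the real
    prompt and verdict schema are implementation details."""
    return (
        "You are a safety judge for database operations.\n"
        f"Task the agent was given: {task_context}\n"
        f"Query it wants to run: {query}\n"
        "Reply ALLOW or BLOCK. Consider whether the operation is "
        "destructive in this specific context, including its blast radius."
    )

def parse_verdict(model_reply: str) -> bool:
    """True means the proxy lets the operation through."""
    return model_reply.strip().upper().startswith("ALLOW")

prompt = build_judge_prompt("DELETE FROM sessions", "clean up staging sessions")
print(parse_verdict("BLOCK: no WHERE clause"))  # → False
```

Notice what's missing: the judge only sees the query and the task. It never sees the schema. That gap is exactly where my two failures lived.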
The solution I landed on — and the one the HN post never mentions — is more boring: database users with minimal permissions, separated by environment, no DDL access, and row-level security where it applies. Not sexy. Not a framework. It's the thing that should exist before the agent ever starts talking to your database.
This connects to something I've been dragging around since the Bitwarden CLI supply chain attack analysis: the trust surface gets designed before the incident, not after. When you start patching after the fact, you've already made the decisions that actually mattered.
The Design Mistakes the HN Incident Normalizes Without Meaning To
The viral post, well-intentioned as it is, lets three assumptions slide by without questioning them:
1. That the agent needed direct database access.
In most cases, it doesn't. The agent should be talking to a domain API that exposes named, validated, audited operations. deleteTestUser(id) instead of DELETE FROM users WHERE id=?. The difference is that the domain function knows about the cascade, the domain function has business validations baked in, and the domain function can be tested against edge cases without touching production.
2. That staging and production are different enough.
They're not if they share a schema, if the agent uses the same credentials for both, or if "staging" is just a flag in an environment variable the agent can ignore or misread. I learned that the hard way with the DELETE without a WHERE clause. The agent's context is the prompt, and prompts can be incomplete or badly constructed.
3. That this problem is new.
It isn't. Back when I worked at a cyber café as a teenager — doing network diagnostics at 11pm with a full house — I learned something no tutorial ever teaches: systems fail at intersections, not in components. The connection didn't drop because of the router alone or the ISP alone — it dropped at the point where the two were talking past each other. Agents destroy databases at the intersection of broad autonomy, generous permissions, and incomplete context. Not from any one of those three factors alone.
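The domain-API point from assumption 1 above fits in a few lines. Hypothetical names throughout; the `db` stand-in is just a dict:

```python
class DomainError(Exception):
    pass

def delete_test_user(db, user_id):
    """Named, validated alternative to raw DELETE access. `db` is a
    dict-of-tables stand-in for whatever data layer you actually have."""
    user = db["users"].get(user_id)
    if user is None:
        raise DomainError(f"user {user_id} does not exist")
    if not user.get("is_test"):
        raise DomainError(f"user {user_id} is not a test user; refusing")
    user["deleted_at"] = "now"  # soft delete: nothing cascades, reversible
    return user_id

db = {"users": {9981: {"is_test": True}, 1: {"is_test": False}}}
print(delete_test_user(db, 9981))  # → 9981
```

The function refuses to touch user 1 because the domain layer, unlike a raw DELETE, knows what a test user is. That knowledge is exactly what the agent's prompt context can't be trusted to carry.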
FAQ: AI Agents and Production Databases
What minimal permissions should an agent have when accessing a database?
Depends on the use case, but as a general rule: SELECT on tables it needs to read, INSERT and UPDATE on tables it needs to modify, and zero DDL access (DROP, ALTER, TRUNCATE). If the agent needs to delete data, better to expose a domain function with soft delete than direct DELETE access. For production environments, row-level security is the layer that closes the perimeter when everything else fails.
Are LangChain, CrewAI, or similar guardrails enough to prevent destructive operations?
In my experience, no. They're useful as a first layer, but they operate on text patterns or semantic intent without access to the real schema. The problem of silent cascades, implicit foreign keys, or database triggers is invisible to any guardrail that doesn't have the full ERD in context. Necessary but not sufficient.
What's better: an LLM-as-a-judge proxy or restrictive DB permissions?
Both layers, in that order of priority. Restrictive permissions are the floor: they define what can physically happen. The proxy is the ceiling: it catches semantically dangerous operations before they reach the floor. If you have to pick one, pick permissions. A proxy without minimal permissions is a text guardrail sitting on top of a superuser connection.
How do I separate staging from production so an agent can't confuse the two?
Different database users, different credentials, and — if you can swing it — different databases on different hosts. An ENV=staging flag in your config isn't enough. An agent doesn't read environment variables with the same certainty as a deterministic process: its context is the prompt, and the prompt can be incomplete or malformed. Physical separation is the only kind that can't be misinterpreted.
Is it worth adding human confirmation before destructive operations?
Yes, but with judgment. Human-in-the-loop on every operation kills the agent's usefulness. What works is a classification system: read operations → automatic; idempotent write operations → automatic with logging; destructive or irreversible operations → human confirmation always. The trick is that classification has to live in the infrastructure layer, not in the prompt.
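That classification layer is small enough to sketch. The verb lists are illustrative and deliberately conservative:

```python
from enum import Enum
import re

class Risk(Enum):
    AUTO = "auto"                    # reads: no confirmation needed
    AUTO_LOGGED = "auto_logged"      # idempotent writes: log and go
    HUMAN_CONFIRM = "human_confirm"  # destructive: a human signs off

DESTRUCTIVE = re.compile(r"\b(DELETE|DROP|TRUNCATE|ALTER)\b", re.IGNORECASE)
WRITE = re.compile(r"\b(INSERT|UPDATE)\b", re.IGNORECASE)

def classify(query: str) -> Risk:
    """Lives in the infrastructure layer, never in the prompt."""
    if DESTRUCTIVE.search(query):
        return Risk.HUMAN_CONFIRM
    if WRITE.search(query):
        return Risk.AUTO_LOGGED
    return Risk.AUTO

print(classify("DELETE FROM sessions").value)  # → human_confirm
```

Because this runs before the query ever reaches the database, a prompt injection can't talk its way past it.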
Is the HN agent-deletes-DB incident representative of what happens in production?
More than the industry admits. The difference between that case and mine was the permission level on the database user. In the HN case, it had full access. In mine, DDL access was blocked by Railway, which turned a potential disaster into a loggable permission error. That difference didn't come from an agent framework — it came from an infrastructure decision made before the agent existed.
What I Accept, What I Don't Buy, and the Honest Trade-off
What I accept: agents are going to keep breaking things. Not out of malice, but because they operate on incomplete context inside systems designed for humans who understand the implicit schema. That's not going to change with better prompts or better guardrails.
What I don't buy: that the solution is more abstraction piled on top of the same permissions problem. Every new framework that promises "safe agents by default" and then exposes a superuser connection in the documentation examples is lying to me. I saw the same pattern with TypeScript 7.0 and its new typing features — new abstractions don't solve old design problems, they hide them until they explode.
The honest trade-off: real autonomy has an infrastructure cost that most people don't want to pay. Physically separating environments, creating DB users with minimal permissions, exposing domain APIs instead of direct table access, implementing soft deletes, auditing cascades — all of that takes time. It's easier to give the agent full access and trust that the LLM will do the right thing.
The HN post with 689 points exists because that shortcut eventually comes due.
My two near-disaster cases exist because I took shortcuts too — the DELETE without a WHERE clause was my fault in the shared schema design, the silent CASCADE was documentation I never wrote. The guardrails saved me twice. The third time might not go as well.
The difference between designing for the happy path and designing for the failure path isn't a difference in tools. It's a difference in attitude toward the system. And that attitude gets learned, almost always, after something breaks.
Keep the conversation going in the comments: what near-destructive operations have you found in your own logs?
This article was originally published on juanchi.dev