DEV Community

Rafael Costa

Five Corrections: What an AI Agent Didn't Know About My Production Database

And why "just write a better prompt" is the wrong lesson

The AI agent had just pulled 30 days of CloudWatch metrics, parsed them correctly, built a table, and pivoted the entire recommendation based on what it found.
I typed four words: "that's shared cluster data."
The deadlock counts, the connection peaks, the daily patterns — all real numbers from a real database cluster. All useless for the decision we were making. Our application shares that cluster with other services. The agent had processed the data flawlessly and drawn a conclusion from someone else's workload.
The data was flawless. The conclusion was someone else's. That gap has a structure worth looking at.

The setup

Multi-tenant Django application on Aurora MySQL. Low user count, background workers for bulk operations, primary-replica topology. No transaction isolation level configured anywhere — MySQL's default REPEATABLE READ running unchallenged. Sporadic deadlocks in background tasks, stale reads on the replica.
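What "nothing configured" means in practice can be sketched with a small helper (hypothetical, not from the codebase in question). Django's MySQL backend accepts an `isolation_level` key in `OPTIONS`; absent that, or a `SET ... ISOLATION LEVEL` inside `init_command`, every session silently inherits MySQL's REPEATABLE READ default:

```python
MYSQL_DEFAULT = "REPEATABLE READ"

def configured_isolation(db_settings: dict) -> str:
    """Return the isolation level a Django MySQL connection will use.

    django.db.backends.mysql honours an 'isolation_level' key in
    OPTIONS; failing that, an init_command can pin it; failing both,
    the session runs at MySQL's default.
    """
    options = db_settings.get("OPTIONS", {})
    if "isolation_level" in options:
        return options["isolation_level"].upper()
    init = options.get("init_command", "")
    if "ISOLATION LEVEL" in init.upper():
        return init.upper().split("ISOLATION LEVEL", 1)[1].strip().rstrip(";")
    return MYSQL_DEFAULT

# The situation the investigation started from: no OPTIONS at all,
# so the default applies without anyone having chosen it.
assert configured_isolation({"ENGINE": "django.db.backends.mysql"}) == MYSQL_DEFAULT
```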
I pointed an AI agent at the codebase and asked it to devise a plan for choosing the right isolation level. Not "switch to READ COMMITTED" — decide what it should be, with evidence.

What the agent did well

Within minutes it had explored the entire codebase, mapped six categories of concurrent access patterns, and evaluated all four MySQL isolation levels against each. Found that no isolation level was configured anywhere — not in Django settings, not in middleware, not in init_command. Identified the specific background tasks doing bulk write cycles, the views with missing transaction boundaries, the delete-then-recreate services structurally vulnerable to race conditions.
This would have taken me a full day of methodical grep-and-read. The agent produced a concurrency map that didn't exist before the session started.
Then it started getting things wrong.

Three corrections, same shape

Over the course of the investigation, I interrupted the agent five times. Three of those reveal the same structural pattern. The other two are different in kind — I'll get to those.
"That's shared cluster data." CloudWatch showed hundreds of peak connections and sporadic deadlocks across 30 days. The agent built a careful analysis from these numbers, but those numbers belonged to the Aurora cluster, not to our application — low user count, handful of background workers, a fraction of those connections. The deadlocks could be entirely from other tenants on the same cluster.
Could a better prompt have prevented this? Sure. "The Aurora cluster is shared; CloudWatch metrics are cluster-wide." I knew that before the session. It's promptable.
But I expected the agent to catch it. It had the user count, the worker count, the CLAUDE.md. There was enough signal to at least question whether those connection peaks belonged to us. It didn't question anything — processed the numbers as ours and kept going.
"Did you check production only, or were staging and dev mixed?" The agent queried SigNoz for database metrics — millions of reads, tens of thousands of writes, clean latency percentiles. SigNoz had separate service entries per environment, and the agent hadn't verified which ones it was aggregating. The CLAUDE.md, the project docs — the namespace separation was right there.
"What about the slow queue workers?" The agent pulled error data for the main API service and the primary background worker. Missed the slow-queue workers entirely — the ones handling the heaviest bulk operations, the exact workloads where deadlocks would actually manifest. I know which queues carry what because I designed the routing. The agent queried the obvious service names and stopped.
The pattern. Each time, the agent had directional context: the codebase, the project setup, the CLAUDE.md scoping the investigation. The shared cluster, the environment boundaries, the queue routing were all inferrable from material it could see.
Fair objection: "So the context was there and the agent missed it. That's a tooling problem." Not a bad objection — better agents will get better at this.
But notice what's actually happening. The agent treats observability data as self-describing, and observability data inherits the topology of the infrastructure that produces it — structure that doesn't announce itself in the query results. CloudWatch doesn't label which connections belong to which tenant. SigNoz doesn't flag cross-environment aggregation. The data just arrives, looking authoritative.
The total context space includes cluster layout, environment naming conventions, service routing, monitoring configuration, team history with past incidents. The hard part isn't supplying that context. It's knowing which piece applies at the exact moment of inference — and that's a judgment problem wearing a context mask.
Better agent protocols will close some of this gap: verification steps before ingesting observability data, environment boundary checks, worker enumeration. Adversarial loops and self-correction prompts — "before drawing conclusions from this data, verify its scope" — would probably have caught all three. But someone has to write that checklist, and it's the engineer who already knows where the agent will drift. You're pre-encoding judgment into process. The human steering doesn't disappear; it moves earlier in the pipeline.
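One sketch of what "pre-encoding judgment into process" could look like: a scope check run before trusting a metrics pull. Everything here (the field names, the service lists) is a hypothetical illustration, not an existing agent protocol.

```python
from dataclasses import dataclass

@dataclass
class MetricScope:
    source: str           # e.g. "cloudwatch", "signoz"
    services: set[str]    # service entries the query actually aggregated
    environment: str      # environment the data was pulled from
    cluster_shared: bool  # does the metric cover tenants beyond ours?

def scope_problems(scope: MetricScope, *, expected_services: set[str],
                   expected_env: str) -> list[str]:
    """Return the corrections a human would otherwise have to make."""
    problems = []
    if scope.cluster_shared:
        problems.append("cluster-wide data: numbers include other tenants")
    if scope.environment != expected_env:
        problems.append(f"wrong environment: {scope.environment}")
    missing = expected_services - scope.services
    if missing:
        problems.append(f"workers not queried: {sorted(missing)}")
    return problems
```

Run against the SigNoz pull from the story, a check like this would have flagged the missing slow-queue workers before any conclusions were drawn. But note who had to write the `expected_services` set: the engineer who designed the queue routing.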
That objection only holds for these three. The next two resist it entirely.

Two different species

Premature convergence. The agent's first plan evaluated REPEATABLE READ versus READ COMMITTED and recommended switching. MySQL has four isolation levels. The plan hadn't touched READ UNCOMMITTED or SERIALIZABLE, hadn't considered per-connection versus per-transaction strategies, hadn't looked at Aurora-specific features. It optimized before it explored. Not all of those gaps matter equally — per-transaction strategy is the one that actually bites — but the point is the agent never even mapped the decision space before narrowing it. Probably because the REPEATABLE READ vs. READ COMMITTED binary dominates the MySQL docs and Stack Overflow, the training data funnels toward that framing: strong early evidence triggers convergence, and decision-space awareness is a judgment skill, not an instruction-following one.
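The per-connection versus per-transaction distinction the plan skipped comes down to one keyword of MySQL syntax. A minimal sketch (the helper is hypothetical; the generated SQL is standard MySQL):

```python
def isolation_statement(level: str, *, whole_session: bool) -> str:
    """Build a MySQL isolation-level statement.

    With SESSION, the level applies to every subsequent transaction on
    the connection; without it, only to the next transaction, after
    which the connection reverts to its session-level setting.
    """
    scope = "SESSION " if whole_session else ""
    return f"SET {scope}TRANSACTION ISOLATION LEVEL {level}"

assert isolation_statement("READ COMMITTED", whole_session=True) == \
    "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED"
```

The per-transaction form is what makes a mixed strategy possible at all: a connection-wide default with targeted exceptions, rather than one level for everything.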
Adversarial peer review. After the agent recommended READ COMMITTED, I dropped in a document I'd prepared separately: a fact-check showing this is a defensible but contested position. Dimitri Kravtchuk, Oracle's MySQL performance architect, has shown that READ COMMITTED creates per-statement ReadView overhead causing trx_sys mutex contention at scale — up to 2x performance degradation for short transactions. No prompt can pre-load a rebuttal to a conclusion that hasn't been reached yet. That's the human acting as adversarial reviewer, not correcting the trajectory but stress-testing the destination.

Why this generalizes

The isolation level was the vehicle. The pattern applies anywhere the codebase is an incomplete map of the system — capacity planning, migration strategies, incident response — anywhere the agent can analyze what's visible but can't weigh what's implicit.
Providing context and having the agent apply it at the right inferential moment are different problems. The second requires holding the system in your head, which is exactly the kind of knowledge an agent is supposed to help you leverage, not replace.

What actually happened

The agent shipped a solid architecture decision record, a team-facing report, a concurrency architecture document, half a dozen issues for the gaps it found, and a PR with the implementation. READ COMMITTED via init_command, with an escape hatch for the two operations that genuinely need snapshot isolation. Days of senior engineering work compressed into hours.
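The shape of what shipped, sketched under assumptions: `init_command` and the `READ COMMITTED` default are stated in the text, but the escape-hatch helper and its name are hypothetical illustration, not the actual PR.

```python
from contextlib import contextmanager

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "OPTIONS": {
            # Every new connection starts at READ COMMITTED.
            "init_command": "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
        },
    }
}

@contextmanager
def snapshot_isolation(connection):
    """Escape hatch for operations that genuinely need snapshot isolation.

    Without the SESSION keyword, SET TRANSACTION applies to the next
    transaction only; afterwards the connection reverts to the
    session-level READ COMMITTED set by init_command.
    """
    with connection.cursor() as cursor:
        cursor.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
    yield
```

Wrapping one of the two snapshot-dependent operations in `with snapshot_isolation(connection):` just before opening its transaction keeps the rest of the application on READ COMMITTED.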
But the final recommendation was correct because a human who understood the system — the infrastructure, the observability configuration, the workload routing, the external literature — kept redirecting the analysis every time it drifted.
Who in the room holds the topology?
That's the mental model problem applied to infrastructure. The agent is a force multiplier — but a multiplier needs something to multiply.
The faulty inferences, despite context that should have been more than enough, are mental-model shadows: business rules, operational structure, and infrastructure constraints are all projections of the same multidimensional creature, one that protects you when someone in the room holds the model and devours you when no one does.
