DEV Community: diata

The 24-hour test: if you couldn't write it by hand tomorrow, you didn't write it today.

diata — Sat, 25 Apr 2026 08:07:08 +0000

Hello, I'm Tuan.

I have one rule for using AI at work, and it's the only one I trust:

Every line of code I shipped today, I should be able to write again tomorrow with no AI. If I can't, I didn't write it.

That's the test. Ten seconds, end of day. I don't always pass — there's a story below where I clearly didn't. But running the test is the difference between using AI and being used by it.

(If you read yesterday's rant about vibe coding and were about to accuse me of hypocrisy — this post is the receipt. I use AI every day. Here's the line.)

The line: AI does typing, I do thinking

The cleanest framing I've found:

AI is a translator, not a tutor. Translators turn what you already mean into syntax. Tutors teach you what to mean. The first is safe. The second is how you stop being an engineer.

Translation is delegable. The thinking that produces what to translate is not.

Sounds obvious. In practice, the line is invisible until you watch yourself cross it.

What I refuse to delegate

Three things, full stop.

Debugging. When something breaks, the reasoning chain — "this fails because X, so checking Y will tell me if I'm right" — never leaves my head. Outsource it once, you save twenty minutes. Outsource it for a year and you become the candidate from yesterday's post.

Architecture decisions. AI doesn't know my team's oncall load, data gravity, or the political fact that the platform team will block any new service for six months. I'll use it to enumerate options I missed. I will not use it to pick.

Anything in a domain where I don't already have a model. If I can't check the answer, I have no business accepting it. Build the model, then delegate. Never the other way around.

The time I broke my own rule

I'm going to tell on myself, because the post is dishonest without it.

Six months ago I was tired. End of a long sprint, last ticket of the day. A small feature needed a transaction wrapping two writes. I knew exactly what I wanted, so I let Cursor write the wrapper.

It produced a clean-looking BEGIN/COMMIT block: lock the user row, lock the wallet row, do the writes, commit. I read it. Looked right. The integration test passed against a single-threaded fixture. Shipped.

Two days later, deadlocks started showing up under load:

ERROR: deadlock detected
DETAIL: Process 14823 waits for ShareLock on transaction 9182733;
        blocked by process 12044.

The cause: another existing code path acquired those same two rows in the opposite order — wallet first, user second. Two hot rows, two transactions racing, classic crossed-lock deadlock. The integration test never caught it because there was only one writer in the fixture, and concurrency tests had been sitting on the team backlog where they'd been comfortably ignored for a year.

The fix took fifteen minutes — reorder both code paths to acquire locks in the same canonical order. The bug shipped because I'd never traced lock order through the existing code before adding mine. AI couldn't have known about the other path; my own model would have, if I'd held it in my head for one more minute. I had been holding it ten minutes earlier. Then I got tired and let go.
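
A minimal sketch of the canonical-order idea, using in-process Python locks as stand-ins for the two row locks. The names `canonical_order`, `with_rows_locked`, and the `(table, id)` key scheme are hypothetical and not the code from the incident. The point is only that both code paths may request the rows in any order, but the acquisition order is decided in one place.

```python
import threading
from contextlib import ExitStack

# Hypothetical stand-ins for row locks, keyed by (table, primary key).
_row_locks = {
    ("user", 7): threading.Lock(),
    ("wallet", 7): threading.Lock(),
}


def canonical_order(keys):
    """Return lock keys in one global order, regardless of caller order."""
    return sorted(set(keys))


def with_rows_locked(keys, work):
    """Acquire every requested lock in canonical order, then run work()."""
    with ExitStack() as stack:
        for key in canonical_order(keys):
            stack.enter_context(_row_locks[key])
        return work()


def run_concurrently(n_rounds=200):
    """Run two code paths that request the same rows in opposite orders."""
    done = []

    def path_a():
        for _ in range(n_rounds):
            with_rows_locked([("user", 7), ("wallet", 7)], lambda: done.append("a"))

    def path_b():
        for _ in range(n_rounds):
            with_rows_locked([("wallet", 7), ("user", 7)], lambda: done.append("b"))

    threads = [
        threading.Thread(target=path_a, daemon=True),
        threading.Thread(target=path_b, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    # Second value is True if a thread is still blocked after the timeout.
    return len(done), any(t.is_alive() for t in threads)
```

Calling `run_concurrently()` exercises both paths at once, which is the kind of two-writer check the single-threaded fixture never made. In a database the same discipline would mean taking the row locks in one agreed order in every transaction that touches both rows.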

The lesson wasn't "AI was wrong." AI was correct, in isolation.

Correctness in isolation is not correctness. The only thing that catches that gap is a human who is already holding the system in their head.

I now have one extra rule: when I'm tired, I close the AI tab. Tired-Tuan accepts suggestions Awake-Tuan would have caught. The tool's value inverts when judgment is degraded.

A real example, done correctly

Different incident, different week. Postgres query running hot — same flavor of problem as this earlier post.

The actual sequence, with AI use marked at each step.

Step 1 — Hypothesis (no AI). Query had been fine for eighteen months. What changed? A column added two weeks earlier, mostly null in practice. A new ingestion job had tripled write traffic to the table. My guess: the planner was working from stale row estimates. Cheap to test.

Step 2 — Verify (mostly me). Ran EXPLAIN (ANALYZE, BUFFERS). I had to look up the exact BUFFERS flag — five seconds in Claude. The plan came back with a Seq Scan where I expected an index scan. Estimated rows: 200. Actual rows: 240,000. There it was.
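
The estimate-versus-actual comparison is mechanical enough to sketch. The snippet below is an illustration, not a tool from the incident: it scans the text output of EXPLAIN (ANALYZE) for plan nodes whose estimated and actual row counts differ by a large factor. The sample plan is hypothetical (table name, costs, and timings are invented); only the 200 and 240,000 figures come from the story above.

```python
import re

# Matches the planner estimate and the measured row count on one plan node,
# in the text format printed by EXPLAIN (ANALYZE).
_NODE = re.compile(
    r"\(cost=\d+\.\d+\.\.\d+\.\d+ rows=(?P<est>\d+) width=\d+\)"
    r" \(actual time=\d+\.\d+\.\.\d+\.\d+ rows=(?P<act>[\d.]+) loops=\d+\)"
)

# Hypothetical plan text; only the two row counts are taken from the post.
SAMPLE_PLAN = """\
Seq Scan on events  (cost=0.00..4210.00 rows=200 width=64) (actual time=0.021..88.412 rows=240000 loops=1)
  Filter: (tenant_id = 42)
Planning Time: 0.110 ms
Execution Time: 101.532 ms
"""


def misestimates(plan_text, threshold=10.0):
    """Return (line, estimated, actual, ratio) for nodes off by >= threshold x."""
    found = []
    for line in plan_text.splitlines():
        m = _NODE.search(line)
        if not m:
            continue
        # Clamp to 1 so a zero-row node does not divide by zero.
        est = max(float(m["est"]), 1.0)
        act = max(float(m["act"]), 1.0)
        ratio = max(est, act) / min(est, act)
        if ratio >= threshold:
            found.append((line.strip(), est, act, ratio))
    return found
```

For the sample, `misestimates(SAMPLE_PLAN)` flags the Seq Scan node with a ratio of 1200. The threshold of 10 is an arbitrary assumption; what counts as a worrying misestimate depends on the query.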

Step 3 — Read the plan (AI as second pair of eyes). I read the plan myself first and identified the row-estimate problem. Then I pasted it into Claude and asked "what would you flag here, in order of severity?" It listed five things. I had caught three. Of the two I'd skimmed past, one was a minor memory spill that didn't matter; one was a real signal I'd missed — an unrelated index wasn't being used. I filed it as a follow-up.

Step 4 — Decide the fix (no AI). Options: ANALYZE the table, raise the statistics target for the column (ALTER TABLE ... ALTER COLUMN ... SET STATISTICS, rather than the global default_statistics_target), or restructure the query. I picked ANALYZE plus a per-column stats target bump, because the ingestion job would keep skewing things. Architecture choice. Mine.

Step 5 — Write the migration (AI typed). I described what I wanted in a sentence. Claude produced the migration. I read every line, ran it against a copy of prod, shipped.

Net result: maybe 35% faster than doing it by hand. Skill atrophy: zero, because every reasoning step was mine. AI was a typist and a second pair of eyes. It never held the model.

If your workflow has the model doing the reasoning and you doing the accepting, the ratio is inverted, and the 24-hour test will quietly fail.

The part where I lose half the seniors reading this

Here's the line I don't think my generation wants to admit:

Most senior engineers who adopted Cursor or Copilot more than a year ago have atrophied. They just haven't been tested yet.

The reason it's invisible is that AI is a multiplier on existing skill, and a multiplier feels like the skill itself. You finish the task faster. You feel sharp. You ship. The seniors who atrophied let AI become a tutor — accepting suggestions whose correctness they couldn't independently verify. The ones who didn't kept it strictly a translator. Same tool, opposite outcomes, and from the outside they look identical for about eighteen months.

This isn't speculation. The aviation industry has been studying it for forty years under the name "automation-induced skill decay." Pilots who fly long-haul on autopilot lose their manual flying skills, measurably and predictably, even when they remain confident in their abilities. The FAA was concerned enough to issue Safety Alert for Operators (SAFO) 13002, explicitly recommending that operators give pilots more opportunities to hand-fly the aircraft, because pilots can no longer self-assess when their skill has slipped. The same mechanism applies to anyone whose work is mediated through automation. We are not special.

I run a manual day on myself once a month: autocomplete off, no chat, full day. The first time I tried it, I was shocked at how rusty the first hour felt. Not on typing — on thinking. I'd been narrating my process to AI for so long I'd forgotten how to narrate it to myself.

I got it back, fast. But I would not have noticed it slipping if I hadn't tested for it.

Yesterday's post was about juniors who never built the model. Today's is about seniors who had it and let it slip. If you're reading this and feel defensive — you already know the answer to the test.

The whole thing in three lines

If you skimmed this far, here it is condensed:

AI does the typing. You do the thinking. Translator, not tutor.
Refuse to delegate three things: debugging, architecture, anything you don't yet have a model for.
Run the 24-hour test on yourself this week. One day a month, run it without the tool.

That's the whole post. The rest is supporting evidence.

The consequence

The day the tool is down, expensive, blocked at your client site, or — most often — confidently wrong about something that ships and breaks at 3 AM, you find out who you actually are as an engineer.

Skill that exists only inside the augmentation evaporates with the augmentation. Skill that exists outside of it doesn't.

Pick which one you want to be.

This is part 2 of a thread that started with "Vibe coding is producing developers who can't debug. We're going to pay for this." If you're a senior who thinks the atrophy point doesn't apply to you, that's exactly when the test matters most. Try it before you reply.

I write about backend, production incidents, and the unglamorous parts of being a software engineer. Follow if that's your kind of thing.

Vibe coding is producing developers who can't debug. We're going to pay for this.

diata — Sat, 25 Apr 2026 04:38:31 +0000

Hello, I'm Tuan.

I've been doing technical interviews for backend roles recently. The pattern across the last few rounds genuinely scared me — and I don't think we, as an industry, are talking about it honestly.

Most of the candidates I sat with could not debug.

Not "struggled to debug." Not "took longer than I expected." Could not, in any meaningful sense, form a hypothesis about a broken system and test it. The moment something didn't work, the same reflex kicked in: paste the error into Cursor, paste the file, wait, paste again.

I'm not writing this to gatekeep. I'm writing this because something is breaking in our profession, quietly, and the seniors who notice it are mostly staying quiet because saying it out loud sounds like an old man yelling at clouds.

Fine. I'll be the old man.

The interview that made me start writing this

Strong resume. A few years of experience. Confident on system design, fluent on Redis caching, queue patterns, the usual.

I gave him a small task: a Node.js service throwing 500s on a specific input. Logs included. He could use any tool, including AI.

He opened Cursor and pasted the stack trace. Then the source file. Then, when the suggestion didn't fix it, the new error. Then the next error.

Forty minutes in, he had four open tabs of AI suggestions and no working theory of what was actually wrong.

The bug, when I finally walked him through it, was a hallucinated method. Three weeks earlier, his AI assistant had suggested .findUserByEmailOrThrow() — a method that does not exist in our ORM. The code shipped because the test mocked the entire data layer and returned truthy. In production, the call resolved to undefined, the next line dereferenced it, and the service crashed on a specific edge case nobody had hit yet.

He had accepted that line without reading it. He could not have caught it, because he had no model of what methods the ORM actually exposed.
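
The mechanism (a test double that accepts any method name) is easy to reproduce. The sketch below is a Python analogue using the standard library's `unittest.mock`, not the candidate's Node.js code; `UserRepo`, `load_user`, and the method names are hypothetical. A loose mock happily answers the hallucinated call, while a mock built with `spec=` rejects it.

```python
from unittest.mock import MagicMock


class UserRepo:
    """Hypothetical data-layer class; only this one method really exists."""

    def find_user_by_email(self, email):
        raise NotImplementedError("talks to the database in real code")


def load_user(repo, email):
    # The hallucinated call: UserRepo has no such method.
    return repo.find_user_by_email_or_throw(email)


def passes_with_loose_mock():
    repo = MagicMock()  # accepts any attribute and returns a truthy mock
    return bool(load_user(repo, "a@example.com"))


def fails_with_spec_mock():
    repo = MagicMock(spec=UserRepo)  # only attributes of UserRepo are allowed
    try:
        load_user(repo, "a@example.com")
    except AttributeError:
        return True
    return False
```

Both functions return True: the first shows the test that would have gone green, the second shows the same call failing as soon as the mock is constrained to the real interface.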

I'm not telling this story to mock him. He was smart, articulate, and probably understood distributed systems better than I did at his age.

He just had no internal model of how the code in front of him actually executed.

This wasn't an outlier

I assumed it was. It wasn't.

Across the recent interviews, the same patterns kept showing up:

Most candidates reached for an AI tool within the first minute of seeing an error, before forming a single hypothesis of their own.
A majority could not walk me through what their own committed code did, line by line, when asked.
Several had shipped code containing functions or fields that don't exist in the libraries they claimed to use — hallucinations that survived because tests mocked around them.
A non-trivial number told me, unprompted, that they "don't really debug anymore."

These were not bootcamp graduates. These were mid-level engineers from companies you've heard of.

And it isn't just my interview room. METR's 2025 randomized study on experienced open-source developers found something nobody wants to repeat at standup: developers using AI tools felt 20% faster, and were measured 19% slower. GitClear's 2024 analysis of millions of commits found that the rate of "churned code" (lines added and then removed or rewritten within two weeks) was on track to roughly double relative to the pre-Copilot baseline. We are shipping more code that we then immediately rip back out.

Something has shifted. And the people best positioned to call it out — senior engineers — are mostly not, because the same tool is making them faster, and it's hard to criticize what's working for you.

What vibe coding actually does to your brain

Here's the part nobody on Twitter wants to admit.

When you debug a problem yourself — really debug it, with print statements and bad guesses and dead ends — you're not just fixing a bug. You're building a mental model of the system. Every wrong hypothesis you eliminate teaches you something about how this code behaves under pressure.

That model is the entire job.

This isn't folk wisdom. Cognitive science has had a name for it for about thirty years: desirable difficulties. Robert Bjork's research showed that learning sticks because of the friction you experience while doing the task, not in spite of it. Make the practice too easy and the skill never consolidates. The struggle isn't a tax on learning. It is the learning.

Writing code is the easy part. AI is great at the easy part. But the model — the intuition for where the bug probably is, the smell that says this looks fine but it isn't — only forms when you struggle.

When you outsource the struggle, you outsource the model.

AI doesn't make you a worse developer. It makes you a developer who never becomes a better one.

It's like learning chess by only playing with the engine's evaluation bar visible. You'll know which moves are good. You will never know why. The day the bar disappears, you are not a chess player. You are someone who used to be near a chess engine.

The strongest counters, addressed honestly

There are two arguments against everything I just wrote that I take seriously. I want to deal with both, because if I don't, the comments will — and probably less charitably.

Counter 1: "Juniors will learn faster, not slower. AI lets them ship more, hit more edge cases, see more of the system."

I want to believe it. I genuinely do.

I'm just not seeing it in the people walking into the interview room, and I'm not seeing it in the data either. Volume of code shipped is not the same as depth of understanding, and "shipping more" only teaches you something if you have the model to interpret what you shipped. Without the model, more output is just more noise — which is exactly what GitClear's churn numbers look like.

Maybe I'm wrong. Maybe in three years the data flips and juniors reach senior-level intuition faster than ever. I'd love to write the apology post.

Counter 2: "Calculators didn't kill math. Compilers didn't kill assembly devs. IDEs didn't kill C programmers. Every abstraction triggers this panic and the industry adapts. You're being a boomer."

This is the argument I respect most, and it's also the one that's wrong in a specific, important way.

Every previous abstraction in software was deterministic. A calculator that returns the wrong answer for 2 + 2 gets recalled. A compiler that miscompiles is a P0 bug that gets patched the same week. An IDE that auto-completes a method that doesn't exist is broken software. The contract was: the abstraction is correct; you can trust the output and focus on the layer above.

AI is the first abstraction in our field whose output can be confidently, plausibly wrong, by design, and we ship it anyway. There is no recall. There is no patch. The "bug" is the entire premise.

That changes what skill is required of the user. With a compiler, you didn't need to verify its output line by line — that would defeat the point. With AI, verifying the output line by line is the only thing keeping you from shipping nonsense. And verification requires the exact skill — reading code, building a model, smelling wrongness — that vibe coding skips.

Calculators didn't require a numeracy detector. AI requires a bullshit detector. You can only build that detector by being wrong, alone, many times, before you ever touch the tool.

This is not the same transition. It looks the same from the outside. Underneath, the contract has flipped.

The skill that's quietly disappearing

The skill is not "writing code." The skill is forming a hypothesis about a system you don't fully understand and testing it cheaply.

This is what senior engineers do all day. It's why we get paid. It's also the thing AI cannot teach you, because you can only learn it by being wrong in public, repeatedly, with consequences.

Every time a junior pastes an error into Claude instead of reading it, a small skill atrophies. Compound that across two years of "productivity" and you get someone who can ship a CRUD app from scratch but freezes when production breaks at 2 AM.

Stack Overflow, by the way, was not the same thing. It forced you to read an answer that was usually for a slightly different problem, then adapt it. That adaptation step — "this isn't quite my situation, but if I change X..." — was the learning.

AI removes the adaptation step. It gives you something that looks like it should work, you accept it, you ship it, you learn nothing.

Stack Overflow was a textbook. AI is the answer key.

What I tell juniors now

I've stopped saying "don't use AI." That's both unrealistic and condescending.

What I tell them instead:

Form your own guess before reaching for the tool. Even a wrong guess. Especially a wrong guess. The wrong guess is where the model gets built.
Before you accept any AI suggestion, explain to yourself why it works. Out loud, if you have to. If you can't, don't paste it.
Once a week, fix something the hard way. Pick a bug. Solve it without AI. It will feel slow. That's the point.

This isn't anti-AI. It's anti-atrophy.

The goal is not to never use the tool. The goal is to make sure that when the tool is wrong — and it will be wrong, on the day production is on fire and the stakes are real — you are not standing in front of the wreckage with no idea what to do.

The uncomfortable prediction

In five years, debugging legacy systems will be the highest-paid niche in software, because almost nobody under thirty will be able to do it.

I am not joking and I am not exaggerating. The systems we are building today will still exist. The bugs in them will still exist. Someone will have to walk into a 200,000-line codebase, follow execution by hand, and figure out why a specific request is timing out on Tuesdays.

That person will charge a fortune. There will not be many of them.

If you are early in your career, this is genuinely good news — but only if you choose discomfort now.

The friction of debugging without AI is not a bug. It is the skill being installed.

Skip it now and the bill comes due later. It always does.

If you're a junior reading this and you disagree — I genuinely want to hear why. Tell me what I'm getting wrong, in the comments. I'd rather be argued out of this than be right about it.


An index made our query faster. It slowly suffocated our database.

diata — Fri, 24 Apr 2026 03:19:59 +0000

Hello, I'm Tuan.

When backend engineers encounter a slow query, the first instinct is often something like:

"Check the WHERE and ORDER BY, then just add a composite index."

I used to think the same way.

And to be fair, in many cases, that approach works perfectly fine.

But once, a seemingly correct optimization turned into a production incident. The read query became significantly faster, the EXPLAIN plan looked clean, and everything seemed perfect.

Yet slowly, the entire production system began to degrade.

Database CPU usage spiked.
Disk I/O increased dramatically.
API latency crept upward.

It took me a while to realize the real problem:

I optimized the read path, but completely ignored the write cost.

If you're about to run CREATE INDEX to save a slow API, take a few minutes to read this first.

The Initial Problem

One day, the product team asked for a simple feature:

"Create an API that returns the top 20 hottest products in a category."

Essentially, a real-time trending ranking.

At first glance, the solution seemed trivial — just sort products by a score and return the top 20.

The products table already had around 10 million rows, and traffic was already in the thousands of requests per second. Since this API would appear in a highly visible part of the product, slow responses were not acceptable.

My thinking at the time was straightforward:

Just add the right index and it will be fine.

The "Perfectly Correct" Optimization

The query looked like this:

SELECT
    p.id,
    p.name,
    p.interest_score
FROM products p
WHERE p.status = 'ACTIVE'
AND p.stock_quantity > 0
AND p.category_id = 42
ORDER BY p.interest_score DESC
LIMIT 20;

Running this on a table with millions of rows would cause a full scan and sort, which obviously wouldn't scale.

So I applied the classic solution:

CREATE INDEX idx_products_category_status_score
ON products (category_id, status, interest_score DESC);

The results looked fantastic.

The query became dramatically faster
The EXPLAIN plan looked perfect
Response time dropped immediately

From the perspective of query performance, everything seemed solved. At that moment, I felt pretty confident about the fix.

Unfortunately, that confidence didn't last long.

When Production Started Acting Strange

The issue was something I completely overlooked.

interest_score was not a static column.

Every time users interacted with a product — viewing details, liking it, or adding it to the cart — the score increased. Something like this happened constantly:

UPDATE products
SET interest_score = interest_score + 1
WHERE id = ?;

At first, this seemed harmless. Incrementing a number is one of the most common operations in any system.

But the moment interest_score became part of an index, that simple update was no longer simple.

The System Didn't Crash — It Slowly Suffocated

The worst kind of production issue is the one that doesn't fail loudly.

There were no crashes. No obvious errors. The system just became slower and slower.

Over time we observed:

API latency gradually increased
Database CPU usage spiked
Disk I/O skyrocketed
Some requests started timing out
Slow query logs filled with UPDATE statements

Initially we blamed traffic growth. After all, the SELECT query was indexed and looked perfectly fine.

But after monitoring the system closely, the real culprit finally became clear — the heavy load was coming from the updates to interest_score.

The Real Problem

The index itself was not wrong. The real issue was the hidden write cost.

Whenever interest_score changes, the database cannot simply update a number in place. Because the column participates in an index used for sorting, the database must also maintain the index structure.

Conceptually, it means:

The record must be removed from its old position in the index and reinserted into a new one.

With a few updates, this is trivial. But when thousands of updates per second hit the system, maintaining that index becomes extremely expensive.

In other words: the index optimized reads, but it dramatically increased the cost of writes.
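
A toy model of that remove-and-reinsert cost, assuming a sorted Python list as a stand-in for an ordered index on (score, row id). Real engines differ in the details of how they maintain index entries; `ScoreIndex` and `simulate` are hypothetical names, and the sketch only counts the extra index work that each score increment triggers.

```python
import bisect


class ScoreIndex:
    """Toy ordered index on (score, row_id), kept sorted at all times."""

    def __init__(self):
        self.entries = []      # sorted list of (score, row_id)
        self.index_writes = 0  # entry removals + insertions performed

    def insert(self, score, row_id):
        bisect.insort(self.entries, (score, row_id))
        self.index_writes += 1

    def update_score(self, old_score, new_score, row_id):
        # The old entry has to be located and removed from its position...
        pos = bisect.bisect_left(self.entries, (old_score, row_id))
        assert self.entries[pos] == (old_score, row_id)
        del self.entries[pos]
        # ...and a new entry inserted where the new score sorts.
        bisect.insort(self.entries, (new_score, row_id))
        self.index_writes += 2


def simulate(increments):
    """Apply +1 updates for the given row ids; return (index writes, entries)."""
    scores = {row_id: 0 for row_id in range(5)}
    index = ScoreIndex()
    for row_id, score in scores.items():
        index.insert(score, row_id)
    baseline = index.index_writes
    for row_id in increments:
        index.update_score(scores[row_id], scores[row_id] + 1, row_id)
        scores[row_id] += 1
    return index.index_writes - baseline, index.entries
```

Four increments, `simulate([3, 3, 3, 1])`, cost eight index writes in this model on top of the four row updates. Scale that to thousands of increments per second and the index maintenance becomes the dominant write cost.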

The Hotspot Problem

User interactions are not evenly distributed. Popular products receive far more clicks than others.

That meant many updates were hitting the same rows repeatedly, creating contention inside the database. Even though the code looked harmless:

UPDATE products
SET interest_score = interest_score + 1
WHERE id = ?;

Under the hood, the database was handling heavy concurrent updates to the same regions of data and index pages. The system was effectively fighting itself.

That was the moment I realized something important:

An index that speeds up a query does not necessarily make the system healthier.

And in hindsight, the real design mistake was trying to make the main transactional table handle real-time ranking.

The Hard Decision: Removing the Index

Removing the index felt wrong at first. After all, it had significantly improved the query performance.

But the metrics were clear. As long as that index existed, write contention would remain.

So we removed it.

The result was immediate:

Write pressure dropped significantly
Disk I/O stabilized
Database CPU usage returned to normal levels

The ranking query itself became slower again, but at least the entire system was no longer being dragged down by a single column update.

That moment taught me an important lesson:

Some problems that look like SQL optimization tasks are actually architecture problems.

The Alternative: Rethinking the Architecture

Instead of forcing the database to handle both persistent data and real-time ranking, we split responsibilities.

Score updates were moved to Redis Sorted Sets.

When user actions occur, we increment the score in Redis:

ZINCRBY trending:cat:42 1 12345

The new flow became simple:

User action updates score in Redis
When ranking is needed, fetch the top IDs from Redis
Fetch product details from the database using id IN (...)

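This allowed each system to focus on what it does best: Redis absorbs the high-frequency score updates and serves the ranking, while the database stays the source of truth for persistent product data.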

Of course, this design also came with trade-offs. Redis might return products that are out of stock or inactive — so we had to fetch slightly more results and filter them in the database. We also accepted eventual consistency instead of perfect real-time synchronization.
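
The over-fetch-and-filter step can be sketched without a live Redis or database. Everything below is an in-memory stand-in under stated assumptions: the `scores` dict plays the role of the sorted sets, the `products` dict plays the role of the products table, and the function names and the `overfetch` factor are hypothetical.

```python
from collections import defaultdict

# In-memory stand-ins (hypothetical) for the two stores in the post.
scores = defaultdict(dict)   # plays the role of Redis sorted sets, one per key
products = {                 # plays the role of the products table
    101: {"name": "kettle", "status": "ACTIVE", "stock_quantity": 3},
    102: {"name": "mug", "status": "ACTIVE", "stock_quantity": 0},
    103: {"name": "teapot", "status": "INACTIVE", "stock_quantity": 9},
    104: {"name": "tray", "status": "ACTIVE", "stock_quantity": 5},
}


def record_interaction(category_id, product_id, amount=1):
    """Same idea as: ZINCRBY trending:cat:<category> <amount> <product>."""
    key = f"trending:cat:{category_id}"
    scores[key][product_id] = scores[key].get(product_id, 0) + amount


def top_ids(category_id, count):
    """Highest-scored product ids first, like reading a sorted set in reverse."""
    key = f"trending:cat:{category_id}"
    ranked = sorted(scores[key].items(), key=lambda kv: kv[1], reverse=True)
    return [pid for pid, _ in ranked[:count]]


def trending(category_id, limit, overfetch=2):
    """Over-fetch ids from the ranking store, then filter on product data."""
    candidates = top_ids(category_id, limit * overfetch)
    sellable = [
        pid for pid in candidates
        if products[pid]["status"] == "ACTIVE"
        and products[pid]["stock_quantity"] > 0
    ]
    return sellable[:limit]
```

With scores of 5, 4, 3, and 1 recorded for products 102, 103, 101, and 104, `trending(42, 2)` returns `[101, 104]`, while `trending(42, 2, overfetch=1)` returns an empty list because the two top-ranked products are out of stock or inactive. That gap is the reason for fetching slightly more than the page size.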

But overall, the system became far more scalable and stable.

What I Learned

Since that incident, I approach slow queries very differently.

Before adding an index, I now ask myself a few questions:

Is this column frequently updated?
How much write overhead will this index introduce?
Am I optimizing a query, or optimizing the entire workload?

For highly dynamic values such as ranking scores, like counts, and view counts, I avoid updating the main transactional table directly. More often than not, the real bottleneck is not SQL syntax, but choosing the right system for the workload.

That composite index wasn't technically wrong. But in the context of our production traffic, it was the wrong decision.

And today, I care less about whether a query becomes faster. I care more about this question:

Does this change actually make the whole system healthier?

Because in production systems, correctness is not defined by the speed of a single query. It is defined by how the entire system behaves under real traffic.

If you found this helpful, follow me for more deep dives into Backend Architecture.