DEV Community

diata
The 24-hour test: if you couldn't write it by hand tomorrow, you didn't write it today.

Hello, I'm Tuan.

I have one rule for using AI at work, and it's the only one I trust:

Every line of code I shipped today, I should be able to write again tomorrow with no AI. If I can't, I didn't write it.

That's the test. Ten seconds, end of day. I don't always pass — there's a story below where I clearly didn't. But running the test is the difference between using AI and being used by it.

(If you read yesterday's rant about vibe coding and were about to accuse me of hypocrisy — this post is the receipt. I use AI every day. Here's the line.)

The line: AI does typing, I do thinking

The cleanest framing I've found:

AI is a translator, not a tutor. Translators turn what you already mean into syntax. Tutors teach you what to mean. The first is safe. The second is how you stop being an engineer.

Translation is delegable. The thinking that produces what to translate is not.

Sounds obvious. In practice, the line is invisible until you watch yourself cross it.

What I refuse to delegate

Three things, full stop.

Debugging. When something breaks, the reasoning chain — "this fails because X, so checking Y will tell me if I'm right" — never leaves my head. Outsource it once, you save twenty minutes. Outsource it for a year and you become the candidate from yesterday's post.

Architecture decisions. AI doesn't know my team's oncall load, data gravity, or the political fact that the platform team will block any new service for six months. I'll use it to enumerate options I missed. I will not use it to pick.

Anything in a domain where I don't already have a model. If I can't check the answer, I have no business accepting it. Build the model, then delegate. Never the other way around.

The time I broke my own rule

I'm going to tell on myself, because the post is dishonest without it.

Six months ago I was tired. End of a long sprint, last ticket of the day. A small feature needed a transaction wrapping two writes. I knew exactly what I wanted, so I let Cursor write the wrapper.

It produced a clean-looking BEGIN/COMMIT block: lock the user row, lock the wallet row, do the writes, commit. I read it. Looked right. The integration test passed against a single-threaded fixture. Shipped.

Two days later, deadlocks started showing up under load:

ERROR: deadlock detected
DETAIL: Process 14823 waits for ShareLock on transaction 9182733;
        blocked by process 12044.

The cause: another existing code path acquired those same two rows in the opposite order — wallet first, user second. Two hot rows, two transactions racing, classic crossed-lock deadlock. The integration test never caught it because there was only one writer in the fixture, and concurrency tests had been sitting on the team backlog where they'd been comfortably ignored for a year.

The fix took fifteen minutes — reorder both code paths to acquire locks in the same canonical order. The bug shipped because I'd never traced lock order through the existing code before adding mine. AI couldn't have known about the other path; my own model would have, if I'd held it in my head for one more minute. I had been holding it ten minutes earlier. Then I got tired and let go.
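The fix generalizes: if every code path acquires its row locks in one canonical order (say, sorted by table and primary key), crossed-lock deadlocks become impossible, because no two transactions can ever hold each other's locks crosswise. A minimal Python sketch of the idea — the `("users", 42)` / `("wallets", 7)` keys are hypothetical stand-ins, not the real schema:

```python
def lock_order(*row_keys):
    """Return row keys in canonical (sorted) lock order.

    If every transaction acquires its row locks in this order,
    two transactions can never wait on each other crosswise,
    so the crossed-lock deadlock above cannot occur.
    """
    return sorted(row_keys)

# Path A used to lock (user, wallet); path B locked (wallet, user).
# After the fix, both paths acquire in the same canonical order:
path_a = lock_order(("users", 42), ("wallets", 7))
path_b = lock_order(("wallets", 7), ("users", 42))
assert path_a == path_b  # same order => no crossed locks
```

The real fix was the SQL equivalent: making both code paths issue their `SELECT ... FOR UPDATE` statements in the same order.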

The lesson wasn't "AI was wrong." AI was correct, in isolation.

Correctness in isolation is not correctness. The only thing that catches that gap is a human who is already holding the system in their head.

I now have one extra rule: when I'm tired, I close the AI tab. Tired-Tuan accepts suggestions Awake-Tuan would have caught. The tool's value inverts when judgment is degraded.

A real example, done correctly

Different incident, different week. Postgres query running hot — same flavor of problem as this earlier post.

The actual sequence, with AI use marked at each step.

Step 1 — Hypothesis (no AI). Query had been fine for eighteen months. What changed? A column added two weeks earlier, mostly null in practice. A new ingestion job had tripled write traffic to the table. My guess: the planner was working from stale row estimates. Cheap to test.

Step 2 — Verify (mostly me). Ran EXPLAIN (ANALYZE, BUFFERS). I had to look up the exact BUFFERS flag — five seconds in Claude. The plan came back with a Seq Scan where I expected an index scan. Estimated rows: 200. Actual rows: 240,000. There it was.
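The tell in that plan was the gap between estimated and actual rows. An illustrative throwaway check (not from the incident itself) that flags a plan node whose estimate is off by more than an order of magnitude — roughly the point where the planner's choices stop being trustworthy:

```python
def estimate_skew(estimated_rows, actual_rows, factor=10):
    """Return (ratio, is_suspect): how far off the planner's row
    estimate was, and whether it exceeds `factor` in either direction."""
    if estimated_rows <= 0 or actual_rows <= 0:
        return float("inf"), True
    ratio = max(estimated_rows, actual_rows) / min(estimated_rows, actual_rows)
    return ratio, ratio >= factor

# The incident's numbers: estimated 200 rows, actual 240,000.
ratio, suspect = estimate_skew(200, 240_000)
# ratio == 1200.0, suspect is True -> stale statistics, time for ANALYZE
```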

Step 3 — Read the plan (AI as second pair of eyes). I read the plan myself first and identified the row-estimate problem. Then I pasted it into Claude and asked "what would you flag here, in order of severity?" It listed five things. I had caught three. Of the two I'd skimmed past, one was a minor memory spill that didn't matter; one was a real signal I'd missed — an unrelated index wasn't being used. I filed it as a follow-up.

Step 4 — Decide the fix (no AI). Options: ANALYZE the table, bump default_statistics_target for the column, or restructure the query. I picked ANALYZE plus a stats target bump, because the ingestion job would keep skewing things. Architecture choice. Mine.

Step 5 — Write the migration (AI typed). I described what I wanted in a sentence. Claude produced the migration. I read every line, ran it against a copy of prod, shipped.
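The migration for that kind of fix is two statements: raise the per-column statistics target, then re-sample. A sketch of the shape, generated here from Python so the statements are checkable; the `orders` / `promo_code` names are hypothetical, since the real schema isn't in the post:

```python
def stats_migration(table, column, target=1000):
    """Build the two statements for the fix described above:
    raise the per-column statistics target, then re-sample the table.
    Table and column names here are placeholders, not the real schema."""
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} SET STATISTICS {target};",
        f"ANALYZE {table};",
    ]

for stmt in stats_migration("orders", "promo_code"):
    print(stmt)
```

The per-column `SET STATISTICS` overrides `default_statistics_target` for just the skewed column, which keeps `ANALYZE` cheap on the rest of the table.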

Total time: maybe 35% less than doing it by hand. Skill atrophy: zero, because every reasoning step was mine. AI was a typist and a second pair of eyes. It never held the model.

If your workflow has the model doing the reasoning and you doing the accepting, the ratio is inverted, and the 24-hour test will quietly fail.

The part where I lose half the seniors reading this

Here's the line I don't think my generation wants to admit:

Most senior engineers who adopted Cursor or Copilot more than a year ago have atrophied. They just haven't been tested yet.

The reason it's invisible is that AI is a multiplier on existing skill, and a multiplier feels like the skill itself. You finish the task faster. You feel sharp. You ship. The seniors who atrophied let AI become a tutor — accepting suggestions whose correctness they couldn't independently verify. The ones who didn't kept it strictly a translator. Same tool, opposite outcomes, and from the outside they look identical for about eighteen months.

This isn't speculation. The aviation industry has been studying it for forty years under the name "automation-induced skill decay." Pilots who fly long-haul on autopilot lose their manual flying skills, measurably and predictably, even when they remain confident in their abilities. The FAA was concerned enough to issue Safety Alert SAFO 13002 explicitly recommending that operators give pilots more opportunities to hand-fly the aircraft, because pilots can no longer self-assess when their skill has slipped. The same mechanism applies to anyone whose work is mediated through automation. We are not special.

I run a manual day on myself once a month: autocomplete off, no chat, full day. The first time I tried it, I was shocked at how rusty the first hour felt. Not on typing — on thinking. I'd been narrating my process to AI for so long I'd forgotten how to narrate it to myself.

I got it back, fast. But I would not have noticed it slipping if I hadn't tested for it.

Yesterday's post was about juniors who never built the model. Today's is about seniors who had it and let it slip. If you're reading this and feel defensive — you already know the answer to the test.

The whole thing in three lines

If you skimmed this far, here it is condensed:

  1. AI does the typing. You do the thinking. Translator, not tutor.
  2. Refuse to delegate three things: debugging, architecture, anything you don't yet have a model for.
  3. Run the 24-hour test on yourself this week, and work one full day a month with the tool off.

That's the whole post. The rest is supporting evidence.

The consequence

The day the tool is down, expensive, blocked at your client site, or — most often — confidently wrong about something that ships and breaks at 3 AM, you find out who you actually are as an engineer.

Skill that exists only inside the augmentation evaporates with the augmentation. Skill that exists outside of it doesn't.

Pick which one you want to be.


This is part 2 of a thread that started here. If you're a senior who thinks the atrophy point doesn't apply to you — that's exactly when the test matters most. Try it before you reply.

I write about backend, production incidents, and the unglamorous parts of being a software engineer. Follow if that's your kind of thing.

Top comments (1)

csm

I am not yet a developer, but in my pet projects I use AI for debugging. For example, if I write a compiler, I ask AI to write the testing or debug-print functions.