Rapls

Posted on Jun 7 • Edited on Jun 25 • Originally published at zenn.dev

AI wrote 80% of my plugin. Six months later I couldn't maintain it.

#ai #codequality #productivity #webdev

The first thing I noticed when I reopened that plugin after six months was that the same date-formatting logic lived in three places.

One in a utility function, one in a class method, one inline in a template. All slightly different. The bug someone reported, a display that was occasionally off, came from the oldest of the three. Fixing it meant first figuring out which copy to touch, and whether the other two needed touching too. The bug wasn't hard. Reading the shape of my own code was. I'd let AI write about 80% of it, fast, and I never wrote a spec.

I build WordPress plugins and use coding agents most days, so I'm not here to tell anyone to stop using them. This is about keeping the speed and still being able to maintain what comes out of it. Below is what I changed, ordered by how much it actually helped. The top two alone make future-me a lot less miserable.

The setup, for context

Versions move fast, so treat the names below as "true when I checked, verify on your machine."

Claude Code (claude --version) and Codex CLI (codex --version), both used daily
PHP 8.3 / WordPress, targets are plugins and themes
Persistent instruction files: CLAUDE.md for Claude Code, AGENTS.md for Codex CLI

None of this is a perfect workflow. It's what one person who got burned once put together afterward.

Why it goes unreadable in six months

Three sentences of diagnosis before the fixes. Speed isn't the problem. What speed quietly strips out is.

The reasons for design decisions live only in the chat, and they vanish when you close the window. You accept diffs without reading them, so you never spend the time that makes code feel like yours. And you hand structure to the agent session by session, so the same job gets written three different ways. That third one is exactly my three date functions.

Everything below plugs one of those three holes.

1. Land every decision in the repo

This helped the most. Take the "why" that was evaporating in chat and put it next to the code. Two habits.

First, I changed what CLAUDE.md is for. It's not only an instruction file for the agent. The back half is a record of decisions for future-me: not what I did, but why, plus the option I rejected and the reason. The rejected option matters most later, because future-me will have the same "good idea," rebuild it, and fall into the same hole.

## Rules (for the agent)
- Prefix every function, hook, and option key with `tmfs_`
- Stop and ask before changing any public API

## Decisions (for future-me)
- Webhook signature check is hand-written with hash_equals, not a library.
  Didn't want the dependency, and wanted the check readable at a glance.
- Invalid signatures return 200 and just log, not 400.
  So an attacker can't learn whether the check passed. Was 400; changed for this.

Second, one paragraph in docs/decisions.md every time I add a feature. I tried writing a full spec after finishing. It never stuck: a spec written from memory is half wrong, and it's heavy enough that I kept postponing it. One paragraph, written while I still have the momentum of closing the feature, I actually write.

### 2026-06-05 Rate limit
- Cap outbound API calls at 20/min. Their limit is 30/min; left headroom.
- The agent said no limit was needed. Added it anyway; the queue jammed once before.
- Open question: make the count configurable in settings? Fixed for now.

A warning on this one. If you ask the agent to "write the decision log," you get something that reads well but is reverse-engineered from the code, not the actual reason. The plausible explanation and the real one look alike, but they aren't the same. Take the draft, then rewrite it into what you were actually thinking. Six months out, only the real reason helps.

2. Keep the review gate small

To break the habit of accepting unread diffs, I changed a setting too.

I run Claude Code with defaultMode: "acceptEdits" (I wrote about that config separately). It cuts prompts and feels great, but on the maintenance side it quietly encourages not reading. So I overcorrected and tried to read everything. That killed the speed and I gave up by lunch. Extremes don't last.

What stuck was naming a short list of diffs I always read, and letting the agent auto-accept the rest.

## Always read (human gate)
- Public API: hook names, function signatures, REST routes
- DB schema, tables, option keys
- Auth, capabilities, sanitization, escaping
- Any single commit over ~80 lines
Everything else (internal refactors, test additions, comments) -> acceptEdits.

These are the changes that are expensive to undo later. A renamed hook silently breaks its callers; a missing escape shows up as a security finding in six months. Internal refactors and test additions are cheap to get wrong, or the tests catch the mistake. Drawing the line by blast radius made the reading load small enough to keep doing.
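Part of that gate can be checked mechanically. Here is a minimal sketch, assuming the input is the output of `git diff --numstat` (one "added, deleted, path" line per file, tab-separated). The function name `tmfs_review_reasons` and the gated directory patterns are hypothetical, and matching on paths is only a rough stand-in for "public API, schema, auth": a real gate still needs a human to recognise a renamed hook.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the "always read" gate as a filter over `git diff --numstat`
 * output. The path patterns and the function name are hypothetical.
 *
 * @return string[] Reasons a human should read the diff; empty means auto-accept.
 */
function tmfs_review_reasons( string $numstat, int $max_lines = 80 ): array {
    // Assumption: these directories hold the public API, schema, and auth code.
    $gated = array( '#^includes/api/#', '#^includes/schema/#', '#^includes/auth/#' );

    $reasons = array();
    $total   = 0;

    foreach ( preg_split( '/\R/', trim( $numstat ) ) as $line ) {
        $parts = explode( "\t", $line, 3 );
        if ( 3 !== count( $parts ) ) {
            continue; // Skip blank or malformed lines.
        }
        list( $added, $deleted, $path ) = $parts;

        // Binary files report "-" for both counts; the cast turns that into 0.
        $total += (int) $added + (int) $deleted;

        foreach ( $gated as $pattern ) {
            if ( 1 === preg_match( $pattern, $path ) ) {
                $reasons[] = "gated path: $path";
                break;
            }
        }
    }

    if ( $total > $max_lines ) {
        $reasons[] = "large change: $total lines";
    }

    return $reasons;
}

// Two reasons come back here: one gated path, and 95 changed lines (over the cap of 80).
$numstat = "12\t3\tincludes/api/routes.php\n70\t10\tincludes/helpers/format.php\n";
print_r( tmfs_review_reasons( $numstat ) );
```

An empty result means the change falls under "everything else" and can stay on auto-accept.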

3. Hold the structure and naming yourself

The three-copies problem came from handing structure to the agent per session. Left alone, an agent expands: new function, new file, a second helper when one already exists. So I fix the frame and let it move inside it. The same rules go in both AGENTS.md and CLAUDE.md, because if only one has them the style shifts the moment I switch tools.

## Naming and structure
- Top level is admin / public / includes. Don't add others.
- Before adding a class, check for an existing similar one.
- One responsibility per file. Over 500 lines, *propose* a split (don't just do it).

The "propose, don't do" part earns its keep. When the agent splits files on its own, the code I'm looking for has moved and I can't find it. For a WordPress plugin the prefix rule already exists, so most of the work is writing down the other conventions that so far lived only in my head. Left implicit, they reach neither the agent nor future-me.

4. Comments and commits carry only the "why"

Agents add comments, and most of them say what the code does, which is useless in six months and turns into a lie the moment the code changes and the comment doesn't. In review I cut the "what" comments and add the "why" the code can't show on its own.

// Use hash_equals for the signature. == leaks timing and can be
// broken a character at a time.
if ( ! hash_equals( $expected, $given ) ) {
    return;
}

The code itself never says why == is wrong here. When future-me thinks "== is fine here" and reaches to simplify it, those two comment lines stop me. Changing the instruction from "add comments" to "comment only where the reason isn't obvious; skip describing behavior" cuts the noise.
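For context, here is a minimal sketch of where a check like the one above might sit, assuming the sender signs the raw request body with HMAC-SHA256 and sends the result as a hex string. The function name `tmfs_signature_is_valid`, the secret, and the payload are hypothetical; the plugin's actual signing scheme isn't shown in this post.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper wrapping the hash_equals() check.
 * Assumes an HMAC-SHA256 signature over the raw body, hex-encoded.
 */
function tmfs_signature_is_valid( string $payload, string $given, string $secret ): bool {
    // Recompute the signature we expect for this exact payload.
    $expected = hash_hmac( 'sha256', $payload, $secret );

    // Use hash_equals for the signature. == leaks timing and can be
    // broken a character at a time. Known value first, supplied value second.
    return hash_equals( $expected, $given );
}

$secret  = 'example-secret';          // Assumption: shared secret from plugin settings.
$payload = '{"event":"order.paid"}';  // Assumption: raw request body.
$good    = hash_hmac( 'sha256', $payload, $secret );

var_dump( tmfs_signature_is_valid( $payload, $good, $secret ) );              // bool(true)
var_dump( tmfs_signature_is_valid( $payload, 'not-a-signature', $secret ) );  // bool(false)
```

The "why" comment travels with the comparison, so it survives even if the helper is moved or renamed.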

Commit messages are the same kind of place. Left to the agent they read "Fix bug." First line is the what (fine to delegate); I add one sentence of why to the body.

fix: cut FX fetch off at a 3s timeout

Checkout was hanging on the FX API. Returning the page beats an exact rate.

When git blame lands me on that line months later, the reason is right there.
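For illustration, the kind of change that commit describes might look like the sketch below, with the same "why" carried into a comment. `tmfs_fx_rate`, the fetcher signature, and the rates are hypothetical; in WordPress the fetcher would typically wrap `wp_remote_get()` with a `timeout` argument, which is left out here so the sketch runs on its own.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper. $fetch receives the timeout in seconds and returns
 * the rate, or null when the request failed or timed out.
 */
function tmfs_fx_rate( callable $fetch, float $last_known_rate, float $timeout = 3.0 ): float {
    try {
        $rate = $fetch( $timeout );
    } catch ( Throwable $e ) {
        $rate = null;
    }

    // Checkout must not hang on the FX API: returning the page with a
    // slightly stale rate beats waiting for an exact one.
    return is_float( $rate ) && $rate > 0 ? $rate : $last_known_rate;
}

// Stand-ins for a real HTTP call, for illustration only.
$healthy = fn( float $timeout ): ?float => 151.2;
$hung    = fn( float $timeout ): ?float => null;

var_dump( tmfs_fx_rate( $healthy, 150.0 ) ); // float(151.2)
var_dump( tmfs_fx_rate( $hung, 150.0 ) );    // float(150)
```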

5. Make tests double as a readable spec

If the spec never gets written, let the tests be the spec. I name tests by behavior, not by the function under test.

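// before: named after the function under test
public function test_verify() { ... }

// after: named after the behavior. Keep the test_ prefix, because PHPUnit
// only discovers methods that start with "test" or are explicitly marked as tests.
public function test_invalid_signatures_are_logged_and_swallowed() { ... }
public function test_gives_up_after_three_failed_sends() { ... }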

Skim the test list and you read what the plugin promises, and unlike a doc it can't drift silently, because a changed behavior fails the test. When I have the agent write tests I ask for "the behaviors this should guarantee, named by behavior," not "more coverage." Tests that trace internals break on every refactor and end up commented out. Tests that check the outside promise survive. Six months on, those were the only ones still alive.
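To make the "test list reads as a spec" idea concrete, here is a small sketch in plain PHP rather than PHPUnit, so it runs without any dependencies. Everything in it is hypothetical: `tmfs_send_with_retries` stands in for the plugin's sender, and `tmfs_run_as_spec` is a toy runner that executes each `test_` method and turns its name into a sentence.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the plugin's sender: retry, then give up.
// Returns the number of attempts that were made.
function tmfs_send_with_retries( callable $send, int $max_attempts = 3 ): int {
    for ( $attempt = 1; $attempt <= $max_attempts; $attempt++ ) {
        if ( $send() ) {
            break;
        }
    }
    return min( $attempt, $max_attempts );
}

final class Tmfs_Sender_Behavior {
    public function test_gives_up_after_three_failed_sends(): void {
        $calls = 0;
        tmfs_send_with_retries( function () use ( &$calls ): bool {
            $calls++;
            return false; // Every send fails.
        } );
        if ( 3 !== $calls ) {
            throw new RuntimeException( "expected 3 sends, got $calls" );
        }
    }

    public function test_stops_retrying_once_a_send_succeeds(): void {
        $calls = 0;
        tmfs_send_with_retries( function () use ( &$calls ): bool {
            $calls++;
            return 2 === $calls; // Second send succeeds.
        } );
        if ( 2 !== $calls ) {
            throw new RuntimeException( "expected 2 sends, got $calls" );
        }
    }
}

// Toy runner: run every test_ method and return its name as a readable promise.
function tmfs_run_as_spec( object $suite ): array {
    $lines = array();
    foreach ( get_class_methods( $suite ) as $method ) {
        if ( 0 !== strpos( $method, 'test_' ) ) {
            continue;
        }
        $suite->$method(); // Throws if the promise no longer holds.
        $lines[] = str_replace( '_', ' ', substr( $method, 5 ) );
    }
    return $lines;
}

echo implode( "\n", tmfs_run_as_spec( new Tmfs_Sender_Behavior() ) ), "\n";
// gives up after three failed sends
// stops retrying once a send succeeds
```

Neither test looks inside the sender, so a refactor of the retry loop leaves both names, and both promises, intact.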

WordPress makes some of this hard, since a real database and hooks are involved. There I don't force it; I leave a WP-CLI sequence or a manual checklist in docs/ instead. Whatever survives six months and stays reproducible is the goal, not coverage for its own sake.

The lighter ones that still paid off

A line of reasoning in the decision log every time I add a dependency. Agents reach for the latest, and the latest isn't always safe when your users run old PHP. "Avoided libraries needing 8.1+ syntax so it runs on 8.0" is the kind of outside constraint the code never reveals.

And five lines at the top of the README, written for future-me: what this is, where to start reading, where the decisions live, what's easy to break. Agents write exhaustive READMEs, but exhaustive takes energy to read and so it doesn't get read. A short map beats it on the return trip.

One thing the second agent is good for

Not a comparison, a maintenance tool. I have Codex CLI read code Claude Code wrote and ask it to explain it and point out maintenance weak spots (and vice versa). The author's explanation goes soft because the intent is visible to them; an agent that doesn't know the intent reads it cold, the way future-me will. Where its explanation stalls is where future-me stalls, and that's where a comment is missing. It once flagged "this depends on hook execution order but the assumption isn't in the code," which was exactly right.

The catch: the second agent's explanation isn't fact either. It told me a function used a cache that didn't exist. Use the output as a way to surface what you overlooked, and verify it yourself.

The honest list of what didn't stick

After-the-fact specs: half wrong, too heavy, perpetually postponed. Reading every diff: didn't survive contact with the speed. A proper ADR template: dropped in three days because recalling the format was friction, and friction means it doesn't get written. A heavy reason on every commit: too much; now I only write one when the why will matter later.

The common thread is that they were too heavy or too perfectionist. What lasted was whatever I made light enough to look almost like cutting corners. Make it embarrassingly small and it keeps happening. That's the thing six months taught me most clearly.

A note to my next self

I didn't slow everything down. I stop only at the seams: when I'm deciding something, changing a boundary, closing a feature. Slow down there, leave one line of why, and let the agent run fast through the rest.

The surprise was that keeping the "why" made my prompts better too. Once you can name what you're deciding, the instruction you hand the agent stops being vague. The record I keep for future-me quietly speeds up present-me.

Those three scattered date functions are still three. I haven't decided whether to merge them or retire the whole thing. But the code I'm writing now, I think future-me can walk through without getting lost. Standing in a house written in a stranger's hand, holding only the key, was enough the first time.

Originally written in Japanese on Zenn. I build WordPress plugins.

Disclaimer: The experiences and decisions in this post are my own. English isn't my first language, so I use an AI assistant to help draft and edit the writing.
