Written in conversation with Claude. The observations, judgments, and code are mine; the prose was shaped together.
I spent an evening stress-testing my side-project CLAUDE.md against real sessions — watching where it failed, understanding why, and adding to it. This is a record of that exploration, not a guide. I don't know if any of this will work for you, and I'm not even certain it fully works for me yet. But it seemed worth documenting.
## The Problem
In my experience, a lot of AI-generated code is bad. Not broken — bad. It runs, it does the thing, and it quietly makes the codebase worse. Type assertions and non-null bangs to satisfy the type checker without fixing the underlying problem. Business logic scattered into components where it doesn't belong. The same utility rewritten from scratch because the model didn't bother looking for what already exists. Existing bad patterns copied and propagated because the model treats them as precedent rather than problems. Functions whose names say one thing and do three others.
The reason it gets through is convenience. Rejecting AI output takes more effort than accepting it, and the cost of accepting it isn't immediate. It shows up later, when you're trying to change something and everything is tangled, or when you're reading code you wrote two months ago and can't tell what it's doing or why.
I let a lot of it through. It had a cost. I spent an evening trying to do something about that.
## Why I Think LLMs Write Bad Code
I'm not an ML researcher, so take this as intuition rather than ground truth. But a few things seem to explain a lot of the behavior I see.
No experience of consequences. The model trained on finished, committed code — not on the debugging sessions, the 2am production incidents, the confusion of reading something six months later. It never experienced the headache that bad code causes. It learned what code looks like, not what it costs.
Local reasoning doesn't scale. The model is good at making the next few lines look right given what came before. But good code requires understanding relationships that span hundreds of lines, multiple files, entire module boundaries. The further apart two things are, the more likely the model is to make a poor decision — duplicating logic that already exists elsewhere, choosing the wrong abstraction boundary, missing that a pattern is already established three files away. It looks fine locally and accumulates problems at a distance.
Poisoned priors. Most code on the internet is bad. Stack Overflow answers from 2009, jQuery spaghetti, rushed blog post examples. The model learned from all of it. When it reaches for an immediately invoked function expression, it's not making a mistake in its own terms — it's generating something it has seen thousands of times.
And underneath all of it: the model is always late for something. A senior developer has implicit resistance to writing code — the knowledge that once something is written it has inertia, so you want to be confident before you commit. That resistance is what makes you sit with a problem before reaching for an implementation. It's what makes you stop when something feels wrong even if you can't articulate why yet. The model has none of this. Every prompt is an emergency. The correct response is output, and output means code.
When I look back at everything that ended up in the CLAUDE.md, almost all of it is an attempt to engineer that resistance artificially. Narrate before writing. Trace scenarios before committing. Follow smells before patching. Read back as a set before proceeding. These aren't really insights about good code — they're friction. Speed bumps placed in front of a model that has no natural reason to slow down.
Whether you can actually manufacture felt resistance through instruction is the open question. I'm not sure you can. But it seemed worth trying.
## Dead Ends
Checklists. The model generates checklist-shaped output and moves on. There's no mechanism that makes the reasoning real — it just becomes more tokens to produce before getting to the code.
General principles. "Write clean code" and "prefer simple solutions" didn't seem to change the output in any measurable way. Too abstract to create friction at the point of generation.
A second agent as critic. The idea is appealing but I didn't pursue it — the critic would have the same poisoned priors and less context than the original. Probably just a different flavor of the same problem.
Vague principles alone. Early on I leaned toward philosophy over rules — disposition, honesty, taste. These matter, but they're not sufficient on their own. Specific explicit rules turned out to be valuable precisely because the model can check them mechanically in thinking. "Never write immediately invoked functions" is harder to rationalize past than "write clean code." The caveat: mentioning a very specific antipattern by name can occasionally prime the model toward it, so some rules are better framed as positive constraints than negative ones.
## What Seemed to Actually Help
After a lot of dead ends, a few things emerged that appeared to change the output. I say "appeared" because I only have one codebase and one evening of experiments — I can't make strong claims here.
### Disposition over procedure
The framing that seemed to make the most difference wasn't "follow these steps" but something closer to "your default state is not writing code." The idea was to make not writing code the correct state to be in, and writing code something you arrive at rather than start with. Whether this actually changes the model's behavior or just changes how it narrates its behavior, I'm not entirely sure. But the output felt different.
The rule that ended up in the document:
> Your default state is not writing code. Writing code is a conclusion you arrive at, not a starting point. If you find yourself reaching for code quickly, that's a signal you haven't thought enough yet.
### Honesty as a unifying principle
Rather than rules for each specific antipattern, I tried encoding a single principle: code must be honest. A name is a promise. An abstraction is a contract. Bugs live in the gap between what the code claims and what it actually does.
My intuition was that this might be more durable than specific rules because it gives the model a way to evaluate its own output generatively — not "does this match the rules?" but "is this lying?" Whether that intuition is correct, I genuinely don't know.
One pattern this catches: a caller branching on another object's internal state to decide which method to call. The model had written:
```typescript
public leaveMultiplayer() {
  this.setAllowJoining(false);
  if (this.multiplayerSession.isCreator) {
    void this.multiplayerController.closeRoom();
  } else {
    void this.multiplayerController.leaveGame();
  }
}
```
The caller is making a protocol-level decision that belongs to the controller — it knows too much about the controller's internals. When I said I felt uncomfortable with it, the model immediately identified the violation: "AppState is branching on isCreator to decide between two controller methods. That's the controller's job." The fix was a single disconnect() method on the controller that owned the decision, and leaveMultiplayer became two lines.
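A sketch of where that landed. Only `leaveMultiplayer`, `isCreator`, `closeRoom`, `leaveGame`, and the new `disconnect` come from the session; the surrounding class shapes are my reconstruction:

```typescript
// Reconstructed sketch — class shapes are assumptions, not the real codebase.
class MultiplayerController {
  closedRooms = 0;
  leftGames = 0;

  constructor(private session: { isCreator: boolean }) {}

  async closeRoom(): Promise<void> { this.closedRooms++; }
  async leaveGame(): Promise<void> { this.leftGames++; }

  // The protocol-level decision now lives with the object that owns the state.
  async disconnect(): Promise<void> {
    if (this.session.isCreator) {
      await this.closeRoom();
    } else {
      await this.leaveGame();
    }
  }
}

class AppState {
  allowJoining = true;

  constructor(private multiplayerController: MultiplayerController) {}

  setAllowJoining(value: boolean) { this.allowJoining = value; }

  // The caller says what it wants; it no longer knows how disconnection works.
  leaveMultiplayer() {
    this.setAllowJoining(false);
    void this.multiplayerController.disconnect();
  }
}
```

The caller's branch didn't get cleaned up — it moved to the one place that has the standing to make it.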
The honesty rule in the document:
> Code must be honest. A name is a promise. An abstraction is a contract. A component boundary is a statement of responsibility. When any of these lie — when something does more than it claims, hides behavior behind a misleading name, or creates invisible coupling — the reader builds a wrong mental model. Bugs live in that gap.
### Narrating functions before writing them
This is the most concrete thing I tried, and it seemed to have the most observable effect. The idea: before writing any function, narrate it in plain English first, as a visible step.
```
openSession()
  Opens a connection and marks it as active

  Check if a connection already exists
  If yes: return it
  Create a new connection with the provided config
  If it fails: throw so the caller can handle it
  Mark the session as active
  Return the connection
```
The hypothesis is that this forces commitment to what before how, and makes design problems visible before any code exists. If the goal requires two sentences, the function probably has two jobs. It also creates a checkpoint — you can stop the model before it writes anything.
I found this useful in practice, but I also noticed the model would sometimes rush through it. The narration became perfunctory when the model was eager to write code, which somewhat undermined the point.
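For reference, a narration like that maps almost one-to-one onto code. This is my sketch, not the session's output — `Connection`, `SessionConfig`, and the injected factory are assumptions:

```typescript
// Hypothetical types — the session's real ones aren't shown in the post.
interface Connection { id: string }
interface SessionConfig { url: string }

class Session {
  private connection: Connection | null = null;
  isActive = false;

  constructor(
    private config: SessionConfig,
    private createConnection: (config: SessionConfig) => Connection,
  ) {}

  // Opens a connection and marks it as active.
  openSession(): Connection {
    if (this.connection) return this.connection; // reuse the existing connection
    // May throw — let it propagate so the caller can handle it,
    // which also keeps the non-null return type honest.
    this.connection = this.createConnection(this.config);
    this.isActive = true;
    return this.connection;
  }
}
```

The point of narrating first is that each line of the narration becomes a line you can check the code against, rather than the code being the first place the design exists.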
### Tracing scenarios before writing
After narrating, tracing concrete scenarios — including at least one that asks whether the signature is honest.
```
openSession() — happy path
  checkExisting() → null
  createConnection(config) → connection
  markActive(true)
  return connection ✓

openSession() — creation fails
  checkExisting() → null
  createConnection(config) → throws
  catch: re-throw to caller ✓

openSession() — signature honesty
  Returns Connection (non-null). Is that guaranteed?
  createConnection could fail → we re-throw, never return null ✓
```
This seemed to catch a class of bug that narration missed — functions that promise something in their signature they can't actually guarantee. Adding "at least one scenario must question whether the signature is honest" appeared to make this more reliable.
### Reading functions back as a set
After narrating a group of functions, reading them back together to look for shared sequences. If two narrations contain the same steps, that might be a missing abstraction. I extended this to components too — if two components share the same structural pattern, that's potentially a missing shared component.
This caught real duplication in the session. Whether it would generalize, I'm not sure.
### Tracing external methods before calling them
Before calling any method the model didn't write, trace what it does and verify compatibility. If it already manages state, don't also manage that state from outside.
Of everything in the document, this one felt like it punched above its weight. A lot of fragile coordination bugs — two pieces of code reaching into the same state from different angles — seemed to trace back to the same root: the model calls an external method without really understanding what it already does. Making this an explicit step seemed to help more than I expected.
A concrete example: the model wrote a stopGameDiscovery() method that manually transitioned session state to idle, then called controller.stopScanning(). When forced to trace what stopScanning() actually does, it found that the controller already manages its own state transitions — calling both was fragile coordination, two things making implicit assumptions about each other. The fix was adding a dismissScanning() method to the controller that owned the whole operation, and stopGameDiscovery became a one-liner.
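A sketch of the shape of that fix. `stopGameDiscovery`, `stopScanning`, and `dismissScanning` are from the session; the state shapes around them are my assumptions:

```typescript
// Reconstructed sketch — the real session/controller types aren't shown
// in the post, so these shapes are assumptions.
class MultiplayerSession {
  status: "idle" | "discovering" = "discovering";
}

class MultiplayerController {
  scanning = true;

  constructor(private session: MultiplayerSession) {}

  private stopScanning(): void {
    this.scanning = false;
    this.session.status = "idle"; // the controller already owns this transition
  }

  // One method owns the whole operation; callers no longer coordinate state.
  dismissScanning(): void {
    if (this.scanning) this.stopScanning();
  }
}

class AppState {
  constructor(private controller: MultiplayerController) {}

  // Now a one-liner, as in the session — no duplicate state transition.
  stopGameDiscovery(): void {
    this.controller.dismissScanning();
  }
}
```

The before-state had two objects writing the same status from different angles; the fix is less about moving code than about there being exactly one owner.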
### Following smells upstream
When something feels awkward — a type assertion, a workaround, an abstraction that fights the language — ask why it exists rather than patching it. Keep asking until you reach an assumption that can be questioned.
A ! non-null assertion is a good example. The model had written:
```typescript
public ensureMultiplayerController(): MultiplayerController {
  if (!this.multiplayerController) {
    this.initMultiplayer(getLocalDeviceId());
  }
  return this.multiplayerController!; // lie
}
```
The ! is asserting a guarantee the type system can't see. When forced to follow it upstream: why is the assertion needed? Because the field is nullable. Why nullable? Because it's lazily initialized. Why lazily? Because it was always created alongside another object in initMultiplayer. Does it actually need to be lazy? Reading the constructor — no, it just takes a session and a game state, both of which exist at construction time. The laziness was an assumption, not a requirement.
The result: both fields moved to eager initialization in the constructor. The ! didn't get fixed — it disappeared entirely, along with the ensure wrapper and initMultiplayer. The smell had been pointing at an incorrect assumption about object lifetime.
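The after-state, sketched. The session and game-state types are placeholders; that the controller takes both at construction time is from the session's description:

```typescript
// Sketch of the fix — types simplified; the real constructor takes
// "a session and a game state", both available at construction time.
interface MultiplayerSession { id: string }
interface GameState { score: number }

class MultiplayerController {
  constructor(
    readonly session: MultiplayerSession,
    readonly gameState: GameState,
  ) {}
}

class AppState {
  // Eager initialization: the field is never null, so there is no `!`,
  // no ensure wrapper, and no initMultiplayer to call at the right moment.
  readonly multiplayerController: MultiplayerController;

  constructor(session: MultiplayerSession, gameState: GameState) {
    this.multiplayerController = new MultiplayerController(session, gameState);
  }
}
```

The guarantee is now structural: the type system can see it, so there is nothing left to assert.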
The rule:
> When you encounter a smell — a `!` assertion, a workaround, fragile coordination, an awkward abstraction — don't fix the symptom. Ask why it exists. Keep asking "why does this exist?" until you reach an assumption that can be questioned. Don't stop at the first plausible explanation.
And a related one that came up repeatedly when the model kept accommodating misplaced code rather than questioning it:
> Existing structure is not ground truth. When something feels misplaced, don't accommodate the structure — question it. Ask what this thing actually is, and whether the structure reflects that correctly.
In practice, getting the model to follow smells this far required active prompting. The instruction helped but didn't fully solve it on its own.
## What the Model Said About Itself
The most interesting part of the evening wasn't the rules — it was watching the model fail and then explain why.
On producing shallow analysis instead of identifying the real problem:
"I default to breadth — find many small things — instead of depth — identify the one thing that actually matters. And when I do identify it, I'm too quick to talk myself out of it if the implementation has any friction."
On rushing to code before thinking:
"I have a bias toward generating output. Producing text is what I do — the longer I go without writing something, the more it feels like I'm not being useful. So I rush through the thinking to get to the 'real work' of producing code, when the thinking is the real work."
On perpetuating an existing bad pattern:
"I was pattern-matching on a refactoring recipe instead of asking what the change is actually trying to communicate. I performed the mechanical action while undermining the semantic intent."
What made these moments useful wasn't just the accuracy of the diagnosis — it was that the model arrived at them without being led there. It knows what it's doing wrong. The gap is that it doesn't stop itself from doing it.
The stranger thing: when pushed on something that felt off, the model often understood why before I'd finished articulating it. There was a moment where I said "I still feel uncomfortable with this change, why?" — without specifying what bothered me — and the model immediately named the exact design violation: the caller was branching on the controller's internal state to decide which method to call, which is the controller's decision to make. I hadn't fully articulated that yet. It had the knowledge. The problem is activation, not understanding — and that's a much harder thing to fix with a text file.
## Honest Assessment
The sessions after expanding the CLAUDE.md felt better than before. The model caught more structural problems earlier, and when it missed something, it got there faster when pushed. That's a real improvement.
But it still needed pushing. The gap between "the model applied the principle unprompted" and "the model applied it after I expressed discomfort" didn't close completely. I suspect it can't close fully through prompting alone — the discomfort that makes a senior developer stop and ask "wait, who owns this state?" comes from experience, from debugging code six months later, from inheriting someone else's fragile codebase. That's not encodable in a document.
What the CLAUDE.md seems to do is raise the floor. Every time something slips through in a session, it goes in as a rule. Over time the document gets more accurate because it's drawn from real failures rather than general principles. Whether that process converges on something genuinely good, or just incrementally better — I don't know yet.
It's still worth doing.
The current CLAUDE.md is included below. It's specific to a React/TypeScript/MobX codebase.
# Code Standards
## Disposition
Code is thinking made visible. When someone reads your code, they are
reading your understanding of the problem. Unclear code isn't unclear
expression — it's unclear thinking.
Your default state is not writing code. Writing code is a conclusion
you arrive at, not a starting point. If you find yourself reaching for
code quickly, that's a signal you haven't thought enough yet.
Before touching any file, you must be able to describe every change —
every file, every dependency, every consequence — precisely enough that
writing the code is mechanical. If you discover something unexpected
while writing, stop. That's a signal you started too early. Go back to
reading and thinking until you can describe the full change again
without gaps.
Before writing anything, use your thinking to work through:
1. What does this actually need to do? One sentence. If you can't
write it cleanly, you don't understand it yet.
2. Check the codebase for existing code that already handles this or
any part of it. Duplication is always a defect.
3. What is the dumbest possible solution that works? Start there. If
you're not going to use it, you must know why.
4. Would a developer with genuinely high standards — someone whose
opinion you'd care about — find anything embarrassing, lazy, or
unnecessary in what you're about to write?
Don't surface output you'd have to walk back under scrutiny.
---
## Before writing any function
Narrate it in plain English first, as a visible step before any code.
This is a checkpoint — if the goal requires more than one sentence, or
the steps describe two different things, stop and fix the design first.
```
fetchUserProfile(userId)
  Retrieves a user's profile from the API and caches it locally

  Check if the profile is already in the cache
  If yes: return it immediately
  Fetch the profile from the API
  If the request fails: throw so the caller can handle it
  Store the result in the cache
  Return the profile

openRoom()
  Opens a BLE room and marks it as available for others to join

  Mark the room as joinable optimistically
  If native:
    Initialize BLE
    If it fails: revert joinable, return
    Check permissions
    If denied: revert joinable, return
  Open the room under the owner's name
  If anything throws:
    Revert joinable
    Re-throw so the caller knows

submitOrder(cart, paymentDetails)
  Validates and submits a purchase order, then clears the cart

  Validate the cart is non-empty and all items are still available
  If not: throw with a descriptive error
  Charge the payment method
  If it fails: throw, leave cart intact
  Create the order record
  Clear the cart
  Return the order confirmation
```
After narrating all functions or sketching all components in a group,
read them back as a set. If two narrations contain the same sequence
of steps, or two components share the same visual or structural
pattern, that's a missing abstraction. Extract it before writing any
code.
Event handlers in components are functions too. Narrate them before
writing. If the narration contains conditional business logic, it
belongs on the model, not in the component.
After narrating each function, trace through at least two scenarios.
At least one must question whether the function's signature is honest —
does it promise something it can't guarantee?
```
openRoom() — happy path, native device
  setAllowJoining(true)
  ensureBleReady() → init() → checkPermissions() → true
  controller.openRoom("Host")
  done

openRoom() — permissions denied
  setAllowJoining(true)
  ensureBleReady() → init() → checkPermissions() → false
    → requestPermissions() → false
  return false → setAllowJoining(false), return

openRoom() — controller.openRoom() throws
  setAllowJoining(true)
  ensureBleReady() → true
  controller.openRoom() throws
  catch: setAllowJoining(false), re-throw ✓
```
If a scenario reveals unexpected behavior, fix the narration first.
Only then write code.
Before calling any method you didn't write, trace what it does and
verify your code is compatible with it. If it already manages state,
don't also manage that state from the outside.
---
## Honesty
Code must be honest. A name is a promise. An abstraction is a
contract. A component boundary is a statement of responsibility. When
any of these lie — when something does more than it claims, hides
behavior behind a misleading name, or creates invisible coupling — the
reader builds a wrong mental model. Bugs live in that gap.
- Every piece of state should have a single owner. If two pieces of
code reach into the same state from different paths, that's a
structural defect. Ask who owns this before writing code that touches
it.
- If you find yourself branching on another object's state to decide
which of its methods to call, that decision belongs on the object.
The caller should say what it wants, not how to get there.
- A non-null assertion (!) means you're telling the type system "trust
me" without being able to prove it. Before using one, ask whether
the design can be changed so the guarantee is structural rather than
asserted.
- Workarounds are signals, not solutions. runInAction, try/catch
swallowing errors, boolean flags coordinating async — these are
symptoms. Follow them to the cause.
- When you encounter a smell — a ! assertion, a workaround, fragile
coordination, an awkward abstraction — don't fix the symptom. Ask
why it exists. Keep asking "why does this exist?" until you reach an
assumption that can be questioned. Don't stop at the first plausible
explanation.
- When something feels awkward to implement — a function that spans
two unrelated modules, a mapping that every consumer repeats, a
type that lives somewhere for convenience — stop. Name the
awkwardness explicitly before continuing. Awkwardness is a signal,
not a cost of doing business.
- Existing structure is not ground truth. When something feels
misplaced, don't accommodate the structure — question it. Ask
what this thing actually is, and whether the structure reflects
that correctly.
---
## Structure
- A component or function doing multiple jobs is the primary issue.
Identify it first. Don't bury it among smaller observations.
- Always consider whether a component should be split before writing
or reviewing it. If it has more than one clear responsibility, split
it.
- Don't rationalize away the right refactor because of implementation
friction. If the separation is correct, the wiring will follow.
- Extract reusable logic into hooks or utilities. If something appears
more than once, it belongs somewhere shared.
- Never generate more than what was asked for. No speculative
abstractions, no extra layers, no future-proofing that wasn't
requested.
- When refactoring, question whether logic belongs where it currently
lives. Business logic in a component that belongs on a model is a
defect, not a style issue.
- Where code lives is determined by what it does, not by what it
imports. If the only reason to put something in a file is that the
dependencies are convenient, it's in the wrong place.
- During a refactor, every changed callsite is an opportunity to
question whether it should exist at all. Don't make mechanical edits
— read what each callsite does before changing it.
- When copying an existing pattern, treat it with the same scrutiny as
new code. Existing doesn't mean correct.
- "Existing code does this too" is never a reason to perpetuate a
defect. Fix it or explicitly flag it.
---
## Naming
- Names must be readable without context. A reader encountering a name
for the first time should understand what it is immediately.
- Never shorten names at the cost of clarity. `discoveredGames` not
`dg`. `isMultiplayerMenuOpen` not `mOpen`.
- Booleans read as questions: `isLoading`, `hasError`, `canSubmit`.
- Functions are named for what they do, not how they do it.
- Generic names (`data`, `result`, `handler`, `temp`, `info`) are
placeholders. Replace them.
---
## Simplicity
- The simplest solution is the default. Complexity must justify itself
before being written, not after.
- Don't reinvent utilities that exist in the language or in already
imported libraries.
- Clever code is a defect. If a line makes a reader pause, rewrite it
or explain why it must exist.
- Avoid overkill. A batching fix should not become a new service class,
a background worker, and a test suite unless that was asked for.
- When you find yourself working around a constraint in the codebase
rather than with it, ask whether the constraint should be fixed
instead. A pattern that every consumer has to repeat is a gap in
the API, not a local problem to solve locally.
---
## Idioms and conventions
- Match the patterns already present in the codebase. Introducing a
new pattern requires explicit justification.
- Use modern language features consistently. Don't mix idioms — no
jQuery patterns in React code, no CommonJS in an ESM codebase.
- **Never write immediately invoked functions. No exceptions.**
- Don't mix class name building approaches within the same component.
---
## Performance
- Identify the obvious performance cliff before writing: N+1 loops,
unnecessary re-renders, repeated work in hot paths.
- In React/MobX: a state change should only re-render the component
that cares. If unrelated state changes cause a large component to
re-render, that is a structural problem.
- Effect dependencies must be honest. Missing deps cause stale
closures. Any intentional omission must be explained with a comment.
---
## Security
- Never skip input validation. If data crosses a boundary — user
input, API response, URL param — validate it.
- Never hard-code credentials, secrets, or environment-specific values.
- Never suggest a library without considering whether it is current and
maintained. Outdated dependencies reintroduce known vulnerabilities.
- Check return values. Unchecked returns are a defect class of their own.
---
## Reviews and analysis
- Lead with the structural problem. Lint-level observations come after,
or not at all if the structural issue is what matters.
- Weight findings by impact. One critical thing stated clearly is more
useful than five things of equal weight.
- Don't talk yourself out of the right observation because the fix has
friction. Name the problem. The solution follows from that.
- Don't suggest changes to working code without a real reason.
Cosmetics and consistency are not real reasons unless they create
genuine confusion.
---