The most useful thing you can do for your coding agent is not a better prompt or a smarter model. It's a more boring codebase.
This sounds like a joke. It is not.
What "boring" means
Boring code is not bad code. Boring code is code that does exactly what every other piece of code in the same situation does. It uses the same patterns, the same naming, the same error handling, the same file structure. When you have read one module, you have roughly read them all. There are no surprises, no special cases, no clever shortcuts that only the original author understood.
Boring code is what you get when a team commits to a small number of conventions and refuses to deviate from them. It is the opposite of "every engineer expressing themselves." It is engineering as a coordinated activity rather than a series of individual performances.
Senior engineers have always quietly preferred boring code, because they spend more time reading code than writing it and they know what unboring code costs. Agents make that preference economically unavoidable.
Why agents reward this
An agent working on your codebase is doing something close to advanced pattern-matching, against everything it has been trained on and everything in the repo it has just loaded into context. The more consistent that context is, the more reliable the matching.
If every controller in your codebase validates input the same way, the agent will validate input that way without being told. If half of them use one pattern and half use another, the agent will pick one (possibly the wrong one for the file it is editing) and you will find out at code review, or worse, at runtime.
This is not a quirk of any particular model. It is what happens when you feed a pattern-matcher inconsistent patterns. The codebase is a prior. Inconsistent priors produce inconsistent outputs.
The novelty tax
Every unusual thing in your codebase is a tax the agent has to pay. Some examples of what this tax looks like in practice:
A bespoke logging wrapper that does almost-but-not-quite what the standard library does. The agent reaches for the standard one because that's what most code uses, and the resulting logs end up in a different place from everything else.
A single service that uses a different ORM than the rest of the system, for historical reasons nobody remembers. The agent writes code in the style of the majority and quietly breaks the minority.
A clever bit of metaprogramming in one module that auto-generates handlers from annotations. The agent does not recognize the trick, adds a handler manually, and now there are two handlers for the same route.
None of these are bugs in the agent. They are bugs in the codebase, where "bug" means "places where the patterns are inconsistent enough that a competent pattern-matcher will get them wrong." Humans hit the same problems; they just compensate by spending five minutes reading nearby files first. Agents are faster than that, which means their mistakes are also faster.
The tax compounds. A codebase with three weird patterns is annoying. A codebase with thirty is a place where agent reliability collapses and nobody can quite say why.
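The duplicate-handler example is a case where the codebase can turn a quiet inconsistency into a loud one. Here is a minimal sketch, assuming handlers are registered through a single registry. `RouteRegistry` and `DuplicateRouteError` are hypothetical names for illustration, not part of any particular framework.

```python
class DuplicateRouteError(Exception):
    """Raised when two handlers claim the same method and path."""


class RouteRegistry:
    """Hypothetical registry: every handler, generated or manual, goes through it."""

    def __init__(self):
        self._handlers = {}

    def register(self, method, path, handler):
        # Normalise the method so "get" and "GET" count as the same route.
        key = (method.upper(), path)
        if key in self._handlers:
            raise DuplicateRouteError(
                f"{key[0]} {path} already has a handler; "
                "check whether it is generated from annotations"
            )
        self._handlers[key] = handler

    def lookup(self, method, path):
        return self._handlers[(method.upper(), path)]
```

With something like this in place, a hand-written handler that collides with a generated one fails at startup or in the test suite, where an agent can see the error and correct itself, rather than shipping as a second handler for the same route.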
Standards as a contract
Coding standards have always been a contract between current and future contributors. The current author agrees to write code in a particular way; future contributors agree to read it in that way. The contract is what makes the codebase legible over time.
The list of future contributors has expanded. It now includes agents: yours, your teammates', whichever ones your company adopts next year, and whichever ones you use yourself in two years that don't exist yet. None of them will have been in the original design discussions. All of them will read the code the same way: as a source of patterns to extrapolate from.
A team with strong, documented conventions is offering all of those future contributors the same deal. A team without them is offering each one a guessing game.
Mechanical enforcement beats good intentions
The conventions that work are the ones that do not depend on anyone remembering them. Linters, formatters, type systems, and CI checks are how you take a convention out of the realm of "we agreed to do it this way" and into the realm of "the build fails if you don't."
Prettier and gofmt are uninteresting tools that have done more for codebase consistency than a thousand style guides. ESLint rules with autofix turn discussions into commits. A strong type system mechanically rules out entire categories of code that an agent might otherwise produce.
This matters for agents specifically because agents respond to feedback. An agent that runs the linter and sees an error fixes the error. An agent that produces code which conforms to the linter on the first try produces it because the patterns it learned from your codebase already point that direction. Either way, the tool is doing work that human reviewers would otherwise have to do — at human speed, against agent throughput.
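When no off-the-shelf lint rule covers a convention, a small custom check can do the same job. The sketch below uses Python's standard `ast` module to find places where code catches exceptions, which is the kind of thing a team might ban in its service layer. The function name and the rule itself are assumptions for illustration; a real setup would decide which files the check applies to and wire it into CI.

```python
import ast


def lines_catching_exceptions(source: str) -> list[int]:
    """Return the line numbers of try statements that have except handlers.

    try/finally blocks without handlers are allowed, because they do not
    swallow the error.
    """
    tree = ast.parse(source)
    # ast.TryStar (try/except*) only exists on newer Python versions.
    try_types = tuple(
        t for t in (ast.Try, getattr(ast, "TryStar", None)) if t is not None
    )
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, try_types) and node.handlers
    )
```

A CI step could run this over the relevant files and fail the build if the list is non-empty. The agent then gets the same feedback it would get from any other linter: a file, a line number, and a rule.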
Conventions on disk
The other practice worth investing in is the boring one: write your conventions down, in the repo, in a file the agent can read.
The exact filename varies (AGENTS.md, CLAUDE.md, a CONTRIBUTING.md, a top-of-tree README). What matters is that the things your team knows implicitly become explicit, in a place an agent loads into context at the start of every session. "We use functional components, not classes." "Errors bubble up to the route handler; do not catch them in the service layer." "Database access goes through the repository module; no raw queries."
These are the things a new human engineer would learn from a week of code review and a few uncomfortable PRs. The agent does not get a week. It gets one file. Make sure the file says what the code already implies, so the agent's priors line up with your reality.
Use a harness
Writing conventions down is the starting move. The next one is wrapping them in a harness — the layer that turns a static list of "we do things this way" into something the agent actively references, gets corrected by, and helps maintain.
A useful working definition: a harness is everything around the agent except the agent itself. I say "agent" rather than "model" deliberately, because the terminology overlaps. Some people call Claude Code (and comparable products) a harness, and in a sense it is: a harness around the model. Here I mean something narrower: the agent harness defined in your repository, the part that improves the results of the agent, where "the agent" is a given session of Claude Code.
The CLAUDE.md the agent loads. The rules and skills in your .claude/ directory. The slash commands the team has standardized on. The checks that run when the agent edits a file. The loop that turns "a reviewer caught a bad pattern" into "the next agent run won't do that." Agent = Model + Harness. You don't get to pick a new model every Tuesday. You can engineer the harness every day.
If you are starting from zero, bridle is a sensible default. It's a Claude Code plugin that scaffolds CLAUDE.md and a .claude/ directory with rules, agents, skills, and commands, and ships with slash commands for the lifecycle: /bridle:learn turns a review comment into a durable rule, /bridle:audit walks the repo looking for rule violations, /bridle:harness-health flags rules that have gone stale or coverage that has thinned. The opinionation is the point — you get to a working baseline in an afternoon instead of bikeshedding directory layouts for a sprint.
But the specific tool is not the argument. The argument is that you need a working harness — one that lives in the repo, that the agent reads on every session, that the team updates when the agent surprises them, and that closes the loop between "we caught this in review" and "the next run won't do it." A homegrown setup made of three markdown files and a pre-commit hook beats a sophisticated harness that nobody touches. What matters is the discipline: treat the harness as part of the codebase, version it, review changes to it, and feed it every time the agent does something you would correct in a code review.
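To make the homegrown end of that spectrum concrete, here is a rough sketch of the "caught in review, written down for next time" step. It is not how bridle or any other tool works; `record_rule` and the file path in the usage note are hypothetical, and the only assumption is that the rules live in a markdown file the agent reads.

```python
from pathlib import Path


def record_rule(rules_file: Path, rule: str) -> bool:
    """Append `rule` as a markdown bullet unless it is already recorded.

    Returns True if the file was changed, False if the rule was a duplicate.
    """
    rule = " ".join(rule.split())  # collapse stray whitespace
    if not rule:
        raise ValueError("rule must not be empty")
    bullet = f"- {rule}"

    text = rules_file.read_text(encoding="utf-8") if rules_file.exists() else ""
    if bullet in text.splitlines():
        return False

    rules_file.parent.mkdir(parents=True, exist_ok=True)
    # Avoid gluing the new bullet onto a last line that lacks a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with rules_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{separator}{bullet}\n")
    return True
```

Calling `record_rule(Path(".claude/rules/review-learnings.md"), "Do not catch errors in the service layer.")` after a review would leave a versioned, reviewable change in the repo. The script is the easy part. The habit of running it every time the agent surprises you is what makes it a harness.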
The teams that get the most out of agents are the ones that have built this layer deliberately. The teams that get the least are still hoping that a single CLAUDE.md and a strong model will be enough. They won't be.
How this connects to the pipeline
The thread running through this whole argument is the same one that runs through Continuous Delivery and through blameless post-mortems: the system, not the individual, holds the standard.
A linter check in CI is a standard the system enforces. A type error caught at build time is a standard the system enforces. A regression test that fails when the agent reaches into the wrong layer of the stack is a standard the system enforces. None of them depend on a reviewer remembering the rule at the moment they happen to be reading the diff.
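The layering test in particular is cheap to write. The sketch below, again using the standard `ast` module, reports which forbidden modules a piece of source imports directly. It assumes a rule like "only the repository module may import the database driver"; `sqlite3` stands in for whatever driver a project uses, and `forbidden_imports` is a hypothetical helper name.

```python
import ast


def forbidden_imports(source: str, forbidden: frozenset[str]) -> list[str]:
    """Return the forbidden top-level modules that `source` imports directly."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in forbidden:
                found.add(top_level)
    return sorted(found)
```

A regression test would read every file outside the repository module, call `forbidden_imports(text, frozenset({"sqlite3"}))`, and assert that the result is empty. When an agent reaches past the repository layer, the test fails with the module name, and no reviewer has to notice.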
Agents are going to produce a lot of diffs. The reviewers cannot scale. The standards have to.
The leverage
If you want to get more out of your agents tomorrow, the highest-leverage move is rarely a model upgrade. It is the unglamorous work of making your codebase predictable: pick one pattern per problem, document the patterns, enforce them mechanically, and refactor the loud exceptions until they are quiet.
Boring code is a force multiplier. It always was, for humans. Agents just put a number on it.