DEV Community

Ian Johnson

The Curator's Role: Managing a Codebase With an Agent

The Simplest Thing That Could Work

There's a temptation, when you decide to "use AI for software development," to build something complicated. A custom orchestration layer. A RAG pipeline over your codebase. A fine-tuned model trained on your conventions. A plugin ecosystem.

I used Markdown files.

The entire agent harness for this project is plain Markdown, checked into the repo, loaded automatically by Claude Code based on which directory you're working in. No custom tooling. No infrastructure. No maintenance burden.

CLAUDE.md                           ← Root guidance
app/Actions/CLAUDE.md               ← Action patterns
app/Services/CLAUDE.md              ← Service patterns
tests/CLAUDE.md                     ← Test patterns
resources/js/spa/CLAUDE.md          ← SPA patterns
...9 files total

That's it. Nine Markdown files. The agent reads them, follows them, and produces code that matches the project's conventions.
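To make that concrete, here is a hypothetical excerpt of what one of these files might contain. The headings and rules below are illustrative (drawn from conventions mentioned elsewhere in this post), not copied from the actual project:

```markdown
# app/Actions — Conventions

## Pattern
- One Action class per use case, with a single public method: `handle()`.
- Dependencies arrive via constructor injection; no facades.
- Return a Result DTO, never an Eloquent model.

## Anti-patterns
- Do not query the database directly from an Action; go through the query services.
- Do not throw for expected failures; return a failure Result instead.
```

The agent reads the file that matches the directory it's working in, so the guidance is always scoped to the code at hand.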

I want to be very clear about this because the industry is drowning in complexity around AI tooling: the simplest approach worked. Not as a starting point. Not as a minimum viable product. As the actual, final, production approach that I use every day.

When in doubt, choose the boring solution.

Guardrails First. Always.

If there's one message in this entire series, it's this: do the guardrails work up front.

I know. It's not the fun part. Writing tests for existing code is tedious. Setting up linting is yak-shaving. CI pipelines are plumbing. Nobody gets excited about a pre-commit hook.

But here's what happens when you skip the guardrails and go straight to "AI agent writes my code":

  1. The agent writes code that looks correct
  2. You deploy it
  3. Something breaks in production
  4. You debug it
  5. You realize the agent made an assumption that your tests would have caught — if you had tests
  6. You fix the bug and add a test
  7. You repeat this for every bug

That's the expensive path. You're paying the cost of guardrails anyway, but you're paying it in production bugs, debugging time, and lost confidence. You're paying retail instead of wholesale.

The cheap path:

  1. Write tests
  2. Add linting
  3. Set up CI
  4. Establish patterns
  5. Build the harness
  6. Then let the agent write code

Now the agent's output is verified before it ships. Tests catch behavioral bugs. Linting catches structural issues. CI catches everything else. The harness guides the agent toward correct patterns from the start.
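As a sketch of what the commit-time gate can look like: a minimal Git pre-commit hook that aborts the commit unless the project's checks pass. The `lint` and `test` Makefile targets are assumptions based on the commands this post uses; treat this as a shape, not the project's actual hook.

```shell
#!/bin/sh
# Sketch: save as .git/hooks/pre-commit and make it executable.
# Assumes the project's Makefile exposes `lint` and `test` targets.
set -e        # abort the commit on the first failing check
make lint     # formatting and static analysis
make test     # behavioral checks
```

The point is that the same commands the agent runs locally are the ones that gate the commit, so "it passed for the agent" and "it can be committed" mean the same thing.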

Every hour spent on guardrails saves ten hours of debugging. I can't prove that number, but I believe it in my bones after watching this codebase evolve.

The Compound Effect of Simple Rules

Each guardrail is simple:

  • Tests: does the code do what it should?
  • Pint: is the PHP formatted consistently?
  • Psalm: are the types correct?
  • Prettier: is the JS/CSS formatted?
  • ESLint: are the React patterns correct?
  • TypeScript: are the frontend types sound?
  • Pre-commit: did we check before committing?
  • CI: did everything pass together?
  • Harness: did we follow the project's patterns?

None of these are individually impressive. But together they create a narrowing funnel. The space of "code the agent could produce" starts enormous. Each guardrail eliminates a category of wrong answers. By the time code passes all of them, the remaining space is almost entirely correct code.

This is why the approach scales. I didn't build a sophisticated AI system. I built a series of simple filters. The AI writes whatever it writes, and the filters ensure only good code survives.

Modern Software Engineering and Agents

Dave Farley's Modern Software Engineering argues that software engineering is the application of empiricism and the scientific method to building software. The core practices:

  • Work in small batches — small commits, small PRs, fast integration
  • Optimize for fast feedback — tests, CI, trunk-based development
  • Experimentation — try things, measure results, adapt
  • Continuous delivery — always releasable, deploy when ready

Every one of these maps directly to AI-assisted development:

Small batches = small PRs the agent can produce and you can review in minutes. Not 2,000-line diffs. Not week-long branches. One feature. One fix. One refactoring. Merge and move on.

Fast feedback = make lint && make test gives you a definitive answer in minutes. The agent runs these checks. If they pass, the code is good. If they fail, the agent fixes and tries again. The feedback loop is tight.
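A plausible shape for those Makefile targets, assuming the tooling named elsewhere in this post (Pint, Psalm, Prettier, ESLint); the exact invocations and paths are guesses, not the project's real Makefile:

```makefile
# Sketch of the `lint` and `test` targets; tool invocations are assumptions.
lint:
	./vendor/bin/pint --test
	./vendor/bin/psalm
	npx prettier --check resources/js
	npx eslint resources/js

test:
	php artisan test
```

One entry point, one definitive answer: either everything passes or the agent has something concrete to fix.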

Experimentation = the harness is a hypothesis. "If I give the agent these patterns, it will produce code like this." Update the harness when the hypothesis is wrong. Run the experiment again. This is the scientific method applied to AI collaboration.

Continuous delivery = trunk-based development with CI means every merge is deployable. The agent produces code that's always ready to ship, not code that needs a cleanup pass before release.
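The CI gate can be equally boring. Here is an illustrative workflow in GitHub Actions syntax (the post doesn't name a CI provider, and environment setup steps are elided), reusing the same two commands:

```yaml
# Illustrative CI gate; provider and setup steps are assumptions.
name: ci
on:
  push:
    branches: [main]
  pull_request:
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # language/runtime setup steps elided for brevity
      - run: make lint
      - run: make test
```

Because CI runs the same targets as the local loop and the pre-commit hook, a green merge really does mean deployable.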

Farley couldn't have predicted AI agents when he wrote the book, but he described exactly the practices that make them work.

The Harness Optimizes Feedback — For You and the Agent

The harness has two audiences:

For the agent: "Here's how to write an Action class. Here's the pattern. Here are the anti-patterns. Follow this."

For you: "Here's what I expect the agent to produce. If it doesn't match, either the agent drifted or the harness needs updating."

The feedback protocol makes this bidirectional:

You review output → Agent drifted? → Update harness → Agent re-reads → Better output
                   → Harness gap? → Update harness → Agent re-reads → Better output
                   → Looks good?  → Commit and ship

Every review either confirms the harness is working or improves it. The harness gets better over time. The agent's first-attempt accuracy improves over time. Your review time decreases over time.

This is the ratchet effect. The system improves in one direction. It doesn't degrade. Each lesson is captured. Each correction is permanent.

You're Codifying Yourself

Here's something I didn't expect: building the harness forced me to articulate decisions I'd been making unconsciously for years.

Why do I prefer constructor injection over facades? Why do I insist on Result DTOs instead of returning models? Why do Actions have one public method? Why does the SPA component own the logic while the interim wrapper is just plumbing?

I had reasons for all of these. They were informed by years of experience, books I'd read, projects I'd worked on, mistakes I'd made. But they lived in my head. When I was writing every line of code, they came out through my fingers. When an agent writes the code, they need to come out through the harness.

The harness is a codification of your engineering judgment. Your preferences. Your project's specific constraints. Your team's conventions. Your domain's requirements.

And every project's harness will be different. A fintech codebase has different concerns than a social media app. A Go microservice has different patterns than a Laravel monolith. A greenfield project has different rules than a legacy migration.

This is why "just use AI to write code" is incomplete advice. The AI doesn't know your project. It doesn't know your domain. It doesn't know why you chose contracts over concrete classes, or why authorization goes through Policies instead of middleware, or why the SPA is gated to non-production environments.

You know. And your job is to make that knowledge explicit, machine-readable, and automatically loaded at the right time.

The Engineer's Role in the Age of Agents

If agents can write code, what do engineers do?

Three things:

1. Curator of Design.
You decide the architecture. Actions, Services, Policies, query builders. These are design decisions that shape how every feature gets built. The agent follows design; it doesn't create it. Your taste, your judgment, your experience with tradeoffs: that's irreplaceable.

2. Curator of Guardrails.
You build and maintain the system that verifies output: tests, linting, CI, pre-commit hooks. Without guardrails, agent output is unchecked. The guardrails are your engineering standards made executable.

3. Curator of Documentation.
Not READMEs for humans; guidance for agents. Harness files that encode patterns, constraints, and anti-patterns. Documentation that's loaded in context, not filed in a wiki nobody reads.

The code is a byproduct. The real output of a software engineer is the system of constraints that makes correct code the default and incorrect code structurally difficult.

This isn't a diminished role. It's a leveled-up role. You're working at a higher level of abstraction. You are defining what good looks like instead of typing it character by character.

On-the-Loop Management

With all the pieces in place, your role becomes:

Setting direction. What to build. What to migrate next. Which Jira tickets to pick up. Architecture decisions. Tradeoffs.

Writing specs as tests. The TDD red phase is your primary communication channel with the agent. Failing tests are unambiguous specifications.
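In practice, a spec-as-test can be as small as one red-phase case written before the agent starts. This sketch uses Pest-style syntax (an assumption; PHPUnit would look similar), and every name in it (`ArchiveProject`, `ArchiveResult`, the factory fields) is hypothetical:

```php
<?php

// Hypothetical red-phase spec: written first, failing, handed to the agent.
// ArchiveProject, ArchiveResult, and the Project factory are illustrative names.
it('archives a project and reports the archived id', function () {
    $project = Project::factory()->create(['archived_at' => null]);

    $result = app(ArchiveProject::class)->handle($project->id);

    expect($result)->toBeInstanceOf(ArchiveResult::class)
        ->and($result->projectId)->toBe($project->id)
        ->and($project->refresh()->archived_at)->not->toBeNull();
});
```

The test pins down the contract (Action in, Result DTO out, observable state change) without saying a word about implementation, which is exactly the division of labor you want.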

Reviewing output. Reading diffs, not writing code. Checking "did the agent follow the patterns?" not "is this semicolon in the right place?"

Curating the harness. When the agent drifts, you don't just fix the instance, you fix the guidance. The correction propagates to all future work.

Managing infrastructure. Docker, CI/CD, deployment pipelines, queue workers, Redis, feature flags. The plumbing that makes everything else work.

This is management, not micromanagement. You're responsible for the system's output, but you're not manually producing every artifact.

The Story in Numbers

This project went from zero to here in about 3 months:

| Metric | Value |
| --- | --- |
| Total commits | 258 |
| Pull requests | 145 |
| PHP tests | 2,700+ |
| Conventional commits | 122 (47% of total) |
| Refactoring commits | 17 |
| Test-specific commits | 14 |
| Feature commits | 29 |
| Fix commits | 53 |
| React SPA pages | ~15 |
| Features shipped via interim wrappers | 6 |
| Harness files | 9 |
| Big-bang rewrites | 0 |

One engineer. One AI agent. Nine Markdown files. And a codebase that went from "untested legacy monolith" to "well-structured application with dual-frontend migration, automated quality gates, and continuous delivery."

What I'd Do Differently

I'd write the harness earlier. I built the harness at commit #109 (out of 258). If I'd built it at commit #30, after the initial test and linting setup, every subsequent commit would have benefited from guided agent output.

I'd invest more in test infrastructure early. The UserFactory facade was a game-changer. Every similar shortcut (factory states, test helpers, assertion macros) pays dividends across hundreds of tests.

I'd document scoping rules from day one. "New features go here. Bug fixes go here. Don't touch this." The earlier the agent knows the rules, the fewer corrections you make.

What I Wouldn't Change

Starting with tests. Non-negotiable. Everything else depends on the safety net.

Keeping it simple. Markdown files, Makefile commands, Docker containers, conventional commits. Every piece is boring. Every piece works. The boring stack is the reliable stack.

Incremental migration. Never once did we stop shipping features to "do infrastructure." The migration happened alongside feature work, commit by commit, PR by PR.

The feedback protocol. Updating the harness on every review. This is the single highest-leverage habit in the entire workflow.

The Point

You can use AI coding agents on real production codebases and get predictable, high-quality results.

Not by hoping the agent is smart enough. Not by building custom AI infrastructure. Not by trusting vibes.

By doing the boring work first:

  • Write the tests
  • Add the linting
  • Set up CI
  • Establish patterns
  • Build the harness
  • Update the harness continuously

Then let the agent operate within those constraints. Review the output. Update the constraints. Ship the code.

The agent didn't get smarter over these three months. The guardrails got better. The harness accumulated lessons. The codebase developed a shape that made it harder to do the wrong thing and easier to do the right thing.

That's not AI magic. That's engineering.
