Thesis: You can use AI coding agents on real production codebases and get predictable, high-quality results if you treat the agent like a junior engineer who needs guardrails, not a magic wand.
Audience: Mid-level and senior software engineers curious about AI-assisted development but skeptical of the hype.
Series Arc
The story follows a real Laravel + React codebase over roughly three months and 258 commits, from a legacy monolith with no tests to a well-structured application with automated quality gates, a React SPA migration in progress, and an AI agent that reliably ships production code with minimal supervision.
Stage 1: The Foundation — Tests First, Everything Else Second
Post 1: "Before You Let an Agent Touch Your Code, Write the Tests"
Why tests are the prerequisite for AI-assisted development, not an afterthought. How wrapping a legacy codebase in characterization tests creates the safety net that makes everything else possible. Covers the move from SQLite to MySQL in tests, the UserFactory pattern, and why test infrastructure is the highest-leverage investment.
- The codebase before: no tests, no linting, no CI
- Why tests come first (they're the reins, not the saddle)
- Characterization tests for legacy code
- The `UserFactory` facade pattern
- TDD as a communication protocol with agents
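To make the idea concrete, here is a minimal characterization-test sketch. The project itself is PHP/Laravel; this TypeScript version, with a made-up `legacyDiscount` rule, just illustrates the technique: pin down what the code *currently* does, bugs included, before letting an agent refactor it.

```typescript
// Hypothetical legacy pricing rule. We don't fully understand it,
// and that's the point: characterization tests capture observed
// behavior, not desired behavior.
export function legacyDiscount(totalCents: number, role: string): number {
  // Existing quirks, warts and all: admins always get 2000 off,
  // and the bulk discount silently skips admins entirely.
  if (role === "admin") return totalCents - 2000;
  if (totalCents > 10000) return totalCents - 1000;
  return totalCents;
}

// If an agent's refactor changes any of these outcomes, the suite
// fails and the change gets reviewed instead of silently shipped.
export function characterize(): void {
  console.assert(legacyDiscount(20000, "admin") === 18000, "admin discount");
  console.assert(legacyDiscount(20000, "user") === 19000, "bulk discount");
  console.assert(legacyDiscount(5000, "user") === 5000, "no discount");
}
```

Whether the admin/bulk interaction is a bug is a judgment call for later; the test's only job is to freeze today's behavior so refactoring becomes safe.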
Post 2: "Linting, Static Analysis, and the Pre-Commit Hook That Saved My Sanity"
Adding Pint, Psalm, Prettier, ESLint, and pre-commit hooks: not busywork, but machine-checkable standards the agent can verify its own work against. Why `make lint` matters more than code review when an agent is writing the code.
- The tooling stack: Pint, Psalm, Prettier, ESLint, TypeScript
- Pre-commit hooks as automated code review
- CI as the final gate
- Why agents need checkable standards, not style guides
- The compound effect: each tool narrows the failure space
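The `make` targets referenced throughout the series might look something like this (a sketch only: the tool invocations are typical defaults, and the JS test runner is an assumption, not confirmed by the posts):

```make
.PHONY: lint test test-js

lint:
	vendor/bin/pint --test            # PHP formatting, check-only mode
	vendor/bin/psalm                  # PHP static analysis
	npx prettier --check resources/   # JS/TS formatting
	npx eslint resources/js           # JS/TS linting

test:
	php artisan test                  # backend suite against MySQL

test-js:
	npx vitest run                    # SPA unit tests (runner assumed)
```

The point is that one memorable command encodes the whole standard: the agent runs `make lint` after every change, and the pre-commit hook and CI run the same targets, so "passing" means the same thing everywhere.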
Stage 2: The Refactoring — Making Change Safe
Post 3: "Traits to Services: Refactoring for Testability (and for Agents)"
How extracting six traits into service classes with contracts created clean boundaries the agent could work within. The strategy: plan all six, execute one at a time, keep the app running in production throughout.
- The trait problem: global state, hidden dependencies, untestable
- Contract-first design: interfaces before implementations
- The extraction sequence (Chat, CRM, OCR, Document Conversion, External API, Calculator)
- Each extraction behind enough abstraction to keep production running
- Why clear boundaries help agents more than documentation
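A minimal sketch of the contract-first shape. The project defines PHP interfaces bound in Laravel's service container; this TypeScript version, with a hypothetical OCR service, shows the same boundary:

```typescript
// Contract-first: callers depend on this interface, never on a
// vendor SDK or a trait's hidden global state.
export interface OcrService {
  extractText(documentPath: string): string;
}

// One implementation behind the boundary. Swapping vendors, or
// substituting a fake in tests, never touches the call sites.
export class FakeOcrService implements OcrService {
  constructor(private readonly canned: Record<string, string>) {}
  extractText(documentPath: string): string {
    return this.canned[documentPath] ?? "";
  }
}

// A caller written against the contract, not the implementation.
export function firstWords(ocr: OcrService, path: string): string {
  return ocr.extractText(path).slice(0, 20);
}
```

That boundary is exactly what the agent needs: "implement `OcrService`" is a bounded, verifiable task in a way that "untangle this trait" never is.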
Post 4: "Actions, Policies, and the Art of Obvious Code"
Extracting controller logic into Action classes with Result DTOs, and adding Laravel Policies for authorization. Why making the architecture explicit and boring is the best thing you can do for an AI agent.
- Fat controllers → thin controllers + Actions
- The Action pattern: one `execute()`, one Result DTO
- Laravel Policies for authorization (replacing inline checks)
- Role-scoped query builders
- How this accidentally created the perfect migration bridge (web → API)
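The pattern itself is language-agnostic; here is a TypeScript sketch with a hypothetical `CreateInvoiceAction` (the project's Actions are PHP classes, so names and shapes here are illustrative):

```typescript
// One explicit Result DTO: expected failures are values, not exceptions.
export type Result<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

export interface CreateInvoiceInput {
  customerId: number;
  amountCents: number;
}

// One Action, one execute(). A thin web controller and a thin API
// controller can both call this and just map the Result to a response,
// which is what makes it a bridge for the Blade-to-SPA migration.
export class CreateInvoiceAction {
  execute(input: CreateInvoiceInput): Result<{ id: number }> {
    if (input.amountCents <= 0) {
      return { ok: false, error: "Amount must be positive." };
    }
    // Persistence elided; a fixed id stands in for the created record.
    return { ok: true, value: { id: 1 } };
  }
}
```

"Obvious and boring" is the feature: an agent asked to add a use case has exactly one shape to copy.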
Stage 3: The Migration — Two Frontends, One Source of Truth
Post 5: "No Big-Bang Rewrites: Running Two Frontends Without Losing Your Mind"
The strategy for migrating from Blade to React without ever stopping feature delivery. The two-frontend architecture, environment gating, and the interim wrapper pattern that lets SPA pages ship to production inside legacy Blade shells.
- Why big-bang rewrites fail (and what to do instead)
- The two-path architecture: Legacy (Blade) + SPA (React)
- Environment gating: SPA only in local/staging/test
- The Interim wrapper pattern: SPA components inside Blade shells
- Feature flags and analytics for gradual rollout
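A sketch of the gating decision (helper names are hypothetical; in the project the check happens server-side when deciding which view to render):

```typescript
// The SPA only renders in environments where it's safe to iterate.
const SPA_ENVIRONMENTS = ["local", "staging", "test"];

export function spaEnabled(appEnv: string): boolean {
  return SPA_ENVIRONMENTS.includes(appEnv);
}

// Per page, the interim wrapper picks a frontend: in gated environments
// it mounts the React component inside the legacy Blade shell; in
// production it falls back to the existing Blade view. Both paths stay
// deployable the whole time, so there is never a big-bang cutover.
export function frontendFor(appEnv: string): "spa" | "legacy" {
  return spaEnabled(appEnv) ? "spa" : "legacy";
}
```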
Post 6: "Trunk-Based Development with Short-Lived Branches"
How trunk-based development, conventional commits, small PRs, and CI/CD create the fast feedback loop that makes AI-assisted development practical. Why the branch should live for hours, not days.
- Trunk-based development: why and how
- Conventional commits as a communication protocol
- The CI pipeline: build → lint → test → deploy
- Forge deployment via webhook
- Small batches, continuous integration, always releasable
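The pipeline stages above could be sketched as a GitHub Actions-style workflow excerpt (illustrative only: job layout, secret name, and commands are assumptions, not the project's actual config):

```yaml
# build → lint → test → deploy, with deploy gated to trunk.
jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: composer install && npm ci
      - run: npm run build   # build
      - run: make lint       # Pint, Psalm, Prettier, ESLint
      - run: make test       # backend suite
      - run: make test-js    # SPA suite

  deploy:
    needs: quality
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # Forge pulls main and runs its deploy script when this fires.
      - run: curl -fsS -X POST "${{ secrets.FORGE_DEPLOY_WEBHOOK }}"
```

Because branches live for hours, this pipeline runs many times a day, and every green run on `main` is a release candidate.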
Stage 4: The Harness — From Micromanaging to Managing
Post 7: "In-the-Loop to On-the-Loop: How I Stopped Micromanaging My AI Agent"
The turning point. Moving from approving every line to trusting the guardrails. What "on-the-loop" means in practice, why it requires everything from stages 1–3, and how the CLAUDE.md harness files made it work.
- In-the-loop: reviewing every line, giving real-time direction
- The frustration: micromanaging defeats the purpose
- On-the-loop: setting direction, reviewing output, trusting guardrails
- Why this only works with tests + linting + CI + clear architecture
- The feedback loop: harness → agent output → review → update harness
Post 8: "Building the Agent Harness: Subdirectory CLAUDE.md Files"
The technical deep-dive on the harness system. Why one big instruction file doesn't work, how subdirectory CLAUDE.md files control context loading, and what goes in each one. Includes real examples from the codebase.
- The context window problem: one big file blows up
- Subdirectory `CLAUDE.md` files: lazy-loaded, scoped guidance
- What each harness file covers (Actions, Services, Tests, SPA, etc.)
- The harness table: mapping areas to guidance files
- The feedback protocol: update the harness, reload, re-apply
- How the harness checks its own work (`make lint`, `make test`, `make test-js`)
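Based on the description above, a subdirectory harness file might look roughly like this (contents are illustrative, not copied from the project):

```markdown
# app/Actions/CLAUDE.md — loaded only when working in this directory

- One Action per use case: a single `execute()` returning a Result DTO.
- Expected failures are failure Results, never exceptions.
- Authorization lives in Policies, not inside Actions.
- Write the failing test first, then implement.
- Before committing, verify with `make lint` and `make test`.
```

Because the file sits next to the code it governs, the agent only pays the context cost when it actually touches that area, which is what keeps nine small files workable where one big one wasn't.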
Post 9: "The Curator's Role: Managing a Codebase With an Agent"
The philosophical methodology. Your role shifts from writing every line to curating the repository and the agent. How Modern Software Engineering principles (Dave Farley) map perfectly to AI-assisted development. Why every project's harness is different because you're codifying your judgment. The engineer's role in the age of agents.
- Simplicity wins: nine Markdown files, not a custom AI platform
- Guardrails first — the case for doing this work up front
- You're codifying yourself: the harness is your preferences made explicit
- Every project is different: why generic AI advice falls short
- The engineer's three roles: curator of design, guardrails, and documentation
- Modern Software Engineering: optimize for feedback, work in small batches, empiricism over dogma
- The harness as a living document (it evolves with every review)
- Results: velocity, quality, confidence
Post 10: "Custom Skills: The End-to-End Workflow Made Executable"
How two custom slash commands turned a repeatable workflow into a consistent, end-to-end process — from Jira ticket to merged PR, with TDD and harness feedback at every step. Why automating the ceremony lets you focus on the judgment calls.
- The repetition problem: typing the same instructions every session
- Skills as Markdown files: plain English workflows in `.claude/skills/`
- Two skills: `/implement-jira-card` (from Jira) and `/implement-change` (ad-hoc)
- The eight-phase workflow: scope → requirements → plan → branch → TDD → commit → CI review → refactor
- Eight feedback checkpoints: on-the-loop made concrete and repeatable
- Harness feedback built into every checkpoint
- TDD as a structural phase, not a preference
- Separating ceremony from judgment: automate the sequence, keep the decisions
- Structure scales, discipline doesn't
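As a sketch, the eight-phase workflow might read something like this inside a skill file (an illustrative paraphrase; the project's actual `.claude/skills/` files are not reproduced here):

```markdown
# Illustrative skill excerpt — phase wording is paraphrased

1. Scope: restate the ticket and list files likely to change. **Stop for approval.**
2. Requirements: turn the ticket into acceptance criteria. **Stop for approval.**
3. Plan: propose implementation steps. **Stop for approval.**
4. Branch: create a short-lived branch with a conventional name.
5. TDD: per criterion, write a failing test, then make it pass.
6. Commit: conventional commits, one logical change each.
7. CI review: open the PR, watch the pipeline, fix failures.
8. Refactor: tidy with tests green; fold lessons back into the harness.
```

The checkpoints are where on-the-loop supervision actually happens; everything between them is ceremony the skill can run unattended.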
Gist Strategy
Each post links to 1–3 GitHub gists showing real code from the project:
- Harness files (`CLAUDE.md` examples)
- Action class + Result DTO pattern
- Service contract + implementation
- Test patterns (`UserFactory`, API tests)
- CI workflow excerpts
- Interim wrapper component
- Pre-commit hook configuration
- Makefile targets