Ian Johnson

Beyond Laravel: Applying the Agent Harness to Any Stack

The Strategy Is the Point

This series followed a Laravel + React codebase. But if you've been reading for the strategy and not the syntax, you already know: none of this is Laravel-specific.

Tests before agents. Linting as machine-checkable standards. Clean architecture so the agent follows patterns instead of inventing them. Trunk-based development for fast feedback. Harness files that scope guidance to where the agent is working. Custom skills that turn your workflow into structure.

Every step has an equivalent in whatever stack you're using. The tools change. The progression doesn't.

The Seven Steps

Here's the agent harness approach distilled to its language-agnostic core. Each step builds on the ones before it. You cannot skip ahead: the entire system is load-bearing.

Step 1: Test Infrastructure

What you're doing: Wrapping the existing codebase in tests that run against real dependencies (the same database engine, the same cache, the same queue) so you have a machine-checkable safety net before the agent touches anything.

What matters:

  • Characterization tests first. Lock in what the code does before you change what it should do. These aren't aspirational tests. They're documentation of current behavior.
  • Real dependencies, not fakes. If production runs Postgres, your tests run Postgres. SQLite-in-memory is a lie that will catch up with you.
  • One command to run everything. make test, npm test, ./gradlew test — the agent needs a single entry point. If running tests requires tribal knowledge, the agent will get it wrong.
  • Test factories that are hard to misuse. Give the agent a discoverable API for creating test data. Fluent builders, factory patterns, fixtures with clear names. Design for the dumbest correct user, because that's how the agent will use it.
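
As a minimal sketch of a factory that's hard to misuse, here's a fluent builder in Python (the `User` and `UserBuilder` names are hypothetical, not from the series): safe defaults, explicit overrides, and an immutable value so the agent can't accidentally mutate shared fixtures.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class User:
    # Safe defaults: calling UserBuilder().build() always yields a valid user.
    name: str = "Test User"
    email: str = "test@example.com"
    is_admin: bool = False

class UserBuilder:
    """Fluent test-data builder: every method returns a new builder."""

    def __init__(self, user: User = User()):
        self._user = user

    def named(self, name: str) -> "UserBuilder":
        return UserBuilder(replace(self._user, name=name))

    def as_admin(self) -> "UserBuilder":
        return UserBuilder(replace(self._user, is_admin=True))

    def build(self) -> User:
        return self._user
```

The equivalent in your stack might be Laravel model factories, FactoryBot, or factory_boy; the point is that `UserBuilder().named("Ada").as_admin().build()` is discoverable and nearly impossible to call wrong.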

Step 2: Linting and Static Analysis

What you're doing: Adding machine-checkable standards for code style, type safety, and structural quality. Each tool eliminates an entire category of wrong output from the agent.

What matters:

  • Format, lint, and type-check — all three. Formatting removes style arguments. Linting catches structural problems. Type checking catches logic errors. Together they narrow the space the agent can operate in.
  • One command to check everything. make lint, npm run lint, a Makefile target that runs the full stack. The agent runs this before every commit.
  • Pre-commit hooks that block and explain. The hook should fail with a message the agent can read and act on. "Run npx prettier --write . to fix" is better than "formatting error on line 47."
  • CI as the gate that cannot be skipped. Pre-commit hooks are the first check. CI is the final one. The agent cannot merge without green CI.
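
To make the "block and explain" idea concrete, here's a small sketch (hypothetical `Check` structure and commands, not a real hook framework): each check carries its own fix instruction, so a failure message tells the agent what to run next rather than just where it went wrong.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Check:
    name: str
    command: str   # what the pre-commit hook runs
    fix_hint: str  # what the agent should run when the check fails

CHECKS = [
    Check("format", "npx prettier --check .", "Run `npx prettier --write .` to fix."),
    Check("lint", "npx eslint .", "Run `npx eslint . --fix`, then review remaining errors."),
]

def failure_message(check: Check, tool_output: str) -> str:
    """Build a hook failure message the agent can read and act on."""
    return f"[{check.name}] failed.\n{tool_output}\n{check.fix_hint}"
```

A real hook would run each command via subprocess and exit non-zero on failure; the shape that matters is the actionable `fix_hint` in the output.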

Step 3: Architecture and Boundaries

What you're doing: Refactoring toward clean boundaries (interfaces, services, clear separation of concerns) so the agent can work within a bounded area without needing to understand the whole system.

What matters:

  • Contracts before implementations. Define interfaces first. The agent can implement an interface without understanding the rest of the system. It cannot safely modify a God class.
  • One responsibility per unit. Whether it's a service class, a module, a use case — the agent works best when each unit does one thing and the boundaries are obvious.
  • Architecture as documentation. If the codebase has a clear pattern (actions, services, repositories, commands), the agent follows it. If every file is a snowflake, the agent improvises. You don't want improvisation.
  • Small, safe steps. One extraction per PR. Keep the app running in production throughout. Never refactor and change behavior in the same commit.
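
Here's what "contracts before implementations" can look like, sketched in Python with a hypothetical `PaymentGateway` contract: the agent implements the interface, and callers depend only on the contract, never on a concrete class.

```python
from typing import Protocol

class PaymentGateway(Protocol):
    """The contract. An agent can implement this without reading the rest of the system."""
    def charge(self, amount_cents: int, token: str) -> str: ...

class FakeGateway:
    """In-memory implementation; satisfies the contract via structural typing."""
    def __init__(self) -> None:
        self.charges: list[tuple[int, str]] = []

    def charge(self, amount_cents: int, token: str) -> str:
        self.charges.append((amount_cents, token))
        return f"txn-{len(self.charges)}"

def checkout(gateway: PaymentGateway, amount_cents: int, token: str) -> str:
    # Depends on the interface, so swapping implementations is safe.
    return gateway.charge(amount_cents, token)
```

The same move exists everywhere: PHP interfaces, Go interfaces, Java interfaces, Elixir behaviours. The boundary is the point, not the keyword.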

Step 4: Explicit Patterns for Business Logic

What you're doing: Establishing the patterns the agent should follow for new work: how business logic is structured, how authorization works, and how data flows through the system.

What matters:

  • A single pattern for business logic. Actions, use cases, commands, interactors — the name doesn't matter. What matters is that there's one pattern, it's consistent, and the agent can see ten examples in the codebase.
  • Centralized authorization. Scattered permission checks are a security risk with human developers. With an agent, they're a guarantee of inconsistency. Use your framework's policy/guard/permission system.
  • Typed inputs and outputs. Form objects, request validators, result types, DTOs — whatever your stack calls them. The agent needs to know what goes in and what comes out.
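
Putting the three bullets together, here's one hedged sketch of the pattern (hypothetical `CreateInvoice` action; persistence and authorization elided): a single unit of business logic with a typed input and a typed result.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CreateInvoiceInput:
    customer_id: int
    amount_cents: int

@dataclass(frozen=True)
class CreateInvoiceResult:
    ok: bool
    invoice_id: Optional[int] = None
    error: Optional[str] = None

class CreateInvoice:
    """One action, one responsibility: typed input in, typed result out."""

    def execute(self, data: CreateInvoiceInput) -> CreateInvoiceResult:
        if data.amount_cents <= 0:
            return CreateInvoiceResult(ok=False, error="amount must be positive")
        # A real action would write through a repository here and return the new ID.
        return CreateInvoiceResult(ok=True, invoice_id=1)
```

With ten examples of this shape in the codebase, the eleventh one the agent writes will look like the first ten.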

Step 5: Migration Strategy (If Applicable)

What you're doing: If you're migrating frontends, databases, or major subsystems: running old and new in parallel, migrating incrementally, and never attempting a big-bang rewrite.

What matters:

  • Both systems run simultaneously. The old system serves production. The new system is gated behind environment flags or feature toggles until proven.
  • Page by page, feature by feature. Each migration is a small PR. Each small PR goes through the full test/lint/CI pipeline.
  • Clear scoping rules. The agent needs to know: does this work go in the old system, the new system, or both? Make the rules explicit in the harness.
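
The gating can be as simple as this sketch (hypothetical flag names and renderers; your toggle system may be a proper feature-flag service instead of environment variables): the old system is the default, and each page opts into the new system only when its flag is on.

```python
import os
from typing import Callable, Mapping

def render_page(
    page: str,
    legacy_renderer: Callable[[str], str],
    new_renderer: Callable[[str], str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Route one page to the new system only when its flag is set; legacy stays the default."""
    flag = f"NEW_UI_{page.upper()}"
    if env.get(flag) == "1":
        return new_renderer(page)
    return legacy_renderer(page)
```

Flipping one flag migrates one page; unsetting it rolls the page back instantly. That's the whole safety story of parallel systems.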

Step 6: Trunk-Based Development and CI/CD

What you're doing: Establishing the delivery cadence that makes AI-assisted development practical: short-lived branches, small PRs, fast CI, and automated deployment.

What matters:

  • Branches live for hours, not days. The longer a branch lives, the more the agent's assumptions go stale. Small batches, fast merges.
  • CI runs the full pipeline. Build, lint, type-check, test, deploy. If any step fails, the PR doesn't merge.
  • Conventional commits. A machine-readable commit history helps the agent understand what changed and why. It also helps you when you're reviewing 145 PRs in three months.
  • Automated deployment. Push to main, deploy to staging. The feedback loop from code change to running software should be minutes, not hours.
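
"Machine-readable commit history" is checkable, too. Here's a minimal sketch of a Conventional Commits header validator (the type list follows the common convention; adjust it to your repo's rules) that could run in a commit-msg hook or CI:

```python
import re

# Conventional Commits header: type(optional-scope)!: description
COMMIT_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9-]+\))?(!)?: .+"
)

def is_conventional(header: str) -> bool:
    """True when a commit header follows the Conventional Commits format."""
    return bool(COMMIT_RE.match(header))
```

Like every other check in the harness, this turns a style preference into a gate the agent cannot drift past.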

Step 7: The Harness and Skills

What you're doing: Writing scoped guidance files that tell the agent how to work in each area of the codebase, then codifying your workflow into repeatable skills.

What matters:

  • Scoped guidance, not one big file. One harness file per major area. The agent loads what's relevant to where it's working. Keep the signal-to-noise ratio high.
  • Patterns with examples, not just rules. Show the agent a code example of the pattern you want. "Do it like this" beats "follow these principles" every time.
  • Anti-patterns are explicit fences. Tell the agent what not to do. "Never put HTTP concerns in an Action" is more useful than "keep Actions pure."
  • The feedback protocol. When the agent drifts, ask: is this a harness gap? If yes, update the harness first, then re-apply. Corrections become permanent rules.
  • Skills codify the sequence. Automate the ceremony (read ticket, write tests, implement, lint, commit, push, PR). Keep the judgment calls at checkpoints.
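
To illustrate the scoping mechanics, here's a sketch of how per-directory guidance resolves for a given file (an assumption about layout, mirroring the cascading-CLAUDE.md convention, not an official loader): the root file always applies, and each ancestor directory can add a more specific one.

```python
from pathlib import PurePosixPath

def harness_files(file_path: str) -> list[str]:
    """List the guidance files that apply to one file: the root CLAUDE.md
    first, then one per ancestor directory, most specific last."""
    scopes = ["CLAUDE.md"]
    prefix = ""
    for part in PurePosixPath(file_path).parent.parts:
        prefix = f"{prefix}{part}/"
        scopes.append(f"{prefix}CLAUDE.md")
    return scopes
```

The deeper the file, the more specific the guidance that applies to it; the signal-to-noise rule falls out of the directory structure itself.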

The Stack Table

Here's how each step maps to tools across popular web framework stacks. Each table covers one step; each row is a stack. Every cell answers: "What would I use here?"

Test Infrastructure

| Stack | Runner | DB Strategy | Factories / Fixtures | One Command |
| --- | --- | --- | --- | --- |
| Laravel (PHP) | PHPUnit / Pest | MySQL in Docker (tmpfs) | Model Factories | php artisan test |
| Rails (Ruby) | RSpec / Minitest | Postgres in Docker | FactoryBot | bundle exec rspec |
| Django (Python) | pytest-django | Postgres in Docker | factory_boy / Model Bakery | python -m pytest |
| Next.js (TypeScript) | Vitest / Jest | Postgres via Testcontainers | Prisma seed scripts / custom builders | npm test |
| Spring Boot (Java) | JUnit 5 | Testcontainers (Postgres/MySQL) | TestEntityManager / custom builders | ./gradlew test |
| ASP.NET (C#) | xUnit / NUnit | Testcontainers or LocalDB | Bogus + custom builders | dotnet test |
| Go | testing + testify | Testcontainers or dockertest | Custom factory functions | go test ./... |
| Phoenix (Elixir) | ExUnit | Postgres sandbox | ex_machina | mix test |

Recommendation: Wrap your test command in a make test target. It gives the agent (and your team) a single, stack-agnostic entry point that hides flags, environment setup, and Docker orchestration behind one command. When every project starts with make test, nobody needs to remember whether it's php artisan test, go test ./..., or bundle exec rspec.

Linting and Static Analysis

| Stack | Formatter | Linter | Type Checker | One Command |
| --- | --- | --- | --- | --- |
| Laravel (PHP) | Pint | Psalm / PHPStan | Psalm (level) | ./vendor/bin/pint --test && ./vendor/bin/phpstan |
| Rails (Ruby) | RuboCop (formatting) | RuboCop (style/lint) | Sorbet / Steep | bundle exec rubocop |
| Django (Python) | Black / Ruff format | Ruff / Flake8 | mypy / pyright | ruff check . && mypy . |
| Next.js (TypeScript) | Prettier | ESLint | TypeScript (tsc --noEmit) | npm run lint && npx tsc --noEmit |
| Spring Boot (Java) | google-java-format / Spotless | Checkstyle / SpotBugs | javac (compile-time) | ./gradlew check |
| ASP.NET (C#) | dotnet format | Roslyn analyzers / StyleCop | C# compiler + nullable refs | dotnet format --verify-no-changes |
| Go | gofmt / goimports | golangci-lint | Go compiler | golangci-lint run |
| Phoenix (Elixir) | mix format | Credo | Dialyxir | mix format --check-formatted && mix credo && mix dialyzer |

Recommendation: Wrap your lint pipeline in a make lint target. Most stacks need multiple tools chained together — formatter, linter, type checker — and the exact flags change over time. A make lint target keeps the agent from needing to know whether your project runs ruff check . && mypy . or golangci-lint run. One target, full coverage, zero tribal knowledge.

Architecture Patterns

| Stack | Service Layer | Business Logic Unit | Authorization | Request Validation |
| --- | --- | --- | --- | --- |
| Laravel (PHP) | Service classes + contracts | Action classes | Policies | Form Requests |
| Rails (Ruby) | Service objects / POROs | Command / Interactor | Pundit / Action Policy | Strong Parameters + dry-validation |
| Django (Python) | Service layer (manual) | Service functions / Command pattern | django-rules / permissions | Serializers / Pydantic |
| Next.js (TypeScript) | Server actions / service modules | Use case functions | Middleware + CASL / next-auth | Zod schemas |
| Spring Boot (Java) | @Service classes | @Service or Command pattern | Spring Security + @PreAuthorize | @Valid + Bean Validation |
| ASP.NET (C#) | Service classes via DI | MediatR handlers / Command pattern | Authorization policies + [Authorize] | FluentValidation |
| Go | Package-level service structs | Handler / Use case functions | Middleware + Casbin | Struct validation (go-playground) |
| Phoenix (Elixir) | Context modules | Context functions / Command pattern | Bodyguard | Ecto changesets |

CI/CD and Delivery

| Stack | CI Platform | Deploy Tool | Branch Strategy | Commit Convention |
| --- | --- | --- | --- | --- |
| Laravel (PHP) | GitHub Actions | Forge / Envoyer | Trunk-based, short-lived branches | Conventional Commits |
| Rails (Ruby) | GitHub Actions / CircleCI | Kamal / Capistrano / Heroku | Trunk-based, short-lived branches | Conventional Commits |
| Django (Python) | GitHub Actions / GitLab CI | Gunicorn + systemd / Docker + ECS | Trunk-based, short-lived branches | Conventional Commits |
| Next.js (TypeScript) | GitHub Actions / Vercel CI | Vercel / Docker + ECS | Trunk-based, short-lived branches | Conventional Commits |
| Spring Boot (Java) | GitHub Actions / Jenkins | Docker + Kubernetes / AWS ECS | Trunk-based, short-lived branches | Conventional Commits |
| ASP.NET (C#) | GitHub Actions / Azure DevOps | Azure App Service / Docker + ECS | Trunk-based, short-lived branches | Conventional Commits |
| Go | GitHub Actions | Docker + Kubernetes / systemd | Trunk-based, short-lived branches | Conventional Commits |
| Phoenix (Elixir) | GitHub Actions | Fly.io / Docker + release | Trunk-based, short-lived branches | Conventional Commits |

Harness and Skills

| Stack | Harness Format | Scoped Files | Skill Definition | Agent Tool |
| --- | --- | --- | --- | --- |
| All Stacks | Markdown (CLAUDE.md) | One per major directory | .claude/skills/SKILL.md | Claude Code |

The harness and skills layer is entirely stack-agnostic. It's Markdown files in your repo. The content changes (your patterns, your anti-patterns, your architectural rules) but the mechanism is the same regardless of language.

Where to Start

If you're looking at this table and wondering where to begin, here's the priority order:

1. Tests. If you have nothing else, start here. Get a test runner working against real dependencies with a single command. Write characterization tests for the most critical paths. This alone makes the agent dramatically safer.

2. Linting. Add a formatter and a linter. Wire them into a pre-commit hook. This takes an afternoon and eliminates an entire category of bad output.

3. CI. Connect your test and lint commands to your CI platform. Make it block merges. Now the agent cannot ship broken code even if it tries.

4. Architecture. This is the long game. You don't need perfect architecture to start using an agent. But every boundary you create, every interface you extract, every consistent pattern you establish makes the agent more reliable in that area.

5. Harness files. Start with a root CLAUDE.md that describes the project, the tech stack, and the top-level patterns. Add subdirectory files as you notice the agent drifting in specific areas.

6. Skills. Only after everything else is working. Skills automate a workflow that already works manually. If the underlying steps aren't solid, automating them just produces bad output faster.

The Pattern Behind the Pattern

Every step in this series followed the same logic:

  1. Identify a category of error the agent can make.
  2. Add a machine-checkable constraint that eliminates it.
  3. Give the agent a single command to verify compliance.
  4. Update the harness when a new failure mode appears.

Tests eliminate behavioral errors. Linting eliminates structural errors. Architecture eliminates design errors. CI eliminates delivery errors. The harness eliminates context errors. Skills eliminate process errors.

The tools in the table will change. New frameworks will appear. New linters will ship. New CI platforms will launch. But this progression (constrain, verify, scope, automate) will remain the same, because it's not about the tools. It's about narrowing the space where the agent can be wrong until the only thing left is the judgment calls that require a human.

That's the harness. Build it in whatever language you ship.

The Takeaway

  1. The strategy is language-agnostic. Tests, linting, architecture, CI, harness, skills — every stack has equivalents. The progression is what matters.
  2. Start with tests and linting. These two steps alone eliminate more bad agent output than any amount of prompt engineering.
  3. Architecture is a force multiplier. Clear patterns make the agent predictable. Unclear patterns make it creative. You don't want creative.
  4. The harness content is yours. The table gives you the tools, but the rules inside your harness files come from your engineering judgment about your codebase.
  5. Constrain, verify, scope, automate. That's the whole series in four words.

The agent didn't get smarter across the previous ten posts. The environment got smarter. That's the insight that generalizes to every stack, every language, and every team. Build the environment, and the agent follows.
