Ian Johnson

Beyond Laravel: Applying the Agent Harness to Any Stack

The Strategy Is the Point

This series followed a Laravel + React codebase. But if you've been reading for the strategy and not the syntax, you already know: none of this is Laravel-specific.

Tests before agents. Linting as machine-checkable standards. Clean architecture so the agent follows patterns instead of inventing them. Trunk-based development for fast feedback. Harness files that scope guidance to where the agent is working. Custom skills that turn your workflow into structure.

Every step has an equivalent in whatever stack you're using. The tools change. The progression doesn't.

The Seven Steps

Here's the agent harness approach distilled to its language-agnostic core. Each step builds on the ones before it. You cannot skip ahead: the entire system is load-bearing.

Step 1: Test Infrastructure

What you're doing: Wrapping the existing codebase in tests that run against real dependencies (the same database engine, the same cache, the same queue) so you have a machine-checkable safety net before the agent touches anything.

What matters:

  • Characterization tests first. Lock in what the code does before you change what it should do. These aren't aspirational tests. They're documentation of current behavior.
  • Real dependencies, not fakes. If production runs Postgres, your tests run Postgres. SQLite-in-memory is a lie that will catch up with you.
  • One command to run everything. make test, npm test, ./gradlew test — the agent needs a single entry point. If running tests requires tribal knowledge, the agent will get it wrong.
  • Test factories that are hard to misuse. Give the agent a discoverable API for creating test data. Fluent builders, factory patterns, fixtures with clear names. Design for the dumbest correct user, because that's how the agent will use it.
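
As a minimal sketch of a factory that's hard to misuse, here's a fluent builder in Python (the `User` and `UserBuilder` names are hypothetical, not from the series): safe defaults, explicit overrides, and an immutable value so the agent can't accidentally mutate shared fixtures.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class User:
    # Safe defaults: calling UserBuilder().build() always yields a valid user.
    name: str = "Test User"
    email: str = "test@example.com"
    is_admin: bool = False

class UserBuilder:
    """Fluent test-data builder: every method returns a new builder."""

    def __init__(self, user: User = User()):
        self._user = user

    def named(self, name: str) -> "UserBuilder":
        return UserBuilder(replace(self._user, name=name))

    def as_admin(self) -> "UserBuilder":
        return UserBuilder(replace(self._user, is_admin=True))

    def build(self) -> User:
        return self._user
```

The equivalent in your stack might be Laravel model factories, FactoryBot, or factory_boy; the point is that `UserBuilder().named("Ada").as_admin().build()` is discoverable and nearly impossible to call wrong.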

Step 2: Linting and Static Analysis

What you're doing: Adding machine-checkable standards for code style, type safety, and structural quality. Each tool eliminates an entire category of wrong output from the agent.

What matters:

  • Format, lint, and type-check — all three. Formatting removes style arguments. Linting catches structural problems. Type checking catches logic errors. Together they narrow the space the agent can operate in.
  • One command to check everything. make lint, npm run lint, a Makefile target that runs the full stack. The agent runs this before every commit.
  • Pre-commit hooks that block and explain. The hook should fail with a message the agent can read and act on. "Run npx prettier --write . to fix" is better than "formatting error on line 47."
  • CI as the gate that cannot be skipped. Pre-commit hooks are the first check. CI is the final one. The agent cannot merge without green CI.
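
To make the "block and explain" idea concrete, here's a small sketch (hypothetical `Check` structure and commands, not a real hook framework): each check carries its own fix instruction, so a failure message tells the agent what to run next rather than just where it went wrong.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Check:
    name: str
    command: str   # what the pre-commit hook runs
    fix_hint: str  # what the agent should run when the check fails

CHECKS = [
    Check("format", "npx prettier --check .", "Run `npx prettier --write .` to fix."),
    Check("lint", "npx eslint .", "Run `npx eslint . --fix`, then review remaining errors."),
]

def failure_message(check: Check, tool_output: str) -> str:
    """Build a hook failure message the agent can read and act on."""
    return f"[{check.name}] failed.\n{tool_output}\n{check.fix_hint}"
```

A real hook would run each command via subprocess and exit non-zero on failure; the shape that matters is the actionable `fix_hint` in the output.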

Step 3: Architecture and Boundaries

What you're doing: Refactoring toward clean boundaries (interfaces, services, clear separation of concerns) so the agent can work within a bounded area without needing to understand the whole system.

What matters:

  • Contracts before implementations. Define interfaces first. The agent can implement an interface without understanding the rest of the system. It cannot safely modify a God class.
  • One responsibility per unit. Whether it's a service class, a module, a use case — the agent works best when each unit does one thing and the boundaries are obvious.
  • Architecture as documentation. If the codebase has a clear pattern (actions, services, repositories, commands), the agent follows it. If every file is a snowflake, the agent improvises. You don't want improvisation.
  • Small, safe steps. One extraction per PR. Keep the app running in production throughout. Never refactor and change behavior in the same commit.
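
Here's what "contracts before implementations" can look like, sketched in Python with a hypothetical `PaymentGateway` contract: the agent implements the interface, and callers depend only on the contract, never on a concrete class.

```python
from typing import Protocol

class PaymentGateway(Protocol):
    """The contract. An agent can implement this without reading the rest of the system."""
    def charge(self, amount_cents: int, token: str) -> str: ...

class FakeGateway:
    """In-memory implementation; satisfies the contract via structural typing."""
    def __init__(self) -> None:
        self.charges: list[tuple[int, str]] = []

    def charge(self, amount_cents: int, token: str) -> str:
        self.charges.append((amount_cents, token))
        return f"txn-{len(self.charges)}"

def checkout(gateway: PaymentGateway, amount_cents: int, token: str) -> str:
    # Depends on the interface, so swapping implementations is safe.
    return gateway.charge(amount_cents, token)
```

The same move exists everywhere: PHP interfaces, Go interfaces, Java interfaces, Elixir behaviours. The boundary is the point, not the keyword.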

Step 4: Explicit Patterns for Business Logic

What you're doing: Establishing the patterns the agent should follow for new work: how business logic is structured, how authorization works, and how data flows through the system.

What matters:

  • A single pattern for business logic. Actions, use cases, commands, interactors — the name doesn't matter. What matters is that there's one pattern, it's consistent, and the agent can see ten examples in the codebase.
  • Centralized authorization. Scattered permission checks are a security risk with human developers. With an agent, they're a guarantee of inconsistency. Use your framework's policy/guard/permission system.
  • Typed inputs and outputs. Form objects, request validators, result types, DTOs — whatever your stack calls them. The agent needs to know what goes in and what comes out.
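
Putting the three bullets together, here's one hedged sketch of the pattern (hypothetical `CreateInvoice` action; persistence and authorization elided): a single unit of business logic with a typed input and a typed result.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CreateInvoiceInput:
    customer_id: int
    amount_cents: int

@dataclass(frozen=True)
class CreateInvoiceResult:
    ok: bool
    invoice_id: Optional[int] = None
    error: Optional[str] = None

class CreateInvoice:
    """One action, one responsibility: typed input in, typed result out."""

    def execute(self, data: CreateInvoiceInput) -> CreateInvoiceResult:
        if data.amount_cents <= 0:
            return CreateInvoiceResult(ok=False, error="amount must be positive")
        # A real action would write through a repository here and return the new ID.
        return CreateInvoiceResult(ok=True, invoice_id=1)
```

With ten examples of this shape in the codebase, the eleventh one the agent writes will look like the first ten.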

Step 5: Migration Strategy (If Applicable)

What you're doing: If you're migrating frontends, databases, or major subsystems: running old and new in parallel, migrating incrementally, and never attempting a big-bang rewrite.

What matters:

  • Both systems run simultaneously. The old system serves production. The new system is gated behind environment flags or feature toggles until proven.
  • Page by page, feature by feature. Each migration is a small PR. Each small PR goes through the full test/lint/CI pipeline.
  • Clear scoping rules. The agent needs to know: does this work go in the old system, the new system, or both? Make the rules explicit in the harness.
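
The gating can be as simple as this sketch (hypothetical flag names and renderers; your toggle system may be a proper feature-flag service instead of environment variables): the old system is the default, and each page opts into the new system only when its flag is on.

```python
import os
from typing import Callable, Mapping

def render_page(
    page: str,
    legacy_renderer: Callable[[str], str],
    new_renderer: Callable[[str], str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Route one page to the new system only when its flag is set; legacy stays the default."""
    flag = f"NEW_UI_{page.upper()}"
    if env.get(flag) == "1":
        return new_renderer(page)
    return legacy_renderer(page)
```

Flipping one flag migrates one page; unsetting it rolls the page back instantly. That's the whole safety story of parallel systems.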

Step 6: Trunk-Based Development and CI/CD

What you're doing: Establishing the delivery cadence that makes AI-assisted development practical: short-lived branches, small PRs, fast CI, and automated deployment.

What matters:

  • Branches live for hours, not days. The longer a branch lives, the more the agent's assumptions go stale. Small batches, fast merges.
  • CI runs the full pipeline. Build, lint, type-check, test, deploy. If any step fails, the PR doesn't merge.
  • Conventional commits. A machine-readable commit history helps the agent understand what changed and why. It also helps you when you're reviewing 145 PRs in three months.
  • Automated deployment. Push to main, deploy to staging. The feedback loop from code change to running software should be minutes, not hours.
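
"Machine-readable commit history" is checkable, too. Here's a minimal sketch of a Conventional Commits header validator (the type list follows the common convention; adjust it to your repo's rules) that could run in a commit-msg hook or CI:

```python
import re

# Conventional Commits header: type(optional-scope)!: description
COMMIT_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9-]+\))?(!)?: .+"
)

def is_conventional(header: str) -> bool:
    """True when a commit header follows the Conventional Commits format."""
    return bool(COMMIT_RE.match(header))
```

Like every other check in the harness, this turns a style preference into a gate the agent cannot drift past.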

Step 7: The Harness and Skills

What you're doing: Writing scoped guidance files that tell the agent how to work in each area of the codebase, then codifying your workflow into repeatable skills.

What matters:

  • Scoped guidance, not one big file. One harness file per major area. The agent loads what's relevant to where it's working. Keep the signal-to-noise ratio high.
  • Patterns with examples, not just rules. Show the agent a code example of the pattern you want. "Do it like this" beats "follow these principles" every time.
  • Anti-patterns are explicit fences. Tell the agent what not to do. "Never put HTTP concerns in an Action" is more useful than "keep Actions pure."
  • The feedback protocol. When the agent drifts, ask: is this a harness gap? If yes, update the harness first, then re-apply. Corrections become permanent rules.
  • Skills codify the sequence. Automate the ceremony (read ticket, write tests, implement, lint, commit, push, PR). Keep the judgment calls at checkpoints.
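
To illustrate the scoping mechanics, here's a sketch of how per-directory guidance resolves for a given file (an assumption about layout, mirroring the cascading-CLAUDE.md convention, not an official loader): the root file always applies, and each ancestor directory can add a more specific one.

```python
from pathlib import PurePosixPath

def harness_files(file_path: str) -> list[str]:
    """List the guidance files that apply to one file: the root CLAUDE.md
    first, then one per ancestor directory, most specific last."""
    scopes = ["CLAUDE.md"]
    prefix = ""
    for part in PurePosixPath(file_path).parent.parts:
        prefix = f"{prefix}{part}/"
        scopes.append(f"{prefix}CLAUDE.md")
    return scopes
```

The deeper the file, the more specific the guidance that applies to it; the signal-to-noise rule falls out of the directory structure itself.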

The Stack Table

Here's how each step maps to tools across popular web framework stacks. Each table covers one step; each row is a stack. Every cell answers: "What would I use here?"

Test Infrastructure

| Stack | Runner | DB Strategy | Factories / Fixtures | One Command |
| --- | --- | --- | --- | --- |
| Laravel (PHP) | PHPUnit / Pest | MySQL in Docker (tmpfs) | Model Factories | php artisan test |
| Rails (Ruby) | RSpec / Minitest | Postgres in Docker | FactoryBot | bundle exec rspec |
| Django (Python) | pytest-django | Postgres in Docker | factory_boy / Model Bakery | python -m pytest |
| Next.js (TypeScript) | Vitest / Jest | Postgres via Testcontainers | Prisma seed scripts / custom builders | npm test |
| Spring Boot (Java) | JUnit 5 | Testcontainers (Postgres/MySQL) | TestEntityManager / custom builders | ./gradlew test |
| ASP.NET (C#) | xUnit / NUnit | Testcontainers or LocalDB | Bogus + custom builders | dotnet test |
| Go | testing + testify | Testcontainers or dockertest | Custom factory functions | go test ./... |
| Phoenix (Elixir) | ExUnit | Postgres sandbox | ex_machina | mix test |

Recommendation: Wrap your test command in a make test target. It gives the agent (and your team) a single, stack-agnostic entry point that hides flags, environment setup, and Docker orchestration behind one command. When every project starts with make test, nobody needs to remember whether it's php artisan test, go test ./..., or bundle exec rspec.

Linting and Static Analysis

| Stack | Formatter | Linter | Type Checker | One Command |
| --- | --- | --- | --- | --- |
| Laravel (PHP) | Pint | Psalm / PHPStan | Psalm (level) | ./vendor/bin/pint --test && ./vendor/bin/phpstan |
| Rails (Ruby) | RuboCop (formatting) | RuboCop (style/lint) | Sorbet / Steep | bundle exec rubocop |
| Django (Python) | Black / Ruff format | Ruff / Flake8 | mypy / pyright | ruff check . && mypy . |
| Next.js (TypeScript) | Prettier | ESLint | TypeScript (tsc --noEmit) | npm run lint && npx tsc --noEmit |
| Spring Boot (Java) | google-java-format / Spotless | Checkstyle / SpotBugs | javac (compile-time) | ./gradlew check |
| ASP.NET (C#) | dotnet format | Roslyn analyzers / StyleCop | C# compiler + nullable refs | dotnet format --verify-no-changes |
| Go | gofmt / goimports | golangci-lint | Go compiler | golangci-lint run |
| Phoenix (Elixir) | mix format | Credo | Dialyxir | mix format --check-formatted && mix credo && mix dialyzer |

Recommendation: Wrap your lint pipeline in a make lint target. Most stacks need multiple tools chained together — formatter, linter, type checker — and the exact flags change over time. A make lint target keeps the agent from needing to know whether your project runs ruff check . && mypy . or golangci-lint run. One target, full coverage, zero tribal knowledge.

Architecture Patterns

| Stack | Service Layer | Business Logic Unit | Authorization | Request Validation |
| --- | --- | --- | --- | --- |
| Laravel (PHP) | Service classes + contracts | Action classes | Policies | Form Requests |
| Rails (Ruby) | Service objects / POROs | Command / Interactor | Pundit / Action Policy | Strong Parameters + dry-validation |
| Django (Python) | Service layer (manual) | Service functions / Command pattern | django-rules / permissions | Serializers / Pydantic |
| Next.js (TypeScript) | Server actions / service modules | Use case functions | Middleware + CASL / next-auth | Zod schemas |
| Spring Boot (Java) | @Service classes | @Service or Command pattern | Spring Security + @PreAuthorize | @Valid + Bean Validation |
| ASP.NET (C#) | Service classes via DI | MediatR handlers / Command pattern | Authorization policies + [Authorize] | FluentValidation |
| Go | Package-level service structs | Handler / Use case functions | Middleware + Casbin | Struct validation (go-playground) |
| Phoenix (Elixir) | Context modules | Context functions / Command pattern | Bodyguard | Ecto changesets |

CI/CD and Delivery

| Stack | CI Platform | Deploy Tool | Branch Strategy | Commit Convention |
| --- | --- | --- | --- | --- |
| Laravel (PHP) | GitHub Actions | Forge / Envoyer | Trunk-based, short-lived branches | Conventional Commits |
| Rails (Ruby) | GitHub Actions / CircleCI | Kamal / Capistrano / Heroku | Trunk-based, short-lived branches | Conventional Commits |
| Django (Python) | GitHub Actions / GitLab CI | Gunicorn + systemd / Docker + ECS | Trunk-based, short-lived branches | Conventional Commits |
| Next.js (TypeScript) | GitHub Actions / Vercel CI | Vercel / Docker + ECS | Trunk-based, short-lived branches | Conventional Commits |
| Spring Boot (Java) | GitHub Actions / Jenkins | Docker + Kubernetes / AWS ECS | Trunk-based, short-lived branches | Conventional Commits |
| ASP.NET (C#) | GitHub Actions / Azure DevOps | Azure App Service / Docker + ECS | Trunk-based, short-lived branches | Conventional Commits |
| Go | GitHub Actions | Docker + Kubernetes / systemd | Trunk-based, short-lived branches | Conventional Commits |
| Phoenix (Elixir) | GitHub Actions | Fly.io / Docker + release | Trunk-based, short-lived branches | Conventional Commits |

Harness and Skills

| Stack | Harness Format | Scoped Files | Skill Definition | Agent Tool |
| --- | --- | --- | --- | --- |
| All Stacks | Markdown (CLAUDE.md) | One per major directory | .claude/skills/SKILL.md | Claude Code |

The harness and skills layer is entirely stack-agnostic. It's Markdown files in your repo. The content changes (your patterns, your anti-patterns, your architectural rules) but the mechanism is the same regardless of language.

Where to Start

If you're looking at this table and wondering where to begin, here's the priority order:

1. Tests. If you have nothing else, start here. Get a test runner working against real dependencies with a single command. Write characterization tests for the most critical paths. This alone makes the agent dramatically safer.

2. Linting. Add a formatter and a linter. Wire them into a pre-commit hook. This takes an afternoon and eliminates an entire category of bad output.

3. CI. Connect your test and lint commands to your CI platform. Make it block merges. Now the agent cannot ship broken code even if it tries.

4. Architecture. This is the long game. You don't need perfect architecture to start using an agent. But every boundary you create, every interface you extract, every consistent pattern you establish makes the agent more reliable in that area.

5. Harness files. Start with a root CLAUDE.md that describes the project, the tech stack, and the top-level patterns. Add subdirectory files as you notice the agent drifting in specific areas.

6. Skills. Only after everything else is working. Skills automate a workflow that already works manually. If the underlying steps aren't solid, automating them just produces bad output faster.

The Pattern Behind the Pattern

Every step in this series followed the same logic:

  1. Identify a category of error the agent can make.
  2. Add a machine-checkable constraint that eliminates it.
  3. Give the agent a single command to verify compliance.
  4. Update the harness when a new failure mode appears.

Tests eliminate behavioral errors. Linting eliminates structural errors. Architecture eliminates design errors. CI eliminates delivery errors. The harness eliminates context errors. Skills eliminate process errors.

The tools in the table will change. New frameworks will appear. New linters will ship. New CI platforms will launch. But this progression (constrain, verify, scope, automate) will remain the same, because it's not about the tools. It's about narrowing the space where the agent can be wrong until the only thing left is the judgment calls that require a human.

That's the harness. Build it in whatever language you ship.

The Takeaway

  1. The strategy is language-agnostic. Tests, linting, architecture, CI, harness, skills — every stack has equivalents. The progression is what matters.
  2. Start with tests and linting. These two steps alone eliminate more bad agent output than any amount of prompt engineering.
  3. Architecture is a force multiplier. Clear patterns make the agent predictable. Unclear patterns make it creative. You don't want creative.
  4. The harness content is yours. The table gives you the tools, but the rules inside your harness files come from your engineering judgment about your codebase.
  5. Constrain, verify, scope, automate. That's the whole series in four words.

The agent didn't get smarter across the previous ten posts. The environment got smarter. That's the insight that generalizes to every stack, every language, and every team. Build the environment, and the agent follows.
