The Agent Writes Code. Who Checks It?
In the last post, I talked about why tests come first when working with an AI agent. Tests tell you if the code works. But they don't tell you if the code is good.
An agent will happily write code that passes all your tests and is also:
- Inconsistently formatted
- Full of type errors Psalm would catch
- Using deprecated patterns
- Missing semicolons in one file and using them in another
Tests catch behavioral bugs. Linting catches structural rot. You need both.
The Tooling Stack
I added five tools to this Laravel + React codebase, and each one closed a different gap:
| Tool | Language | What It Catches |
|---|---|---|
| Pint | PHP | Code style (PSR-12, Laravel conventions) |
| Psalm | PHP | Static analysis (type errors, null safety, dead code) |
| Prettier | JS/CSS/Blade | Formatting (consistent whitespace, quotes, line length) |
| ESLint | TypeScript/React | Lint rules (unused vars, hook deps, accessibility) |
| TypeScript | TypeScript | Type checking (compile-time type safety) |
One make lint command runs all five:
lint: pint psalm format eslint typecheck
If any of them fail, the change doesn't ship. Period.
Why Agents Need Checkable Standards
Here's the fundamental problem with giving an AI agent a style guide:
"Please use consistent formatting and follow our coding conventions."
That's a suggestion. The agent might follow it. It might not. You'll spend your review time catching style issues instead of reviewing logic.
Now compare:
$ make lint
Pint ........... FAIL (3 files)
Psalm .......... PASS
Prettier ....... FAIL (1 file)
ESLint ......... PASS
TypeScript ..... PASS
That's a fact. The agent can run make lint, see it failed, fix the issues, and run it again. No ambiguity. No judgment call. Pass or fail.
Prose guides are for humans. Machine-checkable standards are for agents.
This is the same principle behind CI/CD: don't rely on people to remember the rules. Encode the rules into tools that enforce them automatically.
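A pass/fail report like the one above takes very little code to produce. Here is a minimal sketch in POSIX shell, not the project's actual implementation: `lint_report` is a hypothetical helper, and the output is simplified to one `Label PASS` or `Label FAIL` line per check.

```shell
#!/bin/sh
# Hypothetical helper (not from the project): reads "Label|command" lines on
# stdin, runs each command, prints "Label PASS" or "Label FAIL", and returns 1
# if any command failed.
lint_report() {
    failed=0
    while IFS='|' read -r label cmd; do
        # Skip blank lines.
        [ -n "$label" ] || continue
        # </dev/null keeps the command from swallowing the remaining check list.
        if sh -c "$cmd" >/dev/null 2>&1 </dev/null; then
            printf '%s PASS\n' "$label"
        else
            printf '%s FAIL\n' "$label"
            failed=1
        fi
    done
    return "$failed"
}

# Example call (assumes the make targets named in this post):
#   printf 'Pint|make pint\nPsalm|make psalm\n' | lint_report
```

The exit status is the part the agent cares about: zero means move on, non-zero means read the report and fix something.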
Pre-Commit Hooks: The First Gate
I added a pre-commit hook early in the project. It runs a subset of the checks before any commit lands.
This catches the most common issues before they even hit CI. When Claude generates code that's formatted wrong, the pre-commit hook blocks the commit. Claude sees the failure, fixes the formatting, and tries again. Zero human intervention.
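For illustration, here is a minimal sketch of what such a hook could look like. The mapping from file types to checks is an assumption made for this example, and `targets_for_files` and `run_hook` are hypothetical names; only the make targets (`pint`, `format`, `eslint`) come from this project.

```shell
#!/bin/sh
# Hypothetical pre-commit sketch. The file-to-check mapping is an assumption
# for illustration, not the project's real hook.

# Reads staged file paths on stdin and prints the make targets worth running,
# one per line, without duplicates.
targets_for_files() {
    want_pint=0
    want_format=0
    want_eslint=0
    while IFS= read -r path; do
        case $path in
            *.blade.php)       want_format=1 ;;
            *.php)             want_pint=1 ;;
            *.ts|*.tsx)        want_format=1; want_eslint=1 ;;
            *.js|*.css|*.scss) want_format=1 ;;
        esac
    done
    [ "$want_pint" -eq 1 ] && echo pint
    [ "$want_format" -eq 1 ] && echo format
    [ "$want_eslint" -eq 1 ] && echo eslint
    return 0
}

# Entry point a real hook would call. It is left uninvoked here so the file
# can be sourced and tried out without touching a repository.
run_hook() {
    for target in $(git diff --cached --name-only --diff-filter=ACM | targets_for_files); do
        make "$target" || return 1
    done
}
```

Running only the checks that match the staged files keeps the hook fast, which matters because a slow hook is the first thing people bypass.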
The Compound Effect
Each tool on its own catches a category of problems. Together, they create something more powerful: a narrowing of the failure space.
Without any tools, the agent can produce code that's wrong in infinite ways — wrong behavior, wrong types, wrong format, wrong style, wrong patterns.
Add tests: now the behavior is constrained.
Add Pint: now the PHP style is constrained.
Add Psalm: now the types are constrained.
Add Prettier: now the JS formatting is constrained.
Add ESLint: now the React patterns are constrained.
Add TypeScript: now the frontend types are constrained.
Each layer removes an entire category of "wrong." What's left is a much smaller space of valid code. The agent's job gets easier because there are fewer ways to be wrong.
Think of it like bowling bumpers. Each tool is a bumper. The ball (the agent's code) can still miss the pins, but it can't end up in the gutter.
CI as the Final Gate
Pre-commit hooks are great, but they can be bypassed (accidentally or intentionally). CI is the gate that can't be skipped.
The GitHub Actions pipeline runs the full check suite on every push.
The pipeline has four stages:
- Build: Docker images pushed to GitHub Container Registry
- Code Quality: make lint (Pint, Psalm, Prettier, ESLint, TypeScript)
- Tests: make test (2,700+ PHP tests), make test-js (Vitest)
- Deploy: deployment webhook (only on main, only if everything passes)
Nothing merges to main without passing all four stages. This is the same pipeline for human-written code and agent-written code. No exceptions.
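The gating logic behind those four stages is simple enough to sketch in shell. This is not the real workflow file. In GitHub Actions the same rules would normally be expressed with `needs` between jobs and an `if` condition on the branch. `run_stage` and `run_pipeline` are hypothetical names, and the build command and the deploy placeholder are assumptions.

```shell
#!/bin/sh
# Hypothetical stage runner: maps a stage name to a command.
# "docker compose build" and the deploy placeholder are assumptions.
run_stage() {
    case $1 in
        build)   docker compose build ;;
        quality) make lint ;;
        tests)   make test && make test-js ;;
        deploy)  echo "deploy webhook would be called here" ;;
        *)       return 2 ;;
    esac
}

# run_pipeline BRANCH: runs the stages in order and stops at the first
# failure. Deploy runs only on main, and only after everything else passed.
run_pipeline() {
    branch=$1
    for stage in build quality tests; do
        if ! run_stage "$stage"; then
            echo "FAILED at $stage; later stages skipped"
            return 1
        fi
    done
    if [ "$branch" != "main" ]; then
        echo "checks passed; deploy skipped on $branch"
        return 0
    fi
    run_stage deploy
}
```

The ordering is the point: a failed stage stops everything after it, so deploy is only reachable when build, lint, and tests have all passed.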
Infrastructure: Docker All the Way Down
One thing I want to call out — all of this runs in Docker. Every make target executes inside the Docker app container. The Makefile is the interface:
pint:
	docker compose exec app ./vendor/bin/pint --test
psalm:
	docker compose exec app ./vendor/bin/psalm
format:
	docker compose exec app npx prettier --check "resources/**/*.{js,ts,tsx,css,scss,blade.php}"
eslint:
	docker compose exec app npx eslint "resources/js/**/*.{ts,tsx}"
typecheck:
	docker compose exec app npx tsc --noEmit
This matters because:
- Reproducibility — same PHP version, same Node version, same everything, everywhere
- No "works on my machine" — if it passes in Docker locally, it passes in CI
- The agent doesn't need to know about local setup; it just runs make lint
The Docker stack itself is Ubuntu 24.04 LTS with PHP-FPM, Nginx, MySQL 8.0, and optional Redis + queue worker containers. Everything defined in docker-compose.yml, everything started with make up.
The Queue and Background Jobs
The app runs several kinds of background job: calculations, CRM syncing, and notification dispatch. These run through Laravel's queue system, backed by Redis:
# docker-compose.yml (queue profile)
services:
  redis:
    image: redis:7-alpine
    profiles: [queue]
  queue-worker:
    build: .
    command: php artisan queue:work redis --queue=default --sleep=3 --tries=3
    profiles: [queue]
    depends_on:
      - redis
      - mysql
In tests, QUEUE_CONNECTION=sync runs jobs synchronously so tests don't depend on Redis. In CI, same thing. In production, Redis handles the real work.
The point: infrastructure decisions like queue drivers, cache drivers, and session drivers all have test-mode equivalents. Getting these right early means the agent never has to think about them.
What I Learned
Add linting before you start feature work. Every feature you build without linting is a feature you'll have to clean up retroactively.
Make the fix commands obvious. If make lint fails, the error message should tell you to run make pint-fix or make format-fix. The agent reads these messages and acts on them.
Run everything in Docker. The consistency is worth the overhead. You never debug environment differences again.
CI is not optional. It's the only gate you can trust completely. Pre-commit hooks help, but CI enforces.
Each tool is a force multiplier. Pint alone doesn't transform your workflow. But Pint + Psalm + Prettier + ESLint + TypeScript + tests + CI + pre-commit hooks? That's a system. And systems compound.
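The "make the fix commands obvious" point is the easiest of these to put into code. Here is a minimal sketch of a wrapper that names the fix target whenever a check fails. `check_with_hint` is a hypothetical helper, and the exact wording of the hint is an assumption; the `pint-fix` and `format-fix` targets are the ones mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: check_with_hint LABEL FIX_TARGET COMMAND...
# Runs COMMAND. On failure, prints a one-line hint on stderr naming the make
# target that auto-fixes the problem, then returns 1.
check_with_hint() {
    label=$1
    fix_target=$2
    shift 2
    if "$@"; then
        return 0
    fi
    printf '%s failed. Run: make %s\n' "$label" "$fix_target" >&2
    return 1
}

# Example call (assumes the Pint binary path used in the Makefile):
#   check_with_hint Pint pint-fix ./vendor/bin/pint --test
```

A hint like this turns a failure into an instruction, which is exactly what an agent reading the output needs.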
The Emerging Pattern
Notice what's happening here. We're not building features yet. We're building the ability to build features safely.
Tests verify behavior. Linting verifies structure. CI verifies both, automatically, on every push. Docker makes it reproducible. The Makefile makes it accessible.
This is the foundation. In the next post, we'll start refactoring: extracting traits into services and pulling logic into Actions. Every change will be validated by this exact system.
When the agent writes a refactoring commit, it runs make lint and make test. If both pass, the refactoring preserved every behavior the tests cover and still meets the lint rules. That's not a guess. That's evidence you can rerun.