The Codebase Nobody Wanted to Touch
I inherited a Laravel 9 application — a line-of-business app with complex workflows, role-based access, and third-party integrations. It worked. Users depended on it. And it had exactly zero automated tests.
Note: The code examples throughout this series have been made generic to protect the original domain, but they reflect real patterns from a production codebase. Think of them as practical, working examples you can adapt to your own project.
No tests. No linting. No CI pipeline. Just a repo, a prayer, and git push origin main.
I knew I wanted to use AI coding agents (specifically Claude Code) to accelerate development. But here's the thing about AI agents writing code on a codebase with no tests: you get slop. Fast slop. Confidently wrong slop.
An agent without tests is a junior engineer without code review. It'll produce something that looks right, compiles fine, and breaks in production at 2am.
So before I let an agent touch a single line, I wrote the tests.
Tests Are the Reins, Not the Saddle
There's a common misconception that tests are something you add after you've built the thing. A nice-to-have. A checkbox for coverage reports.
When you're working with an AI agent, tests are the most important artifact in your repository. More important than the code itself. Here's why:
Tests are the only machine-checkable specification of what your code should do.
Documentation lies. Comments rot. But a failing test is an undeniable fact. When you tell an agent "implement feature X" and it writes code that passes all existing tests plus the new ones you wrote for X — you have empirical evidence that it worked. Not vibes. Evidence.
Dave Farley's Modern Software Engineering hammers this point: optimize for fast feedback. Tests are the fastest feedback mechanism you have. They turn "does this work?" from a manual investigation into a command: make test.
Starting From Zero: Characterization Tests
You can't TDD your way into a codebase that already exists. The code is there, doing something, and users depend on that something. So you start with characterization tests.
A characterization test doesn't assert what the code should do. It asserts what the code does. You're locking in current behavior so you can refactor safely later.
For a Laravel app, this means HTTP feature tests:
```php
/** @test */
public function admin_can_view_orders_index()
{
    $admin = UserFactory::admin()->create();

    $this->actingAs($admin);

    $response = $this->get('/orders');

    $response->assertOk();
    $response->assertViewIs('orders.index');
}
```
Not glamorous. But now you know: if you change something in OrdersController@index and this test fails, you broke existing behavior.
I started with the highest-risk controllers — the ones that handled money, orders, and user data — and wrapped them in feature tests. I used Claude to help write the initial characterization tests. I'd point it at a controller, describe the expected behavior, and have it generate the test scaffolding. I still reviewed every test and adjusted the assertions, but Claude accelerated the tedious part — reading legacy code and translating "what does this endpoint do?" into executable assertions. It turned a week of manual work into a couple of days.
It took time. It was tedious even with help. And it was the single best investment I made in the entire project.
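To give a flavor of what those higher-risk characterization tests looked like: for an endpoint with side effects, the test pins down the database change as well as the HTTP response. The route, payload, and `Customer` model below are illustrative, not from the original app — adapt them to your own domain:

```php
/** @test */
public function manager_can_create_an_order()
{
    // Hypothetical permission name and payload — adjust to your schema.
    $manager = UserFactory::withPermissions('orders.manage')->create();
    $customer = Customer::factory()->create();

    $this->actingAs($manager)
        ->post('/orders', [
            'customer_id' => $customer->id,
            'total'       => 4999,
        ])
        ->assertRedirect();

    // Lock in the side effect, not just the status code.
    $this->assertDatabaseHas('orders', [
        'customer_id' => $customer->id,
        'total'       => 4999,
    ]);
}
```

Asserting on the database row is what makes this a characterization test worth having: a later refactor that silently stops writing the order will fail loudly, even if the redirect still works.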
The Test Database Problem
The original test setup used SQLite. Quick to spin up, zero config, and subtly wrong.
SQLite doesn't enforce the same constraints as MySQL. Foreign keys behave differently. Date functions differ. Fulltext search doesn't exist. The tests were passing against a database engine that didn't match production.
I switched to MySQL 8.0 running in Docker with tmpfs (RAM-backed storage) for speed. The tests now ran against the same engine as production. Some tests broke. Good! Those were the tests that were lying to me.
The UserFactory Facade
Every test needs users with specific roles and permissions. Laravel's built-in model factories are fine for simple cases, but this app had a complex role/permission system. Creating a user with the right role, permissions, organization, and relationships was 10+ lines of setup code per test.
So I built a test UserFactory with a fluent API:
```php
use Facades\Tests\Setup\UserFactory;

// An admin user, fully configured
$admin = UserFactory::admin()->create();

// A regular user with specific permissions
$user = UserFactory::withPermissions('orders.manage')->create();

// An org admin in a specific organization
$orgAdmin = UserFactory::orgAdmin()
    ->withOrganization($org)
    ->create();
```
This did two things:
- Made tests readable — you can scan the Arrange section and immediately know who's doing what
- Made it trivial for the agent to write test setup — the API is discoverable and hard to misuse
That second point matters more than you'd think. When Claude writes a test, it needs to create users. If the correct way to do that is buried in 10 lines of factory states and role assignments, the agent will guess wrong. If it's UserFactory::admin()->create(), the agent gets it right every time.
Design your test infrastructure for the dumbest correct user — because that's how an agent will use it.
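Under the hood, such a factory can be a small class with chainable state methods; the `Facades\` prefix in the earlier import is Laravel's real-time facade feature, which proxies static calls to instance methods. This is a simplified sketch, not the original implementation — the `assignRole`/`givePermissionTo` calls assume a Spatie-style permission package, and the model names are placeholders:

```php
namespace Tests\Setup;

use App\Models\Organization;
use App\Models\User;

class UserFactory
{
    protected ?string $role = null;
    protected array $permissions = [];
    protected ?Organization $organization = null;

    public function admin(): static
    {
        $this->role = 'admin';
        return $this;
    }

    public function orgAdmin(): static
    {
        $this->role = 'org-admin';
        return $this;
    }

    public function withPermissions(string ...$permissions): static
    {
        $this->permissions = $permissions;
        return $this;
    }

    public function withOrganization(Organization $org): static
    {
        $this->organization = $org;
        return $this;
    }

    public function create(array $attributes = []): User
    {
        $user = User::factory()->create($attributes);

        if ($this->role) {
            $user->assignRole($this->role);          // Spatie-style API, assumed
        }
        foreach ($this->permissions as $permission) {
            $user->givePermissionTo($permission);    // Spatie-style API, assumed
        }
        if ($this->organization) {
            $user->organizations()->attach($this->organization);
        }

        return $user;
    }
}
```

Each chainable method returns `$this`, so every valid call sequence ends in `create()` and yields a fully wired-up user — there's no partially configured state an agent can accidentally leave behind.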
TDD as a Communication Protocol
Here's where it gets interesting. Once I had a test suite I trusted, I started using TDD not just as a development practice, but as a communication protocol with Claude.
The workflow:
- I write the failing tests. This is me telling the agent exactly what I want, in a language that's unambiguous.
- Claude implements the smallest change to make the tests pass.
- I review the implementation.
The tests are the spec. They're not prose that the agent might misinterpret. They're executable assertions. When I write:
```php
/** @test */
public function user_cannot_approve_their_own_order()
{
    $user = UserFactory::withPermissions('orders.manage')->create();
    $order = Order::factory()->pending()->for($user)->create();

    $this->be($user, 'sanctum');

    $this->postJson("/api/orders/{$order->id}/approve")
        ->assertForbidden();
}
```
There's no ambiguity. The agent knows exactly what "users can't approve their own orders" means in terms of HTTP status codes, routes, and authorization rules.
This is the key insight from this entire series: tests aren't just quality assurance. They're the primary interface between you and your AI agent.
The Numbers
Starting from zero, I built up to 2,700+ PHP tests and a growing Vitest suite for the React frontend. The full suite takes 10–20 minutes (MySQL is thorough, not fast). But that investment bought me something no amount of code review could:
Confidence.
Confidence that when Claude makes a change, I can verify it works with a single command. Confidence that refactoring won't break things. Confidence that the next 200 commits will land safely.
And that confidence is what makes everything in the rest of this series possible.
What This Looks Like in Practice
Here's the Makefile target that runs the full suite:
```makefile
test:
	docker compose exec app php artisan test $(ARGS)
```
Simple. One command. Every developer (and every agent) uses the same command. No special setup, no "works on my machine."
The test configuration lives in .env.testing, the database runs in Docker, and the entire thing is reproducible on any machine or in CI. If it passes locally, it passes in the pipeline.
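The relevant part of `.env.testing` is just the database connection, pointed at the Docker service. The values below are illustrative (the host assumes a compose service named `mysql-test`, reachable from the `app` container):

```ini
# .env.testing (excerpt) — illustrative values
APP_ENV=testing
DB_CONNECTION=mysql
DB_HOST=mysql-test
DB_PORT=3306
DB_DATABASE=app_test
DB_USERNAME=root
DB_PASSWORD=secret
```

Because the host is a compose service name rather than localhost, the same file works for every developer's machine and for CI without modification.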
The Takeaway
If you're thinking about using AI agents on your codebase, do this first:
- Get your tests running against a real database — not SQLite, not mocks, the actual engine you run in production.
- Create a test user factory with a simple, fluent API that makes it obvious how to create users with the right roles.
- Wrap your highest-risk code in characterization tests — lock in current behavior before you change anything.
- Make `make test` work — one command, zero configuration, same result every time.
You can't trust an agent you can't verify. Tests are how you verify.