- Me: xgabriel.com | GitHub
Picture the shape of bug that escapes a green pipeline. A team ships a refactor of a pricing module. The coverage report says 94%. CI is green. Days after release, a customer gets charged the wrong amount on a multi-pack discount that was supposed to drop the total by a clean percentage.
Then someone pulls up the test that was supposed to cover the discount math, and it ends like this:
public function testDiscountAppliesToMultiPack(): void
{
    $cart = new Cart();
    $cart->addItem(new Item('sku-1', 3, 1000));

    $price = $this->pricer->price($cart);

    $this->assertNotNull($price);
}
The test ran the code. Coverage counted the lines. The assertion checked that the result was not null. Nothing checked the actual number. Months of changes had slipped past it.
Coverage tells you which lines ran. It does not tell you which lines were verified. Mutation testing closes that gap. It changes your code in small, deliberate ways and reruns the tests. Anything that still passes is a test that wasn't really testing.
This post covers Infection on PHP 8.3, a real surviving mutant, and the reason mutation testing isn't a vanity metric.
What mutation testing actually does
Infection reads your source, applies small code mutations (a > becomes >=, a + becomes -, a return $value becomes return null), and runs your test suite against each mutated version. There are three possible outcomes per mutation:
- Killed. At least one test failed. Good — your tests noticed the bug.
- Escaped. All tests passed against the mutated code. Bad — your tests are blind to that behavior.
- Timed out / errored. The mutation broke the program in some other way. Treated as killed.
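The kill-or-escape loop can be sketched by hand. This is a toy illustration under stated assumptions, not how Infection is implemented: the mutants are written out as closures instead of being generated from the source, and `classifyMutants`, the free-shipping rule, and the mutant labels are hypothetical names invented for the example.

```php
<?php

declare(strict_types=1);

/**
 * Toy mutation run: each mutant is executed against every test.
 * A mutant is "killed" when at least one test fails and "escaped" otherwise.
 *
 * @param array<string, callable> $mutants hand-written variants of the code under test
 * @param list<callable>          $tests   each receives a variant and returns true when it passes
 *
 * @return array<string, string> mutant name => 'killed' or 'escaped'
 */
function classifyMutants(array $mutants, array $tests): array
{
    $outcomes = [];
    foreach ($mutants as $name => $mutant) {
        $outcomes[$name] = 'escaped';
        foreach ($tests as $test) {
            if ($test($mutant) === false) {
                $outcomes[$name] = 'killed'; // one failing test is enough
                break;
            }
        }
    }

    return $outcomes;
}

// Hypothetical code under test: orders above 5_000 cents ship free.
//   static fn (int $cents): bool => $cents > 5_000
$mutants = [
    'boundary: > becomes >=' => static fn (int $cents): bool => $cents >= 5_000,
    'return: always false'   => static fn (int $cents): bool => false,
];

// Two tests that never probe the boundary itself.
$tests = [
    static fn (callable $shipsFree): bool => $shipsFree(9_000) === true,
    static fn (callable $shipsFree): bool => $shipsFree(1_000) === false,
];

$outcomes = classifyMutants($mutants, $tests);
foreach ($outcomes as $name => $outcome) {
    echo $name, ' => ', $outcome, PHP_EOL;
}
```

The always-false mutant is killed by the first test. The boundary mutant escapes, because neither test passes exactly 5,000 cents, which is the only input where `>` and `>=` disagree.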
The headline number is the Mutation Score Indicator (MSI): the percentage of mutations killed. A second number, Covered Code MSI, restricts the denominator to lines your tests actually executed, so it isolates "tests that look at the code but do not really test it" from "tests that do not touch the code at all."
The two numbers tell different stories. A low MSI alongside a healthy Covered Code MSI means whole files are untested. A low Covered Code MSI means the tests that exist are weak. The second one is what bites you in production, because it hides under a green coverage badge.
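A small worked calculation shows how the two numbers can diverge. It is a sketch that assumes the definitions given above: detected mutants (killed, timed out, errored) divided by all mutants for MSI, and by only the mutants on executed lines for Covered Code MSI. The function name `mutationScores` and both sets of counts are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Computes both scores from raw mutant counts.
 *
 * @return array{msi: float, coveredMsi: float}
 */
function mutationScores(int $killed, int $timedOut, int $errored, int $escaped, int $notCovered): array
{
    $detected = $killed + $timedOut + $errored; // timeouts and errors count as detected
    $covered  = $detected + $escaped;           // mutants on lines the tests executed
    $total    = $covered + $notCovered;         // every generated mutant

    return [
        'msi'        => $total === 0 ? 0.0 : round($detected / $total * 100, 2),
        'coveredMsi' => $covered === 0 ? 0.0 : round($detected / $covered * 100, 2),
    ];
}

// Hypothetical run A: sharp tests, but half of the mutants sit on lines no test executes.
$a = mutationScores(killed: 45, timedOut: 0, errored: 0, escaped: 5, notCovered: 50);

// Hypothetical run B: every line is executed, but the assertions are weak.
$b = mutationScores(killed: 45, timedOut: 0, errored: 0, escaped: 55, notCovered: 0);

printf("A: MSI %.0f%%, Covered Code MSI %.0f%%\n", $a['msi'], $a['coveredMsi']);
printf("B: MSI %.0f%%, Covered Code MSI %.0f%%\n", $b['msi'], $b['coveredMsi']);
```

Both runs land on the same 45% MSI. Run A has a Covered Code MSI of 90%, so the fix is more tests. Run B sits at 45% on both, so the fix is better assertions in the tests that already exist.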
Setting up Infection on PHP 8.3
Install it as a dev dependency. Infection 0.29+ works with PHP 8.2 and 8.3, and pairs cleanly with PHPUnit 10/11.
composer require --dev infection/infection:^0.29
Drop a configuration file, infection.json5, at the project root:
{
    "$schema": "vendor/infection/infection/resources/schema.json",
    "source": {
        "directories": [
            "src/Domain",
            "src/Application"
        ]
    },
    "timeout": 10,
    "logs": {
        "text": "var/infection.log",
        "html": "var/infection.html",
        "summary": "var/infection-summary.log"
    },
    "mutators": {
        "@default": true
    },
    "testFramework": "phpunit",
    "minMsi": 75,
    "minCoveredMsi": 90
}
Two things in this config matter for how you read the result later.
The source.directories block is the architecture lever. You point Infection at the layers where business behavior lives (Domain and Application in a hexagonal layout). You do not mutate Infrastructure, controllers, or framework glue. Mutating an Eloquent model's $fillable array or a Symfony controller's response code produces noise. The domain is the part worth testing hard, and Infection's source filter is how you encode that intent.
The minMsi and minCoveredMsi thresholds are the CI gate. If either score drops below its threshold, Infection exits non-zero. You wire that into your pipeline the same way you wire PHPStan or PHP-CS-Fixer.
Run it:
vendor/bin/infection --threads=8 --show-mutations
The --threads flag matters more than people think. Mutation runs are embarrassingly parallel, and the wall-clock cost without threading is what makes teams write mutation testing off as "too slow." On a domain folder of a few thousand lines, eight threads bring a full run under two minutes on a modern laptop.
A surviving mutant: what the report actually shows you
Here is a small domain class. It calculates a tiered discount on a cart subtotal.
<?php

declare(strict_types=1);

namespace App\Domain\Pricing;

final class TieredDiscount
{
    public function applyTo(int $subtotalCents): int
    {
        if ($subtotalCents > 10_000) {
            return (int) ($subtotalCents * 0.85);
        }
        if ($subtotalCents > 5_000) {
            return (int) ($subtotalCents * 0.95);
        }
        return $subtotalCents;
    }
}
The accompanying test, with the kind of assertions a hurried code review will wave through:
<?php

declare(strict_types=1);

namespace App\Tests\Domain\Pricing;

use App\Domain\Pricing\TieredDiscount;
use PHPUnit\Framework\TestCase;

final class TieredDiscountTest extends TestCase
{
    public function testAppliesADiscount(): void
    {
        $discount = new TieredDiscount();
        $result = $discount->applyTo(20_000);
        $this->assertLessThan(20_000, $result);
    }

    public function testReturnsAnInteger(): void
    {
        $discount = new TieredDiscount();
        $result = $discount->applyTo(7_500);
        $this->assertIsInt($result);
    }
}
PHPUnit reports near-complete line coverage for TieredDiscount. Both discount branches execute across the two tests; only the final no-discount return is never reached.
Run Infection:
Mutation Score Indicator (MSI): 75%
Mutations: 8
Killed: 6
Escaped: 2
Errors: 0
Timeouts: 0
Two mutants escaped. Pull up the log:
1) src/Domain/Pricing/TieredDiscount.php:11 [M] GreaterThan
--- Original
+++ New
- if ($subtotalCents > 10_000) {
+ if ($subtotalCents >= 10_000) {
Infection changed > to >=. The boundary at exactly 10,000 cents now falls into the 15% discount tier instead of the 5% tier. Neither test passes anything close to 10,000 cents (one uses 20,000, the other 7,500), so the change goes unnoticed. The boundary is silent.
2) src/Domain/Pricing/TieredDiscount.php:15 [M] FloatNegation
--- Original
+++ New
- return (int) ($subtotalCents * 0.95);
+ return (int) ($subtotalCents * -0.95);
The discount factor flipped to negative. A 7,500-cent subtotal now returns -7,125 cents, a refund disguised as a charge. The test asserts the result is an integer. -7,125 is an integer. The mutant survives.
Two escaped mutants. Two real bugs that production would have caught for you, with customers as your test fixtures.
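To see why the weak assertions cannot tell the mutants from the original, the two mutations can be reproduced by hand. This is an illustrative sketch, not Infection output: `makeDiscount` is a hypothetical helper that copies the body of applyTo and applies each reported change through a parameter, and `weakSuitePasses` and `exactSuitePasses` are stand-ins for running PHPUnit with loose and exact assertions.

```php
<?php

declare(strict_types=1);

// Copy of TieredDiscount::applyTo with the two mutated spots made configurable.
function makeDiscount(bool $inclusiveTopBoundary, float $midTierFactor): Closure
{
    return static function (int $subtotalCents) use ($inclusiveTopBoundary, $midTierFactor): int {
        $inTopTier = $inclusiveTopBoundary
            ? $subtotalCents >= 10_000  // GreaterThan mutant
            : $subtotalCents > 10_000;  // original
        if ($inTopTier) {
            return (int) ($subtotalCents * 0.85);
        }
        if ($subtotalCents > 5_000) {
            return (int) ($subtotalCents * $midTierFactor);
        }
        return $subtotalCents;
    };
}

// The two loose assertions from TieredDiscountTest.
function weakSuitePasses(Closure $applyTo): bool
{
    return $applyTo(20_000) < 20_000 && is_int($applyTo(7_500));
}

// Exact-value assertions, including the 10_000 boundary.
function exactSuitePasses(Closure $applyTo): bool
{
    return $applyTo(20_000) === 17_000
        && $applyTo(7_500) === 7_125
        && $applyTo(10_000) === 9_500;
}

$variants = [
    'original'             => makeDiscount(false, 0.95),
    'GreaterThan mutant'   => makeDiscount(true, 0.95),
    'FloatNegation mutant' => makeDiscount(false, -0.95),
];

foreach ($variants as $name => $applyTo) {
    printf(
        "%-22s weak suite: %s | exact suite: %s\n",
        $name,
        weakSuitePasses($applyTo) ? 'pass' : 'fail',
        exactSuitePasses($applyTo) ? 'pass' : 'fail'
    );
}
```

All three variants pass the weak suite, which is what "escaped" means. Only the original passes the exact suite: the boundary mutant fails on the 10,000-cent input and the negated factor fails on the 7,500-cent input.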
The fix is to write the tests the assertions were pretending to be:
public function testTopTierDiscount(): void
{
    $discount = new TieredDiscount();
    $this->assertSame(17_000, $discount->applyTo(20_000));
}

public function testMidTierDiscount(): void
{
    $discount = new TieredDiscount();
    $this->assertSame(7_125, $discount->applyTo(7_500));
}

public function testTopTierBoundaryExclusive(): void
{
    $discount = new TieredDiscount();
    $this->assertSame(9_500, $discount->applyTo(10_000));
}

public function testMidTierBoundaryExclusive(): void
{
    $discount = new TieredDiscount();
    $this->assertSame(5_000, $discount->applyTo(5_000));
}

public function testNoDiscountBelowThreshold(): void
{
    $discount = new TieredDiscount();
    $this->assertSame(3_000, $discount->applyTo(3_000));
}
Rerun Infection. MSI: 100%. The two mutants that escaped before now die on the boundary tests and the exact-value assertions.
The old tests asked "did something happen?" The new tests ask "what is the answer?" Mutation testing forces that shift in question without you having to enforce it by hand in code review.
Why this belongs in the architecture conversation
A hexagonal codebase puts business behavior in one place and infrastructure in another. The split exists so the parts that encode the rules of the business (the parts the company actually pays for) are testable without a database, without a framework boot, without any of the slow stuff. That is the architectural promise.
Mutation testing is how you verify the promise was kept.
If you point Infection at src/Domain and the MSI is 90%, your domain tests are doing real work. If you point it at the same folder and the MSI is 40%, the architecture is a fiction. The code is structured for testability, and nobody is actually testing it.
This is why the source filter in the Infection config is not a performance tweak. It is a statement about what the codebase is for. The domain is where mutation testing earns its cost. Adapters get integration tests with real Postgres, real HTTP, real queues. Those tests run end-to-end, and a mutation framework adds little signal on top. The domain is the place where pure logic concentrates, and the place where mutation testing is the right tool.
A pragmatic CI setup encodes that:
- name: PHPUnit
  run: vendor/bin/phpunit

- name: Infection (domain layer)
  run: >
    vendor/bin/infection
    --threads=8
    --min-msi=85
    --min-covered-msi=95
    --only-covered
CLI flags override the JSON config when both are set, so the CI run lifts the thresholds above the local baseline.
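Two gates. The first one says the tests must pass. The second one says the tests must mean something. A pull request that adds a domain class with assertion-free tests keeps the coverage number intact, but it regresses Covered Code MSI, because Infection mutates those executed lines and nothing kills the mutants. One caveat: --only-covered restricts mutation to code the tests execute, so a class with no tests at all is skipped by this run. Drop the flag if untested code should count against the score. MSI catches the hollow test; coverage doesn't.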
What to skip and what to tune
Three pitfalls catch teams adopting Infection for the first time.
Mutating the wrong layer. If you point Infection at the whole project, you get hundreds of escaped mutants in infrastructure code: Doctrine entity mappings, HTTP response codes, container configuration. They are noise. The mutator does not know that a switched route status is meaningless in your test setup, and you do not want it to. Keep the source filter tight on domain and application.
Treating MSI as a vanity metric. A perfect 100% MSI on a thousand-line domain is achievable but expensive, and the last few mutants rarely surface real bugs. A realistic target for production code is around 85–90% on the domain, with the gate set at the level you actually maintain. Teams that chase 100% end up writing tests aimed at the mutator, not tests that catch bugs.
Running it on every commit. A full Infection run on a large domain takes minutes, not seconds. Run PHPUnit on every commit, a diff-mode Infection run (--git-diff-filter=AM, mutating only added and modified files) on PRs, and a full run on every merge to main. That is the trade-off most teams settle on after a quarter of usage.
The infrastructure for tracking the score over time exists in the tool. The HTML report shows trends, escaped-mutant categories, and the specific lines that lose points. The escaped-mutant list is the new code-review prompt: instead of "are there enough tests?" the question becomes "why did this mutant escape?"
If this was useful
The hex layout in Decoupled PHP is what makes mutation testing tractable in the first place: a domain you can mutate without booting Laravel or Symfony, a port boundary that tells Infection where to stop, and a use-case layer thin enough that every escaped mutant points at a real gap. The book walks through the testing chapters in the same order this post does: contract tests at the ports, mutation tests in the center, end-to-end on the adapters.
Available on Kindle, Paperback, and Hardcover. English, German, and Japanese editions out now — Portuguese and Spanish coming soon.