Gabriel Anhaia

Posted on May 19

Property-Based Testing for Domain Rules in PHP

#php #testing #ddd #architecture

Book: Decoupled PHP — Clean and Hexagonal Architecture for Applications That Outlive the Framework

You write a Money value object. You write a test. The test does this:

public function testAddTwoEuros(): void
{
    $a = Money::of(100, 'EUR');
    $b = Money::of(200, 'EUR');

    self::assertSame(300, $a->add($b)->amount());
}

It passes. You ship it. Six months later, a colleague refactors add() to short-circuit when one side is zero, and the test still passes. Then production starts producing receipts where 0 EUR + 50 EUR = 0 EUR, because the early-return path returns $this without copying the other side's amount.
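The refactor described above might look roughly like the sketch below. It is a hypothetical reconstruction: the class name BuggyMoney and the exact shape of the early return are assumptions, not code from a real project. It shows why the original example test keeps passing while 0 + 50 goes wrong.

```php
<?php

declare(strict_types=1);

// Hypothetical reconstruction of the faulty refactor; not the real Money class.
final readonly class BuggyMoney
{
    public function __construct(
        public int $amount,
        public string $currency,
    ) {
    }

    public function add(BuggyMoney $other): BuggyMoney
    {
        // Intended optimisation: skip the allocation when nothing changes.
        // Bug: when $this is zero, the result should be $other, not $this.
        if ($this->amount === 0 || $other->amount === 0) {
            return $this;
        }

        return new self($this->amount + $other->amount, $this->currency);
    }
}

$example = (new BuggyMoney(100, 'EUR'))->add(new BuggyMoney(200, 'EUR'));
$broken  = (new BuggyMoney(0, 'EUR'))->add(new BuggyMoney(50, 'EUR'));

echo $example->amount, PHP_EOL; // 300, so the original example test still passes
echo $broken->amount, PHP_EOL;  // 0, although 0 + 50 should be 50
```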

The test wasn't wrong. It was insufficient. You checked the one pair of numbers you happened to type. You never checked that adding zero is identity. You never checked that the order of operands does not matter. You never checked that a thousand random pairs round-trip the way Money claims they do.

That gap is what property-based testing fills. Instead of writing one assertion against one fixture, you write a property (a statement that must hold for every valid input), and the framework throws hundreds of generated inputs at it. When it finds a failure, it shrinks the input down to the minimal example that still breaks the property.

This post walks through wiring Eris into a PHPUnit 11 project, writing property tests for two real domain invariants (Money::add commutativity and OrderTotal monotonicity), and a concrete refactor where a property test catches a bug that the example-based tests let through.

Why example-based testing runs out of room

Example-based tests have one job: pin the behavior you remembered to think about. They are great at documenting intent and catching regressions on known cases. They are terrible at finding the cases you never thought of.

A domain rule like "adding money is commutative" is not really a statement about 100 + 200. It is a statement about every pair of Money values in the same currency. Writing one example per pair is impossible. Writing the rule once and asking the framework to try a thousand pairs is the obvious move.

Three categories of bug live in the gap between examples:

Edge values. Zero, negative numbers, PHP_INT_MAX, currencies your team doesn't ship to yet.
Combinations. A method that works for (a, b) but breaks for (b, a). A reducer that works on lists of length 2 but breaks on length 1 or empty.
Round-trips. Serialize then deserialize. Encode then decode. Persist then load. The example-based test uses the same fixture for both halves and the bug stays hidden because the round-trip never sees a value the team didn't already think of.

Property-based testing catches all three. The framework isn't clever. You are just forced to write down the rule instead of a sample of it.
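As an illustration of the third category, a round-trip rule can be written as a loop over generated values. The sketch below uses plain PHP and random_int rather than Eris so that it runs on its own. The encodeMoney and decodeMoney helpers and the "EUR 1250" wire format are hypothetical.

```php
<?php

declare(strict_types=1);

// Hypothetical wire format: "<CURRENCY> <amount in minor units>", e.g. "EUR 1250".
function encodeMoney(int $amountMinor, string $currency): string
{
    return $currency . ' ' . $amountMinor;
}

/** @return array{int, string} [amountMinor, currency] */
function decodeMoney(string $encoded): array
{
    [$currency, $amount] = explode(' ', $encoded, 2);

    return [(int) $amount, $currency];
}

// The property: decoding an encoded value gives back the original input.
$currencies = ['EUR', 'USD', 'GBP', 'JPY', 'BRL'];
$failures = [];
for ($i = 0; $i < 1000; $i++) {
    $amount   = random_int(-1_000_000, 1_000_000);
    $currency = $currencies[array_rand($currencies)];

    if (decodeMoney(encodeMoney($amount, $currency)) !== [$amount, $currency]) {
        $failures[] = [$amount, $currency]; // keep the input that broke the rule
    }
}

echo count($failures), PHP_EOL; // 0 when the round-trip holds for every generated case
```

The loop never names a specific amount. A decoder that mishandled negative numbers would be found by whichever negative input happened to be drawn.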

Wiring Eris into PHPUnit 11

Eris is a property-based testing library for PHP. It pre-dates PHPUnit 10's attribute system, so the integration goes through a small adapter, the TestTrait used below. PHPUnit 11 keeps the same approach.

Install:

composer require --dev giorgiosironi/eris

A property test in Eris reads like a sentence. You declare the generators (the shape of the inputs), then a then block that runs once per generated case:

<?php

declare(strict_types=1);

namespace Tests\Domain;

use App\Domain\Money;
use Eris\Generator;
use Eris\TestTrait;
use PHPUnit\Framework\TestCase;

final class MoneyPropertyTest extends TestCase
{
    use TestTrait;

    public function testAdditionIsCommutative(): void
    {
        $this
            ->forAll(
                Generator\int(),
                Generator\int(),
            )
            ->then(function (int $a, int $b): void {
                $left  = Money::of($a, 'EUR')
                    ->add(Money::of($b, 'EUR'));
                $right = Money::of($b, 'EUR')
                    ->add(Money::of($a, 'EUR'));

                self::assertSame(
                    $left->amount(),
                    $right->amount(),
                );
            });
    }
}

This single test runs add() against 100 random (int, int) pairs by default. If any pair fails the equality, Eris shrinks the failing input: it tries smaller numbers around the failure and reports the minimal counterexample, not the random one that happened to trip first. A shrunk failure looks like:

There was 1 failure:

1) Tests\Domain\MoneyPropertyTest::testAdditionIsCommutative
Failed asserting that 0 is identical to 50.

Eris\Quantifier\ForAll: shrunk to:
  int(0), int(50)

The shrinker takes the gibberish input the random generator found (something like int(-481923), int(50)) and walks it down to the smallest pair that still breaks the rule. Property-based testing does more than find a bug. It finds the smallest example of the bug.
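To make the idea of shrinking concrete, here is a toy sketch in plain PHP. It is not Eris's algorithm: the helper name shrinkPair and the try-zero-then-halve strategy are assumptions chosen for brevity, and the buggy addition is a stand-in for the faulty add() from the introduction.

```php
<?php

declare(strict_types=1);

/**
 * Toy shrinker: moves each component closer to zero for as long as
 * the property keeps failing, and returns the smallest failing pair found.
 *
 * @param callable(int, int): bool $property returns true when the rule holds
 * @param array{int, int} $failing a pair already known to break the rule
 * @return array{int, int}
 */
function shrinkPair(callable $property, array $failing): array
{
    $improved = true;
    while ($improved) {
        $improved = false;
        foreach ([0, 1] as $i) {
            $v = $failing[$i];
            // Candidates: zero, half the value, one step closer to zero.
            foreach ([0, intdiv($v, 2), $v - ($v <=> 0)] as $smaller) {
                if ($smaller === $v) {
                    continue;
                }
                $candidate = $failing;
                $candidate[$i] = $smaller;
                if (!$property(...$candidate)) {
                    $failing = $candidate; // still fails, keep the smaller input
                    $improved = true;
                    break;
                }
            }
        }
    }

    return $failing;
}

// Hypothetical buggy addition: returns the left operand untouched when it is zero.
$buggyAdd = fn (int $a, int $b): int => $a === 0 ? $a : $a + $b;
$commutes = fn (int $a, int $b): bool => $buggyAdd($a, $b) === $buggyAdd($b, $a);

$shrunk = shrinkPair($commutes, [0, 48192]);
echo json_encode($shrunk), PHP_EOL; // [0,1]
```

Starting from the arbitrary failing pair (0, 48192), the loop halves the second component until no smaller value still breaks commutativity. A real shrinker is more careful about search order and generator structure, but the principle is the same.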

Property 1: Money::add commutativity and associativity

Three properties cover almost everything you want from integer-backed money arithmetic in the same currency:

Identity. a + 0 = a.
Commutativity. a + b = b + a.
Associativity. (a + b) + c = a + (b + c).

Written out:

public function testAdditionIdentity(): void
{
    $this
        ->forAll(Generator\int())
        ->then(function (int $a): void {
            $money = Money::of($a, 'EUR');
            $zero  = Money::of(0, 'EUR');

            self::assertSame(
                $a,
                $money->add($zero)->amount(),
            );
            self::assertSame(
                $a,
                $zero->add($money)->amount(),
            );
        });
}

public function testAdditionIsAssociative(): void
{
    $this
        ->forAll(
            Generator\int(),
            Generator\int(),
            Generator\int(),
        )
        ->then(function (int $a, int $b, int $c): void {
            $eur = fn (int $n): Money => Money::of($n, 'EUR');

            $leftFirst  = $eur($a)->add($eur($b))->add($eur($c));
            $rightFirst = $eur($a)->add($eur($b)->add($eur($c)));

            self::assertSame(
                $leftFirst->amount(),
                $rightFirst->amount(),
            );
        });
}

Now add the currency-mismatch invariant. Money should refuse to add EUR to USD, and property-based testing makes that easy too:

public function testCurrencyMismatchAlwaysThrows(): void
{
    $currencies = ['EUR', 'USD', 'GBP', 'JPY', 'BRL'];

    $this
        ->forAll(
            Generator\int(),
            Generator\int(),
            Generator\elements($currencies),
            Generator\elements($currencies),
        )
        ->when(fn ($a, $b, $c1, $c2): bool => $c1 !== $c2)
        ->then(function (
            int $a,
            int $b,
            string $c1,
            string $c2,
        ): void {
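            // expectException() would end the test at the first throw,
            // so each generated case is checked explicitly instead.
            $thrown = false;
            try {
                Money::of($a, $c1)->add(Money::of($b, $c2));
            } catch (CurrencyMismatch) {
                $thrown = true;
            }

            self::assertTrue($thrown, "{$c1} + {$c2} must throw");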
        });
}

The when() clause is a precondition: Eris skips generated cases that don't satisfy it, and it fails the test if too large a share of the cases is skipped, so keep preconditions easy to satisfy. You read the test as "for every pair of amounts and any two different currencies, addition must throw."
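A precondition that rejects many cases wastes generated inputs. An alternative is to build inputs that satisfy the rule by construction. The sketch below is plain PHP with a hypothetical helper, distinctPair. In an Eris test the index and offset would come from integer generators, which is an assumption about the wiring rather than something shown in this post.

```php
<?php

declare(strict_types=1);

/**
 * Picks two different elements from a list using an index and a non-zero offset.
 *
 * @param list<string> $items at least two distinct values
 * @param int $index any integer in 0 .. count($items) - 1
 * @param int $offset any integer in 1 .. count($items) - 1
 * @return array{string, string}
 */
function distinctPair(array $items, int $index, int $offset): array
{
    $n = count($items);

    // A non-zero offset smaller than $n can never wrap back to $index.
    return [$items[$index], $items[($index + $offset) % $n]];
}

$currencies = ['EUR', 'USD', 'GBP', 'JPY', 'BRL'];
$pairs = [];
for ($i = 0; $i < 200; $i++) {
    $pairs[] = distinctPair(
        $currencies,
        random_int(0, count($currencies) - 1),
        random_int(1, count($currencies) - 1),
    );
}

// Every generated pair is usable: nothing has to be filtered out afterwards.
$sameCurrency = array_filter($pairs, fn (array $p): bool => $p[0] === $p[1]);
echo count($sameCurrency), PHP_EOL; // 0
```

With five currencies the when() filter only discards about one case in five, so either approach works here. Construction matters more when the precondition is rarely true.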

The Money class under test is plain PHP 8.3:

<?php

declare(strict_types=1);

namespace App\Domain;

final readonly class Money
{
    private function __construct(
        public int $amountMinor,
        public string $currency,
    ) {
    }

    public static function of(int $amount, string $currency): self
    {
        return new self($amount, $currency);
    }

    public function add(Money $other): Money
    {
        if ($this->currency !== $other->currency) {
            throw new CurrencyMismatch(
                $this->currency,
                $other->currency,
            );
        }

        return new self(
            $this->amountMinor + $other->amountMinor,
            $this->currency,
        );
    }

    public function amount(): int
    {
        return $this->amountMinor;
    }
}
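The CurrencyMismatch exception is referenced but not shown. A minimal version, assuming it extends DomainException and receives the two currency codes in the order add() passes them, could look like this. The property names and the message wording are illustrative.

```php
<?php

declare(strict_types=1);

namespace App\Domain;

use DomainException;

// Assumed shape: the post only shows that the constructor takes two currency codes.
final class CurrencyMismatch extends DomainException
{
    public function __construct(
        public readonly string $left,
        public readonly string $right,
    ) {
        parent::__construct(
            sprintf('Cannot combine %s with %s', $left, $right),
        );
    }
}
```

Keeping both codes on the exception lets a failing property test report which pair of currencies was involved.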

Run the suite. All three properties pass on 100 cases each, and every generated currency-mismatch case throws. You have not written 300 fixture-based tests; you have written four rules.

Property 2: OrderTotal is monotonic in line count

The second invariant is from a real domain rule that example-based tests usually miss. An Order accumulates line items. Each line has a non-negative subtotal (price × quantity). Therefore the order total must be monotonic in the number of lines: adding a line can never make the total smaller.

If your code ever violates this, something is wrong: a refund, a signed quantity, a discount that overflowed, a coupon that ran twice. The property says the rule out loud:

public function testOrderTotalIsMonotonic(): void
{
    $this
        ->forAll(
            Generator\seq(
                Generator\tuple(
                    Generator\choose(0, 10_000),
                    Generator\choose(1, 50),
                ),
            ),
            Generator\tuple(
                Generator\choose(0, 10_000),
                Generator\choose(1, 50),
            ),
        )
        ->then(function (array $existing, array $extra): void {
            $order = new Order();
            foreach ($existing as [$price, $qty]) {
                $order = $order->withLine(
                    new Line($price, $qty),
                );
            }
            $before = $order->total();

            [$price, $qty] = $extra;
            $after = $order
                ->withLine(new Line($price, $qty))
                ->total();

            self::assertGreaterThanOrEqual(
                $before,
                $after,
                'Adding a line never decreases the total',
            );
        });
}

Generator\seq() produces a list of zero or more tuples. Generator\choose($min, $max) draws an integer from the inclusive range between $min and $max. Together they sample orders of many shapes, from empty up to dozens of lines, with prices and quantities in a realistic range, plus one extra line on top.

If withLine() and total() are correct, the test passes silently. If someone introduces a discount line with a negative subtotal, and the generator is extended to produce such lines, the test fails and the shrinker reports a small order that triggers the regression.

The refactor where the property test earned its keep

Here is Order::withLine as it was in the example-based version:

public function withLine(Line $line): self
{
    $clone = clone $this;
    $clone->lines[] = $line;
    return $clone;
}

public function total(): int
{
    return array_sum(
        array_map(
            fn (Line $l): int => $l->price * $l->quantity,
            $this->lines,
        ),
    );
}
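For context, the two methods above could live in classes like the following. This is a guess at the surrounding code, not the post's actual classes: the constructor guard in Line and the private lines array are assumptions that match how the property test builds orders.

```php
<?php

declare(strict_types=1);

// Assumed value object: non-negative price in minor units, quantity of at least one.
final readonly class Line
{
    public function __construct(
        public int $price,
        public int $quantity,
    ) {
        if ($price < 0 || $quantity < 1) {
            throw new InvalidArgumentException('Invalid line');
        }
    }
}

final class Order
{
    /** @var list<Line> */
    private array $lines = [];

    public function withLine(Line $line): self
    {
        $clone = clone $this;    // the original order stays untouched
        $clone->lines[] = $line;

        return $clone;
    }

    public function total(): int
    {
        return array_sum(
            array_map(
                fn (Line $l): int => $l->price * $l->quantity,
                $this->lines,
            ),
        );
    }
}

$empty = new Order();
$one   = $empty->withLine(new Line(250, 2));

echo $empty->total(), PHP_EOL; // 0
echo $one->total(), PHP_EOL;   // 500
```

With the guard in Line, every subtotal is non-negative, which is exactly the assumption the monotonicity property rests on.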

A teammate opens a PR to support free line items. A line where price is zero is fine, but they also want to support a discount line (a line whose subtotal is subtracted from the running total instead of added). They wire it in:

public function total(): int
{
    $sum = 0;
    foreach ($this->lines as $line) {
        $sum += $line->isDiscount
            ? -1 * $line->price * $line->quantity
            : $line->price * $line->quantity;
    }
    return $sum;
}

The example-based tests still pass. They never used a discount line, so the new branch is exercised only by the two new tests the PR author wrote, both of which use a non-discount baseline and a discount on top. Both pass.
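A property test only explores what its generators can produce. For the monotonicity test to reach the new branch, the generated lines need an isDiscount flag as well. The sketch below shows the idea in plain PHP with random_int instead of Eris so that it runs standalone. The lineSubtotal and randomLine helpers and the tuple layout are hypothetical; in an Eris test the flag would be one more generator inside the tuple.

```php
<?php

declare(strict_types=1);

// Mirrors the new total(): a discount line contributes a negative subtotal.
function lineSubtotal(int $price, int $quantity, bool $isDiscount): int
{
    return ($isDiscount ? -1 : 1) * $price * $quantity;
}

/** @return array{int, int, bool} [price, quantity, isDiscount] */
function randomLine(): array
{
    return [random_int(0, 10_000), random_int(1, 50), random_int(0, 1) === 1];
}

$counterexamples = [];
for ($i = 0; $i < 200; $i++) {
    // An existing order of 0 to 5 lines, then one extra line on top.
    $before = 0;
    for ($n = random_int(0, 5); $n > 0; $n--) {
        $before += lineSubtotal(...randomLine());
    }
    $extra = randomLine();
    $after = $before + lineSubtotal(...$extra);

    if ($after < $before) {
        $counterexamples[] = $extra; // this extra line lowered the total
    }
}

// Only discount lines can ever lower the total.
$nonDiscount = array_filter($counterexamples, fn (array $l): bool => !$l[2]);
echo count($nonDiscount), PHP_EOL; // 0
```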

Once the property test's line generator also produces discount lines, the test fails on the first run:

1) Tests\Domain\OrderPropertyTest::testOrderTotalIsMonotonic
Failed asserting that -200 is greater than or equal to 0.

shrunk to:
  existing: []
  extra:    [200, 1]   (with isDiscount=true)

The shrunken counterexample is unambiguous: an empty order, then one discount line of (200, 1), produces a total of -200. The monotonicity invariant is broken. The PR author has two ways out:

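Decide that discount lines should clamp at the running total (max(0, sum - discount)). A discount still lowers the total, so the property then has to be narrowed to non-discount lines, with a second property stating that the total never drops below zero.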
Decide that the invariant is wrong, that orders are allowed to go negative, and update the property test (and the spec it documents) to say assertGreaterThanOrEqual(-MAX_DISCOUNT, $after).
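The first option can be sketched as below, assuming a hypothetical clampedTotal helper that takes lines as [price, quantity, isDiscount] tuples. Note what clamping does and does not buy: the total can no longer go negative, but a discount still lowers it, so the rule that holds here is a lower bound of zero rather than monotonicity.

```php
<?php

declare(strict_types=1);

/**
 * Sums line subtotals, never letting the running total drop below zero.
 *
 * @param list<array{int, int, bool}> $lines [price, quantity, isDiscount]
 */
function clampedTotal(array $lines): int
{
    $sum = 0;
    foreach ($lines as [$price, $quantity, $isDiscount]) {
        $subtotal = $price * $quantity;
        $sum = $isDiscount ? max(0, $sum - $subtotal) : $sum + $subtotal;
    }

    return $sum;
}

// The shrunk counterexample no longer goes negative.
echo clampedTotal([[200, 1, true]]), PHP_EOL; // 0

// A discount on a non-empty order still reduces the total.
echo clampedTotal([[500, 1, false], [200, 1, true]]), PHP_EOL; // 300
```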

Either resolution is fine. The point is that the property test forced the conversation to happen before the PR merged, by stating a domain rule clearly enough that the runtime could check it. The example-based tests would have shipped the regression and let the support team find it.

When to reach for a property test, and when not to

Property-based testing earns its keep when the rule is algebraic: associativity, commutativity, identity, idempotency, monotonicity, round-trips, invariants over a state machine. It is wasted on rules that are essentially fixtures. "The welcome email subject line says Welcome" is a value, not a property, and an example test is the right shape.

A rough decision rule:

The rule is a statement about every input from some class → property test.
The rule is a statement about this input → example test.
You can describe the rule as a one-line invariant that holds across many cases → property test.
You can only describe it by enumerating cases → example test.

Most domain code in a clean-architecture PHP service has both kinds of rule. The entity invariants (Money arithmetic, OrderTotal monotonicity, Email round-trip) are properties. The use-case orchestrations (creating an order issues exactly one OrderCreated event with the right payload) are examples. Write both. The example tests document intent and read well in code review. The property tests stop the bugs you didn't think of from reaching production.

The Eris suite for a moderately complex domain runs in a couple of seconds even at 200 cases per property. The cost is negligible. The payoff is that the next PR that introduces a subtle change to Money or Order has to pass a much wider net than the team's collective memory of edge cases. Domain rules belong in code, not in the heads of whoever was on the team three years ago. Property tests are how you write them down.

If this was useful

A domain that survives framework migrations is one where the rules are encoded — in types, in invariants, and in tests that check them across more cases than any human would write by hand. The book walks the full hexagonal layout in PHP 8.3+, with the testing chapter going deeper into property tests, contract tests, and in-memory adapters across the use-case layer. If this post lined up with where your codebase hurts, the book is the long version of the same argument.

Available on Kindle, Paperback, and Hardcover. English, German, and Japanese editions out now — Portuguese and Spanish coming soon.
