Gabriel Anhaia

The Saga Pattern in PHP: Long-Running Workflows Without a Workflow Engine


You have an order endpoint that does three things across three services. It reserves inventory in the warehouse system. It charges the customer's card via the payment processor. It books a shipping label with the carrier.

The first two work. The third returns a 503. Now you have stock held under an order that will never ship and a charge on a card that needs refunding. The HTTP request that started this is long gone. The customer is still staring at a spinner.

You reach for a database transaction. It doesn't help. Two of the three calls are to other people's systems. There is no BEGIN; ROLLBACK that spans your payment processor and your warehouse partner. You reach for a queue. That helps with retries, but it does not tell you that ShipOrder failed after you already charged the card.

The pattern you want is a saga. The version most PHP shops reach for first is Temporal or Cadence, which means running a separate workflow engine next to your PHP app, picking up a new SDK, and explaining to ops why there is another stateful service to operate. You don't need that yet. A saga is a list of steps and their compensations. State fits in one Postgres table, and an orchestrator class drives it. This post builds one.

What a saga actually is

A saga is a sequence of local transactions where each step has a defined compensation. If step N fails, you run the compensations for steps 1 through N-1, in reverse order. The system ends in one of two places: the happy path completed, or every side effect has been undone.

There are two flavors:

  • Choreography: services react to each other's events. No central coordinator. Easy to start, painful to debug at three services and impossible at six.
  • Orchestration: one component owns the workflow. It calls each step, records the outcome, and decides what to compensate. Easier to reason about; the price is one more thing to operate.

For PHP services that already lean on Postgres for state, orchestration is the better starting point. An orchestrator class drives the flow, the state lives in a Postgres table, and the "engine" is whatever job worker you already run.

Orchestration saga: a sequence of steps with paired compensations, executed forward to success or rewound on failure.

The order flow has three steps:

  1. ReserveInventory: hold N units of the SKU. Compensation: release the hold.
  2. ChargePayment: capture the customer's card. Compensation: refund the capture.
  3. ShipOrder: create a shipping label. Compensation: void the label (and only the label, since the goods haven't moved).

If any step fails, the compensations for the steps that already completed run in reverse order. If a compensation itself fails, the saga moves to a Failed terminal state and the on-call gets paged. Compensations must be idempotent, because a worker may run the same one more than once.
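The flow above can be reduced to an in-memory sketch before any persistence is involved. This is not the orchestrator built later in the post; `runSaga` and the `run`/`undo` array shape are hypothetical names chosen for illustration, and the closures only append to a log instead of calling real services.

```php
<?php
declare(strict_types=1);

/**
 * Runs steps in order; on the first failure, undoes the completed
 * steps in reverse order. Hypothetical helper for illustration only.
 *
 * @param list<array{name: string, run: callable, undo: callable}> $steps
 * @return bool true if every step ran, false if the saga was rolled back
 */
function runSaga(array $steps): bool
{
    $completed = [];

    foreach ($steps as $step) {
        try {
            ($step['run'])();
            $completed[] = $step;
        } catch (Throwable) {
            // Step N failed: compensate steps N-1 down to 1.
            foreach (array_reverse($completed) as $done) {
                ($done['undo'])();
            }
            return false;
        }
    }

    return true;
}

$log = [];
$steps = [
    [
        'name' => 'ReserveInventory',
        'run'  => function () use (&$log): void { $log[] = 'reserve'; },
        'undo' => function () use (&$log): void { $log[] = 'release'; },
    ],
    [
        'name' => 'ChargePayment',
        'run'  => function () use (&$log): void { $log[] = 'charge'; },
        'undo' => function () use (&$log): void { $log[] = 'refund'; },
    ],
    [
        'name' => 'ShipOrder',
        // Simulates the 503 from the opening scene.
        'run'  => function (): void { throw new RuntimeException('503 from carrier'); },
        'undo' => function () use (&$log): void { $log[] = 'void_label'; },
    ],
];

$ok = runSaga($steps);
// $ok is false; $log is ['reserve', 'charge', 'refund', 'release']
```

The failed step itself is not compensated here, only the steps that completed before it. The rest of the post adds what this sketch lacks: state that survives a crash between steps.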

The state machine, as a Postgres table

State lives in one table. Every row is a saga instance.

CREATE TABLE order_sagas (
    id UUID PRIMARY KEY,
    order_id UUID NOT NULL,
    status TEXT NOT NULL,
    current_step SMALLINT NOT NULL DEFAULT 0,
    payload JSONB NOT NULL,
    last_error TEXT,
    attempts SMALLINT NOT NULL DEFAULT 0,
    locked_until TIMESTAMPTZ,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX order_sagas_pending_idx
    ON order_sagas (status, locked_until)
    WHERE status IN ('Running', 'Compensating');

The payload column holds the inputs to each step (SKU, quantity, customer ID, card token, shipping address) and the outputs that later steps or compensations need (the reservation ID, the charge ID, the label ID). The current_step is the index of the step the orchestrator is about to run on the next tick. locked_until is the lease: a worker that picks up the saga sets it to now() + 30s so two workers can't run the same instance at once.
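As a rough illustration of what goes into a new row, here is a sketch of building the initial column values. The function name `newOrderSagaRow`, the example IDs, and the `card_token` key are assumptions made for this example; the one thing it takes from the step code later in the post is that the payload needs an `order_id`, because the steps build their idempotency keys from it.

```php
<?php
declare(strict_types=1);

/**
 * Builds the column values for a new order_sagas row.
 * Hypothetical helper, shown only to make the payload shape concrete.
 *
 * @param array<string, mixed> $input step inputs (sku, quantity, ...)
 * @return array<string, mixed> column => value, ready for an INSERT
 */
function newOrderSagaRow(string $sagaId, string $orderId, array $input): array
{
    return [
        'id'           => $sagaId,
        'order_id'     => $orderId,
        'status'       => 'Running', // the forward path starts here
        'current_step' => 0,         // index of the next step to run
        // order_id is copied into the payload so steps can derive
        // stable idempotency keys from it.
        'payload'      => json_encode(
            ['order_id' => $orderId] + $input,
            JSON_THROW_ON_ERROR,
        ),
    ];
}

$row = newOrderSagaRow(
    sagaId: '0b8f4c1e-5a3d-4e2b-9c7a-1f2e3d4c5b6a',  // made-up UUIDs
    orderId: '7d2a9f30-1c4b-4f6e-8a5d-2b3c4d5e6f70',
    input: ['sku' => 'SKU-123', 'quantity' => 2, 'card_token' => 'tok_test'],
);

// After step 1 succeeds, its output is merged into the payload,
// which is how later steps and compensations see reservation_id.
$payload = json_decode($row['payload'], true, 512, JSON_THROW_ON_ERROR);
$payload = array_merge($payload, ['reservation_id' => 'res_1']);
```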

The status column is small and explicit. A PHP backed enum (available since 8.1) keeps it honest:

<?php
declare(strict_types=1);

namespace App\Saga\Order;

enum SagaState: string
{
    case Running = 'Running';
    case Compensating = 'Compensating';
    case Completed = 'Completed';
    case Failed = 'Failed';
}

Running means the next forward step is pending. Compensating means a forward step failed and the orchestrator is walking backward. Completed and Failed are terminal, so workers skip them.

The orchestrator

The orchestrator does one thing: it reads the current state, runs one step (forward or compensating), records the outcome, then returns. It does not loop. The worker calls it again on the next tick. Each invocation stays short and idempotent at the boundary, which makes retries safe.

<?php
declare(strict_types=1);

namespace App\Saga\Order;

use App\Saga\Order\Steps\StepInterface;
use Throwable;

final class OrderSaga
{
    /** @var list<StepInterface> */
    private array $steps;

    public function __construct(
        private readonly OrderSagaRepository $repo,
        ReserveInventoryStep $reserve,
        ChargePaymentStep $charge,
        ShipOrderStep $ship,
    ) {
        $this->steps = [$reserve, $charge, $ship];
    }

    public function tick(string $sagaId): void
    {
        $saga = $this->repo->lockForUpdate($sagaId);

        if ($saga->isTerminal()) {
            return;
        }

        match ($saga->status) {
            SagaState::Running       => $this->advance($saga),
            SagaState::Compensating  => $this->rewind($saga),
            default                  => null,
        };
    }

    private function advance(OrderSagaRecord $saga): void
    {
        if ($saga->currentStep >= count($this->steps)) {
            $this->repo->complete($saga->id);
            return;
        }

        $step = $this->steps[$saga->currentStep];

        try {
            $output = $step->execute($saga->payload);
            $this->repo->stepSucceeded(
                $saga->id,
                $saga->currentStep + 1,
                array_merge($saga->payload, $output),
            );
        } catch (Throwable $e) {
            $this->repo->stepFailed($saga->id, $e->getMessage());
        }
    }

    private function rewind(OrderSagaRecord $saga): void
    {
        $compensateIndex = $saga->currentStep - 1;

        if ($compensateIndex < 0) {
            $this->repo->fail($saga->id);
            return;
        }

        $step = $this->steps[$compensateIndex];

        try {
            $step->compensate($saga->payload);
            $this->repo->compensationSucceeded(
                $saga->id,
                $compensateIndex,
            );
        } catch (Throwable $e) {
            $this->repo->compensationFailed(
                $saga->id,
                $e->getMessage(),
            );
        }
    }
}

lockForUpdate is a SELECT ... FOR UPDATE SKIP LOCKED against the lease window, so two workers cannot grab the same row. The orchestrator never sleeps and never retries inside tick; if a forward step throws, the saga flips to Compensating and the next tick picks up the compensation. The merge of $output into $payload is how the reservation ID flows from step 1 to compensation 1.
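The orchestrator reads `$saga->status`, `$saga->currentStep`, `$saga->payload`, and calls `$saga->isTerminal()`, but the record class itself is not shown. A minimal sketch of what it might look like follows; the constructor shape and the `fromRow` mapping are assumptions inferred from how the orchestrator and repository use it. The enum is repeated so the sketch runs on its own.

```php
<?php
declare(strict_types=1);

namespace App\Saga\Order;

// Repeated from the post so this sketch is self-contained.
enum SagaState: string
{
    case Running = 'Running';
    case Compensating = 'Compensating';
    case Completed = 'Completed';
    case Failed = 'Failed';
}

final class OrderSagaRecord
{
    /** @param array<string, mixed> $payload */
    public function __construct(
        public readonly string $id,
        public readonly SagaState $status,
        public readonly int $currentStep,
        public readonly array $payload,
    ) {}

    /** @param array<string, mixed> $row one order_sagas row, fetched as an associative array */
    public static function fromRow(array $row): self
    {
        return new self(
            id: (string) $row['id'],
            // from() throws on an unknown status instead of guessing.
            status: SagaState::from((string) $row['status']),
            currentStep: (int) $row['current_step'],
            // The JSONB column arrives as JSON text and is decoded here.
            payload: json_decode((string) $row['payload'], true, 512, JSON_THROW_ON_ERROR),
        );
    }

    public function isTerminal(): bool
    {
        return $this->status === SagaState::Completed
            || $this->status === SagaState::Failed;
    }
}

$record = OrderSagaRecord::fromRow([
    'id' => '0b8f4c1e-5a3d-4e2b-9c7a-1f2e3d4c5b6a', // made-up UUID
    'status' => 'Compensating',
    'current_step' => '2',
    'payload' => '{"order_id":"order-1","reservation_id":"res_1"}',
]);
```

Keeping the record immutable fits the one-step-per-tick design: each tick reads a fresh snapshot and writes its outcome through the repository rather than mutating the object.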

A step and its compensation

Each step is a class with two methods. Both take the saga payload; execute returns any new fields to thread forward, and compensate returns nothing.

<?php
declare(strict_types=1);

namespace App\Saga\Order\Steps;

interface StepInterface
{
    /**
     * @param array<string, mixed> $payload
     * @return array<string, mixed> new fields to merge into payload
     */
    public function execute(array $payload): array;

    /**
     * @param array<string, mixed> $payload
     */
    public function compensate(array $payload): void;
}

The inventory step talks to a WarehouseClient port. The port is an interface the saga owns; the adapter is whichever HTTP client your warehouse partner makes you use. (That separation is the whole point of Decoupled PHP.)

<?php
declare(strict_types=1);

namespace App\Saga\Order\Steps;

use App\Port\WarehouseClient;
use App\Port\WarehouseException;

final class ReserveInventoryStep implements StepInterface
{
    public function __construct(
        private readonly WarehouseClient $warehouse,
    ) {}

    public function execute(array $payload): array
    {
        $reservationId = $this->warehouse->reserve(
            sku: $payload['sku'],
            quantity: (int) $payload['quantity'],
            idempotencyKey: $payload['order_id'] . ':reserve',
        );

        return ['reservation_id' => $reservationId];
    }

    public function compensate(array $payload): void
    {
        if (empty($payload['reservation_id'])) {
            return;
        }

        try {
            $this->warehouse->release(
                reservationId: $payload['reservation_id'],
                idempotencyKey: $payload['order_id'] . ':release',
            );
        } catch (WarehouseException $e) {
            if ($e->isNotFound()) {
                return;
            }
            throw $e;
        }
    }
}

The idempotencyKey and the isNotFound branch carry the weight. The key is built from the order ID, not generated fresh each tick, so retries collapse on the warehouse side. The isNotFound branch is the rule for every compensation: already-undone is success. If the warehouse partner tells you the reservation does not exist, the goal of the compensation is already met.

ChargePayment is similar, returning a charge_id. ShipOrder is the one that fails in the opening scene; its compensate calls void_label and treats "label was never created" as success. The full versions are walked through in Decoupled PHP.
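For a sense of what "similar" could mean, here is a sketch of ChargePaymentStep following the same shape as the inventory step. It is not the book's version. The `PaymentClient` port, `PaymentException`, its `isAlreadyRefunded()` method, and the `amount_cents` payload key are all hypothetical, and the in-memory fake stands in for a real processor adapter.

```php
<?php
declare(strict_types=1);

namespace App\Saga\Order\Steps;

// Repeated from the post so this sketch runs on its own.
interface StepInterface
{
    public function execute(array $payload): array;
    public function compensate(array $payload): void;
}

// Hypothetical port: method names and parameters are assumptions.
interface PaymentClient
{
    public function charge(string $cardToken, int $amountCents, string $idempotencyKey): string;
    public function refund(string $chargeId, string $idempotencyKey): void;
}

// Hypothetical exception, mirroring WarehouseException::isNotFound().
final class PaymentException extends \RuntimeException
{
    public function __construct(string $message, private readonly bool $alreadyRefunded = false)
    {
        parent::__construct($message);
    }

    public function isAlreadyRefunded(): bool { return $this->alreadyRefunded; }
}

final class ChargePaymentStep implements StepInterface
{
    public function __construct(private readonly PaymentClient $payments) {}

    public function execute(array $payload): array
    {
        $chargeId = $this->payments->charge(
            cardToken: $payload['card_token'],
            amountCents: (int) $payload['amount_cents'], // assumed payload key
            idempotencyKey: $payload['order_id'] . ':charge',
        );

        return ['charge_id' => $chargeId];
    }

    public function compensate(array $payload): void
    {
        if (empty($payload['charge_id'])) {
            return; // the charge never happened, so there is nothing to refund
        }

        try {
            $this->payments->refund(
                chargeId: $payload['charge_id'],
                idempotencyKey: $payload['order_id'] . ':refund',
            );
        } catch (PaymentException $e) {
            if ($e->isAlreadyRefunded()) {
                return; // already-undone is success
            }
            throw $e;
        }
    }
}

// In-memory fake: the same idempotency key returns the same charge.
$fake = new class implements PaymentClient {
    public array $charges = [];
    public int $refunds = 0;

    public function charge(string $cardToken, int $amountCents, string $idempotencyKey): string
    {
        return $this->charges[$idempotencyKey] ??= 'ch_' . (count($this->charges) + 1);
    }

    public function refund(string $chargeId, string $idempotencyKey): void
    {
        if ($this->refunds > 0) {
            throw new PaymentException('already refunded', alreadyRefunded: true);
        }
        $this->refunds++;
    }
};

$step = new ChargePaymentStep($fake);
$payload = ['order_id' => 'order-1', 'card_token' => 'tok_test', 'amount_cents' => 4200];

$first = $step->execute($payload);
$retry = $step->execute($payload);   // same key, same charge
$payload = array_merge($payload, $first);

$step->compensate($payload);
$step->compensate($payload);         // second refund is swallowed as success
```

Running execute twice yields one charge, and running compensate twice yields one refund, which is the property the retrying worker depends on.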

A failure on step 3 triggers compensations 2 and 1 in reverse order, both marked idempotent.

The repository and the lease

The repository wraps every state transition in a single SQL statement. The lockForUpdate is the only SELECT ... FOR UPDATE SKIP LOCKED in the file; everything else is straight updates.

<?php
declare(strict_types=1);

namespace App\Saga\Order;

use PDO;

final class OrderSagaRepository
{
    public function __construct(private readonly PDO $pdo) {}

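    public function lockForUpdate(string $id): OrderSagaRecord
    {
        // A FOR UPDATE row lock lasts only until the transaction ends, so
        // the SELECT and the lease UPDATE must share one transaction.
        // Assumes no outer transaction is already open on this connection.
        $this->pdo->beginTransaction();

        try {
            $stmt = $this->pdo->prepare(
                'SELECT * FROM order_sagas
                 WHERE id = :id
                   AND (locked_until IS NULL OR locked_until < now())
                 FOR UPDATE SKIP LOCKED',
            );
            $stmt->execute(['id' => $id]);
            $row = $stmt->fetch(PDO::FETCH_ASSOC);

            if ($row === false) {
                // Leased, row-locked by another worker, or missing.
                throw new SagaLockedException($id);
            }

            $this->pdo->prepare(
                'UPDATE order_sagas
                 SET locked_until = now() + interval \'30 seconds\'
                 WHERE id = :id',
            )->execute(['id' => $id]);

            $this->pdo->commit();
        } catch (\Throwable $e) {
            if ($this->pdo->inTransaction()) {
                $this->pdo->rollBack();
            }
            throw $e;
        }

        return OrderSagaRecord::fromRow($row);
    }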

    public function stepSucceeded(
        string $id,
        int $nextStep,
        array $payload,
    ): void {
        $this->pdo->prepare(
            'UPDATE order_sagas
             SET current_step = :step,
                 payload = :payload,
                 attempts = 0,
                 last_error = NULL,
                 updated_at = now()
             WHERE id = :id',
        )->execute([
            'id' => $id,
            'step' => $nextStep,
            'payload' => json_encode($payload, JSON_THROW_ON_ERROR),
        ]);
    }

    public function stepFailed(string $id, string $error): void
    {
        $this->pdo->prepare(
            "UPDATE order_sagas
             SET status = 'Compensating',
                 last_error = :err,
                 attempts = attempts + 1,
                 updated_at = now()
             WHERE id = :id",
        )->execute(['id' => $id, 'err' => $error]);
    }

    public function compensationSucceeded(
        string $id,
        int $compensatedIndex,
    ): void {
        $this->pdo->prepare(
            'UPDATE order_sagas
             SET current_step = :step,
                 updated_at = now()
             WHERE id = :id',
        )->execute([
            'id' => $id,
            'step' => $compensatedIndex,
        ]);
    }

    public function complete(string $id): void
    {
        $this->pdo->prepare(
            "UPDATE order_sagas
             SET status = 'Completed', updated_at = now()
             WHERE id = :id",
        )->execute(['id' => $id]);
    }

    public function fail(string $id): void
    {
        $this->pdo->prepare(
            "UPDATE order_sagas
             SET status = 'Failed', updated_at = now()
             WHERE id = :id",
        )->execute(['id' => $id]);
    }

    public function compensationFailed(string $id, string $error): void
    {
        // A compensation that throws is terminal: record why, so on-call can act.
        $this->pdo->prepare(
            "UPDATE order_sagas
             SET status = 'Failed',
                 last_error = :err,
                 updated_at = now()
             WHERE id = :id",
        )->execute(['id' => $id, 'err' => $error]);
    }
}

stepFailed flips the status to Compensating but leaves current_step where it is, so the orchestrator's rewind reads current_step - 1 to find the last successful step to undo. compensationSucceeded decrements current_step so the next tick rewinds the step before.
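To see how current_step moves, it can help to model the transitions as a pure function and trace the opening scene through it. This is a sketch, not part of the repository: `tickModel` is a hypothetical name, and it treats a failed compensation as a move to Failed, as described earlier in the post.

```php
<?php
declare(strict_types=1);

/**
 * Pure model of one tick: given the stored state and whether the step
 * (or compensation) call succeeded, return the next stored state.
 *
 * @return array{0: string, 1: int} [status, current_step]
 */
function tickModel(string $status, int $currentStep, int $stepCount, bool $callOk): array
{
    if ($status === 'Running') {
        if ($currentStep >= $stepCount) {
            return ['Completed', $currentStep];   // nothing left to run
        }
        return $callOk
            ? ['Running', $currentStep + 1]       // stepSucceeded
            : ['Compensating', $currentStep];     // stepFailed keeps the index
    }

    if ($status === 'Compensating') {
        if ($currentStep - 1 < 0) {
            return ['Failed', $currentStep];      // everything is undone
        }
        return $callOk
            ? ['Compensating', $currentStep - 1]  // compensationSucceeded
            : ['Failed', $currentStep];           // compensation itself failed
    }

    return [$status, $currentStep];               // terminal states do not move
}

// Opening scene: reserve ok, charge ok, ship fails, then both undo calls succeed.
$state = ['Running', 0];
$trace = [];
foreach ([true, true, false, true, true, true] as $callOk) {
    $state = tickModel($state[0], $state[1], 3, $callOk);
    $trace[] = $state[0] . ':' . $state[1];
}
// $trace: Running:1, Running:2, Compensating:2, Compensating:1, Compensating:0, Failed:0
```

The trace takes six ticks, and a saga that was rolled back cleanly still ends in Failed. If you need to tell a clean rollback apart from a compensation that blew up, last_error is the place to look.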

The worker

The worker is whatever you already have. Symfony Messenger, Laravel Horizon, a supervisord process polling Postgres: all fine. The worker's only job is to find sagas that are due for a tick and call tick($id).

<?php
declare(strict_types=1);

namespace App\Saga\Order;

final class SagaWorker
{
    public function __construct(
        private readonly OrderSaga $saga,
        private readonly OrderSagaRepository $repo,
    ) {}

    public function runOnce(int $batchSize = 20): int
    {
        $ids = $this->repo->dueIds($batchSize);

        foreach ($ids as $id) {
            try {
                $this->saga->tick($id);
            } catch (SagaLockedException) {
                continue;
            }
        }

        return count($ids);
    }
}

dueIds, a repository method not shown above, returns up to $batchSize saga IDs whose status is non-terminal and whose lease is null or has expired. The worker calls runOnce in a loop with a short sleep between batches. There is no scheduler, no event bus, no DAG library. Two PHP classes, one SQL table, and your existing worker process.
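A sketch of both pieces described here, under some assumptions: `dueIds` is written as a free function taking a PDO connection rather than as a repository method, the ORDER BY is one reasonable choice rather than a requirement, and `runWorkerLoop` with its `keepRunning` callback is a hypothetical name for the loop around runOnce.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the query behind dueIds().
 *
 * @return list<string> saga IDs that are ready for a tick
 */
function dueIds(PDO $pdo, int $batchSize): array
{
    // The status predicate matches the partial index on order_sagas,
    // so the planner can consider that index for this query.
    $stmt = $pdo->prepare(
        "SELECT id FROM order_sagas
         WHERE status IN ('Running', 'Compensating')
           AND (locked_until IS NULL OR locked_until < now())
         ORDER BY updated_at
         LIMIT :limit",
    );
    $stmt->bindValue(':limit', $batchSize, PDO::PARAM_INT);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

/**
 * Calls $runOnce repeatedly with a short pause between batches.
 *
 * @param callable $runOnce     e.g. fn (): int => $worker->runOnce()
 * @param callable $keepRunning returns false to stop (signal, deploy)
 */
function runWorkerLoop(callable $runOnce, callable $keepRunning, int $sleepMs = 500): void
{
    while ($keepRunning()) {
        $runOnce();
        usleep($sleepMs * 1000); // short pause between batches
    }
}
```

Because dueIds only selects candidates and does not lock them, two workers can fetch the same ID. That is fine here: lockForUpdate is what decides who actually gets the saga, and the loser sees a SagaLockedException and moves on.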

When this stops being enough

The pattern carries a real PHP service a long way. Three to five steps, retries, compensations, idempotency on the partner side: all in scope.

The places it starts to bend:

  • Cross-saga coordination. Once saga A needs to pause until saga B emits an event, you are building a small workflow engine inside Postgres. At that point Temporal earns its operational cost.
  • Hours-to-days durations. Lease-based polling works for seconds-to-minutes. If a step needs to wait three days for a webhook, you want timers and signals, both first-class in Temporal and painful to bolt on here.
  • Hundreds of steps per saga. The two-method-per-step convention gets repetitive past a dozen. A declarative DAG (or a step library that ships with one) starts to pay back.
  • Strict ordering of compensations across instances. This pattern compensates within one instance. If "release this seat" must happen before "issue refund" across a thousand cancellations, you want a workflow runtime, not a job queue.

Until then, the orchestrator class plus the state table runs production.


If this was useful

The book this is from, Decoupled PHP, is about keeping the saga, the use case, and the domain on one side of a clean boundary while Laravel, Symfony, Doctrine, and Stripe sit on the other side as adapters. The Event-Driven Architecture Pocket Guide is the shorter companion — outbox, idempotency, choreography vs orchestration, and the traps each one hides.

Decoupled PHP — Clean and Hexagonal Architecture for Applications That Outlive the Framework

Available on Kindle, Paperback, and Hardcover. English, German, and Japanese editions out now — Portuguese and Spanish coming soon.
