Backing Up an API Gateway's Config — Per Connection, Restore-Safe

#laravel #devops #php #architecture

I spent today building a backup-and-restore feature for a Laravel app that manages multiple Kong API gateway connections, each one a separate gateway environment (think prod, staging, a client's cluster). Kong stores all its routing, plugins, consumers and credentials in its own datastore, and "just trust the gateway's own backup" stops being good enough the moment you're orchestrating several of them from one control plane. You want your own snapshot, on your schedule, scoped to one connection, that you can diff and roll back.

Here's the shape of the thing and the decisions that mattered.

A backup is a row, not a file dump

The naive version is kong config db_export backup.yml (the command takes the target file as an argument) and call it a day. That works for one gateway you SSH into. It falls apart when you've got N connections managed remotely and you want history, status, and "restore this exact one" from a UI.

So a backup became a first-class model — KongConnectionBackup — with a UUID public id, an auto-increment internal id, a foreign key to the connection, the captured payload, and two enums carrying its state. The enums are where a lot of the clarity lives:

enum KongBackupStatus: string
{
    case Pending   = 'pending';
    case Running   = 'running';
    case Completed = 'completed';
    case Failed    = 'failed';

    public function label(): string
    {
        return match ($this) {
            self::Pending   => 'Pending',
            self::Running   => 'In progress',
            self::Completed => 'Completed',
            self::Failed    => 'Failed',
        };
    }

    public function color(): string
    {
        return match ($this) {
            self::Pending   => 'gray',
            self::Running   => 'amber',
            self::Completed => 'green',
            self::Failed    => 'red',
        };
    }
}

label() and color() keep the Blade/Livewire layer dumb — the view just asks the enum how to render itself, no @if ladders mapping status strings to badge colors. Same pattern for KongBackupSection, which scopes what a backup covers (services, routes, plugins, consumers…) so a restore can be partial instead of all-or-nothing.
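For illustration, here is roughly what a section enum along those lines could look like. This is a sketch under assumptions, not the app's actual code: the four cases are just the sections named above, and fromValues() is a hypothetical helper for turning stored strings back into cases.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: the case list comes from the sections named in the post;
// the real enum may define more cases or different string values.
enum KongBackupSection: string
{
    case Services  = 'services';
    case Routes    = 'routes';
    case Plugins   = 'plugins';
    case Consumers = 'consumers';

    public function label(): string
    {
        // 'plugins' becomes 'Plugins', and so on.
        return ucfirst($this->value);
    }

    /**
     * Turn stored strings (e.g. from a JSON column) into enum cases,
     * dropping any value that is not a known section.
     *
     * @param  list<string>  $values
     * @return list<self>
     */
    public static function fromValues(array $values): array
    {
        return array_values(array_filter(
            array_map(static fn (string $v) => self::tryFrom($v), $values),
        ));
    }
}
```

Dropping unknown values instead of throwing is a design choice for this sketch; a stricter version could use from() and let the ValueError surface.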

The capture runs in a job, the command just dispatches

Backing up a live gateway is slow and failure-prone — network calls, big payloads, rate limits. None of that belongs in a synchronous request or blocking an Artisan command. So the actual work lives in a queued job:

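use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Throwable;
// Imports for the app's own model, enum and service are omitted here.

class BackupKongConnection implements ShouldQueue
{
    // Dispatchable provides the static dispatch() used by the triggers;
    // SerializesModels stores only the model's id on the queue.
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        public KongConnectionBackup $backup,
    ) {}

    public function handle(KongConnectionBackupService $service): void
    {
        $this->backup->update(['status' => KongBackupStatus::Running]);

        try {
            $service->capture($this->backup);
            $this->backup->update(['status' => KongBackupStatus::Completed]);
        } catch (Throwable $e) {
            $this->backup->update(['status' => KongBackupStatus::Failed]);
            throw $e; // let the queue record the failure + retry policy decide
        }
    }
}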

The status transitions are the contract: a backup row is created Pending, flips to Running when the job picks it up, then lands on Completed or Failed. The UI subscribes to that, so you get a live progress badge for free. Re-throwing after marking Failed matters: you still want the failure recorded in failed_jobs with its stack trace once retries are exhausted, not a silently swallowed exception that the queue reports as a successful job. If the queue does retry, the row simply flips back to Running on the next attempt.
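If you want that contract enforced rather than implied, one option is a small transition table. This is a sketch and not what the app ships: BACKUP_TRANSITIONS and canTransition() are hypothetical names, the strings mirror the KongBackupStatus values, and allowing failed to go back to running assumes a queue retry reuses the same row.

```php
<?php
declare(strict_types=1);

// Hypothetical transition table keyed by the status string values.
const BACKUP_TRANSITIONS = [
    'pending'   => ['running'],
    'running'   => ['completed', 'failed'],
    'failed'    => ['running'], // assumption: a retry picks the same row up again
    'completed' => [],          // terminal state
];

/**
 * Return true when moving from $from to $to is allowed by the table.
 * Unknown statuses are never allowed.
 */
function canTransition(string $from, string $to): bool
{
    return in_array($to, BACKUP_TRANSITIONS[$from] ?? [], true);
}
```

A guard like this could sit in the model or the job; the point is that a Completed row can never be dragged back to Running by a stray duplicate job.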

The console command (KongBackupCommand) and the scheduled variant both do the same tiny thing: create the row, dispatch the job, return. Thin command, fat service — the command is just one more trigger alongside the UI button and the scheduler.

Scheduling is per connection, not global

A single global "backup everything at 2am" is tempting and wrong. Connections have different blast radii — a client's prod gateway you want nightly with retention; a throwaway sandbox you don't want cluttering storage at all. So scheduling keys off each connection's own settings, and KongScheduledBackupCommand iterates the connections that opted in:

$connections->each(function (KongConnection $connection) {
    BackupKongConnection::dispatch(
        $connection->backups()->create([
            'status'   => KongBackupStatus::Pending,
            'sections' => $connection->backup_sections,
        ])
    );
});

The scheduler entry itself is boring on purpose: register the command in routes/console.php (or in the console kernel's schedule() method on older Laravel versions) and let the command sort out which connections actually run.
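How the command decides "which connections actually run" isn't shown here, so the following is only a sketch of one plausible rule in plain PHP. The three settings (an enabled flag, an interval in minutes, and a last-backup timestamp) are hypothetical per-connection fields, as is the isBackupDue() name.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether a connection is due for a scheduled backup.
 * All inputs are hypothetical per-connection settings.
 */
function isBackupDue(
    bool $enabled,
    int $intervalMinutes,
    ?DateTimeImmutable $lastBackupAt,
    DateTimeImmutable $now,
): bool {
    if (!$enabled || $intervalMinutes <= 0) {
        return false; // opted out, e.g. a throwaway sandbox
    }

    if ($lastBackupAt === null) {
        return true; // opted in but never backed up yet
    }

    // Due once at least one full interval has elapsed since the last backup.
    $elapsedSeconds = $now->getTimestamp() - $lastBackupAt->getTimestamp();

    return $elapsedSeconds >= $intervalMinutes * 60;
}
```

With a rule like this the scheduler can fire the command frequently and cheaply, and each connection's own interval determines whether a row and a job get created on that tick.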

Restore is the part you test like your job depends on it

Backup is forgiving — worst case you take another one. Restore is the dangerous direction: you're pushing state back into a live gateway, potentially clobbering newer config. Two guardrails went in.

First, restore is section-aware. You can restore just plugins without touching consumers, because KongBackupSection already modelled the boundaries. Second, the service treats restore as "reconcile toward this snapshot" rather than "blind overwrite" — it knows the difference between what's in the snapshot and what's live, so it isn't surprised by entities that already exist.
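To make "reconcile" concrete, here is a minimal sketch of the planning step in plain PHP. planRestore() and the id-keyed array shape are my assumptions for illustration; the real service talks to the gateway and is not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Compare a snapshot with live state for one section and plan the restore.
 * Both inputs are arrays keyed by entity id (a hypothetical shape).
 *
 * @param  array<string, array<string, mixed>>  $snapshot
 * @param  array<string, array<string, mixed>>  $live
 * @return array{create: list<string>, update: list<string>, unchanged: list<string>, extra: list<string>}
 */
function planRestore(array $snapshot, array $live): array
{
    $plan = ['create' => [], 'update' => [], 'unchanged' => [], 'extra' => []];

    foreach ($snapshot as $id => $entity) {
        if (!array_key_exists($id, $live)) {
            $plan['create'][] = (string) $id;    // missing live: recreate it
        } elseif ($live[$id] != $entity) {
            $plan['update'][] = (string) $id;    // exists but drifted: update it
        } else {
            $plan['unchanged'][] = (string) $id; // already matches: leave alone
        }
    }

    // Live entities that are not in the snapshot are only reported here;
    // whether a restore should delete them is a separate policy decision.
    foreach (array_keys($live) as $id) {
        if (!array_key_exists($id, $snapshot)) {
            $plan['extra'][] = (string) $id;
        }
    }

    return $plan;
}
```

Producing a plan first also gives the UI something to show as a dry run before anything is pushed to the gateway.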

The status machine underneath both directions is exactly the kind of thing a Pest test should pin down before the feature ever touches a real gateway:

it('marks a backup completed when capture succeeds', function () {
    $connection = KongConnection::factory()->create();
    $backup = $connection->backups()->create([
        'status' => KongBackupStatus::Pending,
    ]);

    $this->mock(KongConnectionBackupService::class)
        ->shouldReceive('capture')
        ->once()
        ->with($backup);

    (new BackupKongConnection($backup))->handle(app(KongConnectionBackupService::class));

    expect($backup->fresh()->status)->toBe(KongBackupStatus::Completed);
});

it('marks a backup failed and rethrows when capture blows up', function () {
    $backup = KongConnection::factory()
        ->create()
        ->backups()->create(['status' => KongBackupStatus::Pending]);

    $this->mock(KongConnectionBackupService::class)
        ->shouldReceive('capture')
        ->andThrow(new RuntimeException('gateway unreachable'));

    expect(fn () => (new BackupKongConnection($backup))->handle(app(KongConnectionBackupService::class)))
        ->toThrow(RuntimeException::class);

    expect($backup->fresh()->status)->toBe(KongBackupStatus::Failed);
});

Testing the status machine in isolation — without a real gateway — is what makes the feature safe to iterate on. The HTTP-level capture/restore gets its own integration tests against a disposable gateway, but the state transitions are pure and fast to assert.

Takeaway

The lesson that keeps repeating: model the artifact (a backup is a row with status + scope), push the slow/fallible work into a job, keep the command/UI/scheduler as thin triggers over a single service, and let enums own their own presentation. The restore direction is where you spend your test budget — backups are cheap to retry, restores are not.

What's next is retention and pruning (a Completed backup older than N days for a sandbox connection shouldn't live forever), which is a nice small Observer-driven follow-up.
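The selection rule for that follow-up could be as small as the sketch below. None of this exists yet: prunableBackupIds(), the row shape (id, status, completed_at) and the retention-in-days setting are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Pick the backups that are old enough to prune.
 * Only completed backups are considered; failed ones are left for inspection.
 *
 * @param  list<array{id: int, status: string, completed_at: ?DateTimeImmutable}>  $backups
 * @return list<int>
 */
function prunableBackupIds(array $backups, int $retentionDays, DateTimeImmutable $now): array
{
    $cutoff = $now->modify("-{$retentionDays} days");
    $ids = [];

    foreach ($backups as $backup) {
        $isCompleted = $backup['status'] === 'completed';

        // DateTimeImmutable instances compare chronologically with < and >.
        if ($isCompleted && $backup['completed_at'] !== null && $backup['completed_at'] < $cutoff) {
            $ids[] = $backup['id'];
        }
    }

    return $ids;
}
```

A production version would probably also refuse to prune the most recent completed backup of a connection, so that there is always something to restore from.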