DEV Community

ahmet gedik
ahmet gedik

Posted on

Rate-Limited Webhook Delivery for Video Events with PHP 8.4 and SQLite WAL

Last Tuesday at 20:47 CET we crossed 14,000 webhook deliveries in a single minute. Three partner endpoints went down within seconds, a fourth started returning 504s, and the retry queue in SQLite ballooned to 92,000 rows before our worker pool noticed anything was wrong. The cause was prosaic: a Eurovision-adjacent clip went viral across DE, FR, and NL simultaneously, and our event firehose dutifully fanned it out to every subscriber at line rate. The fix took two weeks of careful work, and I want to walk through it because rate-limited webhook delivery is one of those topics where every blog post stops at "use a token bucket" without addressing what happens at 3am when SQLite's busy-timeout starts compounding with your retry-after headers.

At ViralVidVault we ship roughly six million video events per day to a few hundred subscribers — most are small EU analytics shops, a handful are large trend-tracking SaaS platforms with their own SLOs. Our stack is deliberately boring: PHP 8.4 behind LiteSpeed, SQLite in WAL mode for everything except long-term aggregates, and Cloudflare Workers at the edge for dedup and request shaping. The constraints are GDPR (no payloads through third-party queues without DPAs), latency (P95 under 500ms from event capture to first delivery attempt), and cost (we cannot afford a Kafka cluster).

This is the design we landed on. The code below is a slightly cleaned-up version of what is running in production right now.

What rate-limited actually has to mean

Before any code, the requirements I wish I had written down before the Eurovision incident:

  • Per-subscriber limits. A slow subscriber must not slow down a fast one. The pool that delivers to partner A must be independent of the pool that delivers to partner B.
  • Burst tolerance. When a video crosses a trending threshold, we may emit fifty events for the same video_id within ten seconds (region rollups, score updates, comment-velocity flags). Subscribers should be able to absorb a small burst, then settle to a steady rate.
  • Backoff that respects the subscriber. If a subscriber returns 429 with Retry-After, we honour it precisely. If they return 503, we apply our own exponential backoff with jitter.
  • No payload duplication on retry. A retry must deliver the exact same payload bytes and the exact same HMAC signature as the original attempt — auditability matters under GDPR's accountability principle.
  • Bounded queue. If a subscriber has been failing for 24 hours, we stop attempting. Their queue should never grow unbounded; old events get dropped to dead-letter, not stored forever.

The combination of per-subscriber bucket and bounded retry window is what kills the naive implementations. You cannot just put everything in one big queue and pull off the head.

Schema in SQLite WAL

WAL mode is essential here — readers do not block writers, and the worker loop is hot enough that a journal_mode=delete database would serialise badly. The schema is intentionally narrow.

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA busy_timeout = 5000;
PRAGMA wal_autocheckpoint = 1000;

CREATE TABLE webhook_subscribers (
  id           INTEGER PRIMARY KEY,
  endpoint     TEXT    NOT NULL,
  secret       TEXT    NOT NULL,
  events       TEXT    NOT NULL,          -- JSON array
  rate_per_min INTEGER NOT NULL DEFAULT 60,
  burst        INTEGER NOT NULL DEFAULT 20,
  region       TEXT    NOT NULL,          -- ISO country, for GDPR routing
  active       INTEGER NOT NULL DEFAULT 1
);

CREATE TABLE webhook_deliveries (
  id            INTEGER PRIMARY KEY,
  subscriber_id INTEGER NOT NULL REFERENCES webhook_subscribers(id),
  event_type    TEXT    NOT NULL,
  payload       BLOB    NOT NULL,
  signature     TEXT    NOT NULL,
  enqueued_at   INTEGER NOT NULL,
  attempt       INTEGER NOT NULL DEFAULT 0,
  next_attempt  INTEGER NOT NULL,
  leased_until  INTEGER NOT NULL DEFAULT 0,
  state         TEXT    NOT NULL DEFAULT 'pending'
);

CREATE INDEX idx_deliveries_due
  ON webhook_deliveries (state, next_attempt)
  WHERE state = 'pending';

CREATE TABLE webhook_buckets (
  subscriber_id INTEGER PRIMARY KEY REFERENCES webhook_subscribers(id),
  tokens        REAL    NOT NULL,
  updated_at    REAL    NOT NULL
);
Enter fullscreen mode Exit fullscreen mode

Two notes. First, payload is stored as the exact JSON bytes that will be sent — not a normalised object. This is what lets retries produce byte-identical bodies, which matters because the HMAC signature is computed over those bytes and stored alongside. Second, the partial index idx_deliveries_due is the only index the worker query uses; SQLite's query planner picks it correctly when you filter on state = 'pending' first.

The token bucket, properly

Most token-bucket implementations I see online keep state in Redis. We do not run Redis. The bucket lives in SQLite, gets updated on every check, and uses real-valued tokens so we do not lose precision at high refill rates. Here is the PHP 8.4 version we use — note the readonly properties and the strict transactional wrap.

<?php
declare(strict_types=1);

final readonly class TokenBucket
{
    public function __construct(
        private PDO $db,
        private int $subscriberId,
        private float $refillPerSecond,
        private float $capacity,
    ) {}

    public function tryConsume(float $cost = 1.0): bool
    {
        $now = microtime(true);

        $this->db->beginTransaction();
        try {
            $stmt = $this->db->prepare(
                'SELECT tokens, updated_at FROM webhook_buckets
                 WHERE subscriber_id = :sid'
            );
            $stmt->execute([':sid' => $this->subscriberId]);
            $row = $stmt->fetch(PDO::FETCH_ASSOC);

            if ($row === false) {
                $tokens = $this->capacity;
                $this->db->prepare(
                    'INSERT INTO webhook_buckets (subscriber_id, tokens, updated_at)
                     VALUES (:sid, :tok, :ts)'
                )->execute([
                    ':sid' => $this->subscriberId,
                    ':tok' => $tokens,
                    ':ts'  => $now,
                ]);
            } else {
                $elapsed = max(0.0, $now - (float)$row['updated_at']);
                $tokens  = min(
                    $this->capacity,
                    (float)$row['tokens'] + $elapsed * $this->refillPerSecond
                );
            }

            if ($tokens < $cost) {
                $this->db->prepare(
                    'UPDATE webhook_buckets SET tokens=:tok, updated_at=:ts
                     WHERE subscriber_id=:sid'
                )->execute([
                    ':tok' => $tokens,
                    ':ts'  => $now,
                    ':sid' => $this->subscriberId,
                ]);
                $this->db->commit();
                return false;
            }

            $tokens -= $cost;
            $this->db->prepare(
                'UPDATE webhook_buckets SET tokens=:tok, updated_at=:ts
                 WHERE subscriber_id=:sid'
            )->execute([
                ':tok' => $tokens,
                ':ts'  => $now,
                ':sid' => $this->subscriberId,
            ]);
            $this->db->commit();
            return true;
        } catch (Throwable $e) {
            $this->db->rollBack();
            throw $e;
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Two production lessons in that code. The min($this->capacity, ...) clamp is non-negotiable — without it a subscriber that has not received an event for a week will accumulate a week's worth of tokens and absorb the next burst in one gulp, which is exactly what you do not want. And the bucket update on the fail path (not just success) is what keeps the updated_at field fresh enough that subsequent calls do not compute a billion-second elapsed window after a long idle period.

The worker loop with leasing

The worker loop is where most implementations get subtly wrong. The naive version pulls a row, sends the HTTP request, marks the row delivered, repeats. The problem: if your worker dies mid-request, that row sits forever in state = 'pending' and the next worker grabs it as if nothing happened — except the request may have actually been delivered, so the subscriber sees a duplicate. Worse, if you run multiple workers (we run four per node), they will race and double-deliver under load.

The fix is a short-lived lease. A worker writes its lease expiry into the row, and other workers skip rows whose lease is still in the future. If the worker dies, the lease expires and someone else picks it up.

<?php
declare(strict_types=1);

final class WebhookWorker
{
    private const LEASE_SECONDS = 30;
    private const BATCH_SIZE    = 50;

    public function __construct(
        private readonly PDO $db,
        private readonly HttpClient $http,
    ) {}

    public function tick(): int
    {
        $now = time();
        $leaseUntil = $now + self::LEASE_SECONDS;

        $this->db->beginTransaction();
        $this->db->prepare(
            "UPDATE webhook_deliveries
                SET leased_until = :until
              WHERE id IN (
                SELECT id FROM webhook_deliveries
                 WHERE state = 'pending'
                   AND next_attempt <= :now
                   AND leased_until < :now
                 ORDER BY next_attempt
                 LIMIT :n
              )"
        )->execute([
            ':until' => $leaseUntil,
            ':now'   => $now,
            ':n'     => self::BATCH_SIZE,
        ]);

        $rows = $this->db->prepare(
            "SELECT d.*, s.endpoint, s.rate_per_min, s.burst
               FROM webhook_deliveries d
               JOIN webhook_subscribers s ON s.id = d.subscriber_id
              WHERE d.leased_until = :until
                AND d.state = 'pending'"
        );
        $rows->execute([':until' => $leaseUntil]);
        $batch = $rows->fetchAll(PDO::FETCH_ASSOC);
        $this->db->commit();

        $processed = 0;
        foreach ($batch as $row) {
            $bucket = new TokenBucket(
                $this->db,
                (int)$row['subscriber_id'],
                (float)$row['rate_per_min'] / 60.0,
                (float)$row['burst'],
            );
            if (!$bucket->tryConsume(1.0)) {
                $this->reschedule((int)$row['id'], $now + 2);
                continue;
            }
            $this->attempt($row, $now);
            $processed++;
        }
        return $processed;
    }

    private function attempt(array $row, int $now): void
    {
        $resp = $this->http->post(
            $row['endpoint'],
            $row['payload'],
            [
                'X-VVV-Signature' => $row['signature'],
                'X-VVV-Event'     => $row['event_type'],
                'X-VVV-Attempt'   => (string)((int)$row['attempt'] + 1),
                'Content-Type'    => 'application/json',
            ],
            timeout: 8,
        );

        if ($resp->status >= 200 && $resp->status < 300) {
            $this->db->prepare(
                "UPDATE webhook_deliveries
                    SET state='delivered', leased_until=0
                  WHERE id=:id"
            )->execute([':id' => $row['id']]);
            return;
        }

        $attempt = (int)$row['attempt'] + 1;
        if ($attempt >= 12) {
            $this->db->prepare(
                "UPDATE webhook_deliveries
                    SET state='dead', attempt=:a, leased_until=0
                  WHERE id=:id"
            )->execute([':a' => $attempt, ':id' => $row['id']]);
            return;
        }

        $retryAfter = (int)($resp->headers['retry-after'] ?? 0);
        $backoff = $retryAfter > 0
            ? $retryAfter
            : min(3600, (2 ** $attempt) + random_int(0, 30));

        $this->db->prepare(
            "UPDATE webhook_deliveries
                SET attempt=:a, next_attempt=:n, leased_until=0
              WHERE id=:id"
        )->execute([
            ':a'  => $attempt,
            ':n'  => $now + $backoff,
            ':id' => $row['id'],
        ]);
    }

    private function reschedule(int $id, int $next): void
    {
        $this->db->prepare(
            "UPDATE webhook_deliveries
                SET next_attempt=:n, leased_until=0
              WHERE id=:id"
        )->execute([':n' => $next, ':id' => $id]);
    }
}
Enter fullscreen mode Exit fullscreen mode

The two transactions matter. The first one is the lease grab — UPDATE ... WHERE id IN (SELECT ...) is atomic in SQLite, so no two workers will lease the same row. The second pass reads the leased rows back. If we tried to combine these, the lease window could close while we are iterating, which is exactly the race we are trying to avoid.

The 12-attempt cap with exponential backoff and ±30s jitter gives a maximum window of roughly 4096 seconds plus jitter — call it 70 minutes — before a delivery is dead-lettered. For a flapping subscriber that is 12 attempts spread across an hour, which is enough to ride out most transient outages without filling up the queue.

Cloudflare Workers for edge dedup

Here is where the architecture pays off. We do not fan out from the origin — we fan out from Cloudflare Workers. The origin writes one row into webhook_deliveries per logical event, and a Worker reads them on demand and proxies to the subscriber. The Worker also runs a tiny dedup window using KV with a 30-second TTL, so if our cron emits the same event twice (it has happened, more than once), the subscriber only sees one.

export default {
  async fetch(request, env) {
    if (request.method !== 'POST') return new Response('', { status: 405 });

    const body = await request.text();
    const sig  = request.headers.get('X-VVV-Signature') || '';
    const id   = request.headers.get('X-VVV-Delivery-Id') || '';

    if (!id || !sig) return new Response('bad request', { status: 400 });

    const dedupKey = `wh:${id}`;
    const seen = await env.WEBHOOK_DEDUP.get(dedupKey);
    if (seen) return new Response('duplicate', { status: 200 });

    const subscriber = await env.WEBHOOK_CONFIG.get(
      request.headers.get('X-VVV-Subscriber') || '',
      { type: 'json' }
    );
    if (!subscriber || !subscriber.endpoint) {
      return new Response('unknown subscriber', { status: 404 });
    }

    const verified = await verifyHmac(env.HMAC_SECRET, body, sig);
    if (!verified) return new Response('bad signature', { status: 401 });

    const resp = await fetch(subscriber.endpoint, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-VVV-Event': request.headers.get('X-VVV-Event'),
        'X-VVV-Signature': sig,
      },
      body,
      signal: AbortSignal.timeout(8000),
    });

    if (resp.ok) {
      await env.WEBHOOK_DEDUP.put(dedupKey, '1', { expirationTtl: 30 });
    }

    return new Response(await resp.text(), {
      status: resp.status,
      headers: { 'Retry-After': resp.headers.get('Retry-After') || '' },
    });
  },
};

async function verifyHmac(secret, body, signature) {
  const key = await crypto.subtle.importKey(
    'raw',
    new TextEncoder().encode(secret),
    { name: 'HMAC', hash: 'SHA-256' },
    false,
    ['verify']
  );
  const sigBytes = hexToBytes(signature);
  return crypto.subtle.verify(
    'HMAC',
    key,
    sigBytes,
    new TextEncoder().encode(body)
  );
}

function hexToBytes(hex) {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.substr(i * 2, 2), 16);
  }
  return out;
}
Enter fullscreen mode Exit fullscreen mode

The Worker does not store payloads — it just relays. That is deliberate: payload retention happens on the origin, where we own the disks, and Cloudflare KV holds nothing but a delivery-id boolean for 30 seconds. Under GDPR this matters because Cloudflare's data processing terms are easier to argue about when the data flowing through them is functionally opaque metadata rather than personal-data-bearing payloads.

GDPR-aware payload shaping

One thing we changed after the incident: we no longer ship raw uploader handles or commenter usernames in webhooks, even when the upstream YouTube payload contains them. Subscribers get a hashed creator_token they can correlate to their own records if they are our DPA partner, and otherwise just the video id, region rollup, score, and event type.

function shape_payload_for_webhook(array $event, string $region_secret): array
{
    return [
        'event'         => $event['type'],
        'video_id'      => $event['video_id'],
        'region'        => $event['region'],
        'score'         => $event['score'],
        'velocity'      => $event['velocity'] ?? null,
        'creator_token' => hash_hmac(
            'sha256',
            $event['channel_id'] ?? '',
            $region_secret
        ),
        'observed_at'   => $event['observed_at'],
    ];
}
Enter fullscreen mode Exit fullscreen mode

The HMAC keyed on region_secret makes the token stable per subscriber-region pair but unlinkable across subscribers. Without it you have handed every subscriber a global pseudonym they can join against, which is the kind of thing a DPA will frown at during a Schrems-II-flavoured review.

What the cron actually does

Worker invocation runs from LiteSpeed-side cron every 15 seconds, with a hard 10-second lock so we do not stack. The whole thing is a single PHP CLI invocation:

#!/usr/bin/env bash
LOCK=/var/run/vvv-webhook.lock
exec 9>$LOCK
flock -n 9 || exit 0

cd /var/www/vvv
timeout 12 /usr/bin/php8.4 -d opcache.enable_cli=1 \
  cron/webhook_drain.php >>/var/log/vvv/webhook.log 2>&1
Enter fullscreen mode Exit fullscreen mode

cron/webhook_drain.php instantiates the worker and calls tick() in a loop until it returns 0 or 10 seconds elapse. With four parallel CLI invocations and a batch size of 50, we process up to 800 events per minute per node before the rate limit even comes into play, which has so far been comfortable headroom for our peak load.

Observability you will actually use

Three counters and one histogram, exposed on a /metrics endpoint and scraped by our small Prometheus replacement (a Go binary, mainly because Go ships static binaries and we do not want to argue with package managers on shared hosting).

package main

import (
    "expvar"
    "net/http"
    "sync/atomic"
    "time"
)

var (
    delivered = expvar.NewInt("webhook_delivered_total")
    failed    = expvar.NewInt("webhook_failed_total")
    deadLet   = expvar.NewInt("webhook_dead_total")
    latencyMs atomic.Int64
)

func recordDelivery(status int, durMs int64) {
    switch {
    case status >= 200 && status < 300:
        delivered.Add(1)
    case status >= 500 || status == 429:
        failed.Add(1)
    default:
        deadLet.Add(1)
    }
    latencyMs.Store(durMs)
}

func main() {
    http.Handle("/debug/vars", expvar.Handler())
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })
    srv := &http.Server{
        Addr:        ":9095",
        ReadTimeout: 2 * time.Second,
    }
    _ = srv.ListenAndServe()
}
Enter fullscreen mode Exit fullscreen mode

The metric I check first when anything looks off is webhook_failed_total / webhook_delivered_total over a five-minute window. If it goes above 0.02 we get a Slack notification. The metric I check second is the count of rows in state = 'pending' AND next_attempt < strftime('%s','now') - 60 — that is the we-are-behind metric, and it should be zero in steady state.

What I would do differently

If I were starting over I would put the bucket state in a separate SQLite file with its own WAL, because under burst conditions the bucket table sees an order of magnitude more writes than the deliveries table and they end up sharing checkpoints. I would also build the dedup directly into the worker rather than the edge — the edge dedup is nice but it means losing 30 seconds of dedup history if a Worker region cold-starts, which has bitten us exactly once. And I would write the HMAC signature at enqueue time rather than at delivery time. We do this now, but the migration was awkward because old rows had no stored signature and we had to recompute on read for a transition period.

Webhook delivery looks simple on the whiteboard and it is, at low volumes. At European-prime-time scale on a small operational budget, the boring choices — partial indexes, leases, per-subscriber buckets stored in the same place as the deliveries themselves — are what keep the system honest. The flashy choices are almost always the ones that bite at 3am.

Top comments (0)