ahmet gedik

Posted on Jun 29

Running Caddy 2 as a Reverse Proxy for a Multi-Region Video API

#caddy #go #devops #performance

Our video metadata API was fast in Frankfurt and miserable everywhere else. A user in São Paulo asking for the trending feed was waiting 380–450ms just for the round trip to a single origin, before SQLite even touched the FTS5 index. We run DailyWatch, a free video discovery platform with an English-speaking audience spread across the Americas, Europe, and South/East Asia, and a single-origin reverse proxy was the bottleneck nobody wanted to own. This is the write-up of how we replaced an aging nginx config with Caddy 2 as a region-aware reverse proxy edge, what actually moved the latency numbers, and the code that holds it together.

I'll be specific about the stack because the choices only make sense in context: PHP 8.4 application servers behind LiteSpeed, SQLite with FTS5 for the search index, and Cloudflare sitting in front of everything. Caddy 2 slots in between Cloudflare and the origins, and that placement is the whole point.

The latency problem that started this

The API itself was fine. A /v1/feed?region=BR request resolved in 12–18ms on the box. The pain was geographic. We had origins in three places — eu-fra, us-iad, and ap-sin — but the proxy in front of them was dumb. Requests landed wherever Cloudflare's nearest data center routed them, and then got proxied to whichever origin the config happened to list first. Half our Asian traffic was crossing two oceans to hit Frankfurt.

The symptoms were concrete:

p95 latency for non-EU regions sat above 350ms even on cache misses that should have cost 20ms.
When eu-fra had a deploy hiccup, nginx kept routing to it for ~30 seconds, because passive health checks only mark an upstream as failed after enough real user requests have already errored.
Compression and cache-control headers were inconsistent because three different origin configs each set them slightly differently.

We wanted one component that owned TLS termination toward the origins, health-checked failover, region-aware upstream selection, and a single source of truth for caching headers. Caddy 2 does all of that with a config file you can actually read six months later.

Why Caddy 2 over nginx for this job

I'm not religious about proxies. nginx is excellent. But for a multi-region API edge, Caddy 2 had three properties that mattered more than raw throughput (both are far faster than our origins anyway):

Active health checks are first-class. nginx open-source only does passive checks unless you pay for nginx Plus. Caddy does active HTTP probing out of the box, so a sick origin is pulled from rotation before a user request fails on it.
The reverse_proxy directive understands upstream policies natively. Round-robin, least-connections, first-available, IP-hash, and header-based selection are lb_policy values, not Lua modules.
Structured JSON config under the hood with a human Caddyfile on top. We generate part of our config from a script (more on that below), and Caddy's admin API lets us push config changes gracefully, without restarting the process.
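For the admin API, here is a minimal sketch of how a CI job might hand a generated Caddyfile to a running Caddy. It assumes the admin endpoint is on its default localhost:2019 and reachable from wherever the job runs; the function names are hypothetical. POST /load with Content-Type: text/caddyfile asks Caddy to adapt the Caddyfile to JSON and apply it gracefully, and per Caddy's documentation a config that fails to load is rolled back rather than left half-applied.

```python
import urllib.request

# Hypothetical helper; assumes Caddy's default admin address is unchanged.
ADMIN_URL = "http://localhost:2019"


def build_load_request(caddyfile: str, admin_url: str = ADMIN_URL) -> urllib.request.Request:
    """Build a POST /load request asking Caddy to adapt and apply a Caddyfile."""
    return urllib.request.Request(
        f"{admin_url}/load",
        data=caddyfile.encode("utf-8"),
        headers={"Content-Type": "text/caddyfile"},
        method="POST",
    )


def push_caddyfile(caddyfile: str, admin_url: str = ADMIN_URL) -> int:
    """Send the config; urlopen raises HTTPError if Caddy rejects it."""
    with urllib.request.urlopen(build_load_request(caddyfile, admin_url), timeout=10) as resp:
        return resp.status
```

Keeping the request builder separate from the network call makes it easy to check what would be sent without touching a live proxy.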

The one thing to know going in: Caddy terminates TLS automatically, which is wonderful for public endpoints but something you must explicitly control when you sit behind Cloudflare. We do, and I'll show the relevant block.

A region-aware Caddyfile

Here is the core of the edge config. Cloudflare passes the visitor's country in CF-IPCountry, and we map that header to an upstream group. Caddy's @named matchers make this clean — each region matcher selects a different reverse_proxy with its own ordered upstream list, so failover stays inside the geographically closest set first.

{
    # We're behind Cloudflare in Flexible mode (plain HTTP to this host on :80).
    # Don't auto-issue certs here; Cloudflare handles the public TLS.
    auto_https off
    servers {
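        # private_ranges trusts only private/loopback peers. Cloudflare reaches
        # this host from public IPs, so list Cloudflare's published ranges here
        # if its forwarded headers should be trusted.
        trusted_proxies static private_ranges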
    }
}

api.dailywatch.video:80 {
    # Forward Cloudflare's client IP to the origins. Setting (not appending
    # with +) overwrites any X-Real-IP the client sent itself.
    request_header X-Real-IP {http.request.header.CF-Connecting-IP}

    # Region matchers driven by Cloudflare's geo header. header_regexp ORs
    # the countries inside one anchored pattern.
    @asia header_regexp CF-IPCountry ^(SG|TW|HK|JP|KR|VN|TH|ID|MY|PH|IN)$
    @americas header_regexp CF-IPCountry ^(US|CA|BR|MX|AR|CL|CO|PE)$
    # Everything else (incl. EU) falls through to the default handler.

    handle @asia {
        reverse_proxy ap-sin.internal:8080 eu-fra.internal:8080 {
            lb_policy first
            health_uri /healthz
            health_interval 3s
            health_timeout 2s
            health_status 200
            header_up X-Edge-Region asia
        }
    }

    handle @americas {
        reverse_proxy us-iad.internal:8080 eu-fra.internal:8080 {
            lb_policy first
            health_uri /healthz
            health_interval 3s
            header_up X-Edge-Region americas
        }
    }

    handle {
        reverse_proxy eu-fra.internal:8080 us-iad.internal:8080 {
            lb_policy first
            health_uri /healthz
            health_interval 3s
            header_up X-Edge-Region eu
        }
    }

    encode zstd gzip
    header {
        Cache-Control "public, max-age=60, stale-while-revalidate=600"
        Vary "Accept-Encoding, CF-IPCountry"
        -Server
    }
}

The key decision is lb_policy first. Combined with active health checks, first means "always use the closest origin, fall to the next only when the closest is unhealthy." Round-robin would have spread Asian traffic back to Frankfurt, which is exactly the problem we were solving. first gives us geographic affinity and automatic failover in one policy.
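To make the difference concrete, here is a toy model of the two selection rules. It is not Caddy's implementation, only a sketch that assumes each upstream's health is already known from the active checks; the function names are illustrative.

```python
from itertools import cycle
from typing import Mapping, Optional, Sequence


def pick_first_healthy(upstreams: Sequence[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Hypothetical model of lb_policy first: first healthy upstream in list order.

    Upstreams missing from `healthy` are treated as unhealthy (a sketch choice).
    """
    for upstream in upstreams:
        if healthy.get(upstream, False):
            return upstream
    return None  # every upstream in the list is down


def round_robin(upstreams: Sequence[str], n_requests: int) -> list[str]:
    """Contrast: where n_requests would land under plain round-robin."""
    rotation = cycle(upstreams)
    return [next(rotation) for _ in range(n_requests)]


ASIA = ["ap-sin.internal:8080", "eu-fra.internal:8080"]

print(pick_first_healthy(ASIA, {"ap-sin.internal:8080": True, "eu-fra.internal:8080": True}))
# → ap-sin.internal:8080
print(pick_first_healthy(ASIA, {"ap-sin.internal:8080": False, "eu-fra.internal:8080": True}))
# → eu-fra.internal:8080
print(round_robin(ASIA, 4))  # half of the requests go to Frankfurt
```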

Note Vary: CF-IPCountry. Without it, a cached response from one region could be served to another. We add the country to the Vary header so Cloudflare and any downstream cache key responses per region.
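A simplified model of how a cache that honours Vary builds its key, assuming it stores one entry per combination of URL and the request-header values named in Vary. Whether a particular CDN honours a given Vary field is CDN-specific, so treat this as the HTTP semantics the header asks for rather than a description of Cloudflare internals. The helper name is hypothetical.

```python
from typing import Mapping


def cache_key(url: str, vary: str, request_headers: Mapping[str, str]) -> tuple:
    """Hypothetical helper: URL plus the value of every request header named in Vary."""
    # HTTP header names are case-insensitive, so normalise both sides.
    headers = {name.lower(): value for name, value in request_headers.items()}
    fields = sorted({f.strip().lower() for f in vary.split(",") if f.strip()})
    return (url,) + tuple((f, headers.get(f, "")) for f in fields)


VARY = "Accept-Encoding, CF-IPCountry"
br = cache_key("/v1/feed", VARY, {"Accept-Encoding": "gzip", "CF-IPCountry": "BR"})
sg = cache_key("/v1/feed", VARY, {"Accept-Encoding": "gzip", "CF-IPCountry": "SG"})
print(br != sg)  # → True: Brazil and Singapore get separate cache entries
```

Drop CF-IPCountry from the Vary value and the two keys collapse into one, which is exactly the cross-region leak the header prevents.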

Health-checked upstreams and failover

The health_uri /healthz subdirective is doing the heavy lifting. Every three seconds Caddy probes each origin. The origin's health endpoint should report more than "the process is up": it should report whether this region's data plane is actually serviceable. On our PHP origins the health check verifies the SQLite handle and that the FTS5 index responds, because a locked database is the failure mode that actually hurts us.

<?php
// healthz.php — origin health endpoint Caddy probes every 3s.
// Returns 200 only when SQLite + FTS5 are genuinely serviceable.
declare(strict_types=1);

header('Content-Type: application/json');

$dbPath = '/var/data/video.db';
$started = hrtime(true);

try {
    $pdo = new PDO('sqlite:' . $dbPath, null, null, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_TIMEOUT => 1, // fail fast if the DB is locked
    ]);

    // Touch the FTS5 index with a bounded probe: the subquery stops at the
    // first match, so the cost stays flat however large the index grows, and
    // the query throws if the FTS5 table is missing or unreadable.
    $row = $pdo->query(
        "SELECT count(*) AS n FROM (SELECT 1 FROM videos_fts WHERE videos_fts MATCH 'a*' LIMIT 1)"
    )->fetch(PDO::FETCH_ASSOC);

    $elapsedMs = (hrtime(true) - $started) / 1e6;

    // If the probe itself is slow, report degraded so Caddy routes elsewhere.
    if ($elapsedMs > 250) {
        http_response_code(503);
        echo json_encode(['status' => 'degraded', 'probe_ms' => round($elapsedMs, 2)]);
        return;
    }

    http_response_code(200);
    echo json_encode([
        'status'    => 'ok',
        'region'    => getenv('EDGE_REGION') ?: 'unknown',
        'fts_match' => (int) $row['n'] === 1,
        'probe_ms'  => round($elapsedMs, 2),
    ]);
} catch (Throwable $e) {
    http_response_code(503);
    echo json_encode(['status' => 'down', 'error' => $e->getMessage()]);
}

Returning 503 when the probe takes longer than 250ms is deliberate. A SQLite database under write contention doesn't error — it gets slow. A binary up/down check would keep a degraded origin in rotation. By treating slowness as unhealthy, Caddy pulls the box before users feel it, and lb_policy first quietly shifts that region's traffic to the secondary origin until the probe recovers.
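The difference between the two policies fits in a few lines. This is a hypothetical Python model of the decision healthz.php makes, using the same 250ms cut-off; the 900ms query time is an illustrative number, not a measurement.

```python
# Hypothetical model of two health-check policies; numbers are illustrative.
SLOW_MS = 250.0  # same cut-off as healthz.php


def binary_status(query_ok: bool, elapsed_ms: float) -> int:
    """Up/down only: elapsed_ms is ignored, so a slow success still reports 200."""
    return 200 if query_ok else 503


def latency_aware_status(query_ok: bool, elapsed_ms: float, slow_ms: float = SLOW_MS) -> int:
    """healthz.php's rule: a failure OR a slow probe both report 503."""
    if not query_ok or elapsed_ms > slow_ms:
        return 503
    return 200


# SQLite under write contention: the query succeeds, but only after 900 ms.
print(binary_status(True, 900.0))         # → 200 (origin stays in rotation)
print(latency_aware_status(True, 900.0))  # → 503 (Caddy pulls it)
```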

Picking the closest origin in PHP

The Caddyfile handles the common case statically, but the country-to-region mapping is data, not logic, and we didn't want to maintain it by hand in two places every time we adjusted a mapping. We generate the matcher lists from a single PHP source of truth that the rest of the application already uses for region selection. This keeps the proxy's geography and the app's geography from drifting apart.

<?php
// region_map.php — single source of truth for country -> origin region.
// Used by the app AND to generate Caddy matchers, so they never drift.
declare(strict_types=1);

final class RegionMap
{
    /** @var array<string, list<string>> region => ordered origins */
    private const ORIGINS = [
        'asia'     => ['ap-sin.internal:8080', 'eu-fra.internal:8080'],
        'americas' => ['us-iad.internal:8080', 'eu-fra.internal:8080'],
        'eu'       => ['eu-fra.internal:8080', 'us-iad.internal:8080'],
    ];

    /** @var array<string, string> ISO country => region */
    private const COUNTRY = [
        'SG' => 'asia', 'TW' => 'asia', 'HK' => 'asia', 'JP' => 'asia',
        'KR' => 'asia', 'VN' => 'asia', 'TH' => 'asia', 'ID' => 'asia',
        'MY' => 'asia', 'PH' => 'asia', 'IN' => 'asia',
        'US' => 'americas', 'CA' => 'americas', 'BR' => 'americas',
        'MX' => 'americas', 'AR' => 'americas', 'CL' => 'americas',
        'CO' => 'americas', 'PE' => 'americas',
    ];

    public static function regionFor(string $country): string
    {
        return self::COUNTRY[strtoupper($country)] ?? 'eu';
    }

    /** Ordered upstream list for a given request region. */
    public static function originsFor(string $region): array
    {
        return self::ORIGINS[$region] ?? self::ORIGINS['eu'];
    }

    /** Emit the Caddy region matcher lines from this map; run in CI. */
    public static function toCaddyMatchers(): string
    {
        $byRegion = [];
        foreach (self::COUNTRY as $cc => $region) {
            $byRegion[$region][] = $cc;
        }
        $out = '';
        foreach ($byRegion as $region => $countries) {
            // header_regexp ORs the countries inside one anchored pattern.
            $out .= "@{$region} header_regexp CF-IPCountry ^(" . implode('|', $countries) . ')$' . "\n";
        }
        return $out;
    }
}

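// Generate matchers for the Caddyfile build step, but only when this file is
// the script being run (not when another CLI tool includes it):
if (PHP_SAPI === 'cli'
    && isset($_SERVER['SCRIPT_FILENAME'])
    && realpath($_SERVER['SCRIPT_FILENAME']) === __FILE__) {
    echo RegionMap::toCaddyMatchers();
}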

Now php region_map.php emits exactly the @asia/@americas matcher lines, and our CI assembles the final Caddyfile from a template plus this output. The PHP app calls RegionMap::regionFor() for its own internal routing decisions. One file, two consumers, zero drift between what the app believes about geography and what the proxy enforces.
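Because the Caddyfile is assembled in CI, drift can also be caught there. Here is a hypothetical guard, assuming the app's country map is made available to the job as a dict (RegionMap keeps it private, so that export step is an assumption). It reads both the space-separated header form and the header_regexp form of the matcher lines.

```python
import re

# Hypothetical CI guard. Matches "@asia header CF-IPCountry SG TW" and
# "@asia header_regexp CF-IPCountry ^(SG|TW)$".
MATCHER_LINE = re.compile(r"^\s*@(\w+)\s+header(?:_regexp)?\s+CF-IPCountry\s+(.+)$")
COUNTRY_CODE = re.compile(r"\b[A-Z]{2}\b")


def parse_matchers(caddyfile: str) -> dict[str, set[str]]:
    """Region name -> set of country codes, read from @region matcher lines."""
    found: dict[str, set[str]] = {}
    for line in caddyfile.splitlines():
        m = MATCHER_LINE.match(line)
        if m:
            found.setdefault(m.group(1), set()).update(COUNTRY_CODE.findall(m.group(2)))
    return found


def find_drift(country_to_region: dict[str, str], caddyfile: str) -> list[str]:
    """List every country the app map and the Caddyfile disagree about."""
    expected: dict[str, set[str]] = {}
    for cc, region in country_to_region.items():
        expected.setdefault(region, set()).add(cc)
    parsed = parse_matchers(caddyfile)
    problems: list[str] = []
    for region in sorted(set(expected) | set(parsed)):
        want, have = expected.get(region, set()), parsed.get(region, set())
        problems += [f"{cc}: mapped to {region} in the app but missing from @{region}" for cc in sorted(want - have)]
        problems += [f"{cc}: in @{region} but not mapped to {region} in the app" for cc in sorted(have - want)]
    return problems
```

A non-empty result fails the build before a mismatched Caddyfile ever reaches the proxy.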

A latency probe that adjusts the routing

Static country maps are a good baseline, but real network conditions shift. A transit provider has a bad day and suddenly ap-sin is slower from Mumbai than eu-fra. We run a small Python probe on each edge that measures real latency to every origin and writes the results where our config build can read them. It doesn't reconfigure Caddy live — it produces evidence we use to re-order the static lists, which is safer than fully dynamic routing that can flap.

#!/usr/bin/env python3
"""latency_probe.py — measure edge->origin latency, emit JSON for the
config build. Run from cron on each Caddy host."""
import json
import statistics
import time
import urllib.request

ORIGINS = {
    "ap-sin": "http://ap-sin.internal:8080/healthz",
    "us-iad": "http://us-iad.internal:8080/healthz",
    "eu-fra": "http://eu-fra.internal:8080/healthz",
}
SAMPLES = 7


def probe(url: str) -> float | None:
    timings = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                resp.read()
            timings.append((time.perf_counter() - start) * 1000.0)
        except Exception:
            continue
        time.sleep(0.1)
    if not timings:
        return None
    # Drop the slowest sample to reduce noise, then take the median.
    timings.sort()
    return round(statistics.median(timings[:-1] or timings), 2)


def main() -> None:
    results = {name: probe(url) for name, url in ORIGINS.items()}
    ranked = sorted(
        (r for r in results.items() if r[1] is not None),
        key=lambda kv: kv[1],
    )
    report = {
        "measured_at": int(time.time()),
        "latency_ms": results,
        "preferred_order": [name for name, _ in ranked],
    }
    print(json.dumps(report, indent=2))


if __name__ == "__main__":
    main()

The preferred_order field is what we care about. If a probe on the Singapore edge consistently reports eu-fra faster than ap-sin for a few hours, an alert fires and we know transit is degraded before users complain. We deliberately keep a human (or at least a slow CI cadence) in the loop here — fully automatic reordering on a 7-sample probe is how you build a system that oscillates between origins every cron tick.
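One way to keep that human-in-the-loop behaviour while still automating the detection: only suggest a new order after it has won several consecutive probe runs. This is a sketch with a hypothetical ReorderAdvisor class and an assumed default threshold of six runs; it consumes the preferred_order field from the probe's JSON and never changes any config itself.

```python
from collections import deque
from typing import Optional, Sequence


class ReorderAdvisor:
    """Hypothetical: suggest a new origin order only after N identical runs in a row."""

    def __init__(self, current_order: Sequence[str], required_runs: int = 6):
        if required_runs < 1:
            raise ValueError("required_runs must be >= 1")
        self.current = list(current_order)
        self.required = required_runs
        self.recent: deque = deque(maxlen=required_runs)

    def observe(self, preferred_order: Sequence[str]) -> Optional[list[str]]:
        """Feed one probe report's preferred_order; return a suggestion or None."""
        self.recent.append(tuple(preferred_order))
        if len(self.recent) < self.required:
            return None  # not enough history yet
        candidate = self.recent[0]
        if any(run != candidate for run in self.recent):
            return None  # the ranking is still moving; do nothing
        if list(candidate) == self.current:
            return None  # stable, and already what the config uses
        return list(candidate)
```

Nothing here updates current_order: a person (or a slow CI job) edits the Caddyfile and the advisor's baseline after acting on a suggestion, which is what keeps one noisy run from flipping origins.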

Coordinating with Cloudflare and LiteSpeed

Three caching layers means three chances to cache the wrong thing. Our rules:

Cloudflare caches at the public edge, keyed including CF-IPCountry via our Vary header. Short max-age=60 with stale-while-revalidate=600 keeps the feed fresh without hammering origins.
Caddy does not cache by default, and we keep it that way. It's a router and TLS/compression boundary, not a cache. Adding a cache module here would just create a fourth layer to invalidate.
LiteSpeed on the origin serves the PHP page cache. Caddy forwards Cache-Control from the origin untouched for documents, and only overrides it for the API feed endpoints where we want a consistent policy.
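The max-age=60, stale-while-revalidate=600 pair can be read as a timeline. Here is a sketch of how a cache implementing RFC 9111 freshness and the RFC 5861 stale-while-revalidate extension would classify a stored copy by its age; the helper names are hypothetical, and how strictly a given CDN follows stale-while-revalidate is up to that CDN.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Hypothetical helper: split Cache-Control into {directive: argument-or-None}."""
    directives: dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def cache_state(age_s: float, cache_control: str) -> str:
    """Classify a cached copy by age: fresh, servable-while-revalidating, or expired."""
    d = parse_cache_control(cache_control)
    max_age = int(d.get("max-age") or 0)
    swr = int(d.get("stale-while-revalidate") or 0)
    if age_s < max_age:
        return "fresh"  # RFC 9111: fresh while age < freshness lifetime
    if age_s - max_age <= swr:
        return "stale-while-revalidate"  # RFC 5861: serve stale, refresh in background
    return "expired"  # the request has to wait for the origin
```

With our header, a feed response is served as-is for its first minute, then served stale while a background refresh runs for up to ten more minutes, and only after that does a request wait on the origin.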

The encode zstd gzip directive is worth calling out. We let Caddy own compression at the proxy edge rather than each origin, so the negotiation logic lives in one place. Cloudflare re-compresses for the client, but compressing origin→edge traffic still cuts our internal bandwidth on JSON payloads by roughly 70%.
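To sanity-check a figure like that against your own payloads, here is a rough measurement with the standard library's gzip module (zstd is left out because most Python versions lack it in the standard library). The feed below is made up, so its ratio says nothing about our real JSON; substitute a captured response body.

```python
import gzip
import json


def gzip_savings(payload: bytes, level: int = 6) -> float:
    """Hypothetical helper: fraction of bytes gzip removes (0.7 = 70% smaller)."""
    if not payload:
        return 0.0
    compressed = gzip.compress(payload, compresslevel=level)
    return 1.0 - len(compressed) / len(payload)


# Made-up feed with the repetitive keys typical of JSON list endpoints.
FEED = json.dumps(
    {"items": [{"id": i, "title": f"Video {i}", "region": "BR", "views": i * 37} for i in range(200)]}
).encode("utf-8")

print(f"{len(FEED)} bytes, {gzip_savings(FEED):.0%} smaller with gzip")
```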

One gotcha behind Cloudflare Flexible mode: make sure Caddy's automatic HTTPS is off. If it stays on for a hostname that ACME can't validate from the proxy host (an internal name with no public DNS, for example), Caddy will keep retrying certificate issuance. Turn it off, let Cloudflare terminate public TLS, and trust the forwarded protocol header.

A Go aggregator for fleet-wide health

The Python probe runs per-edge. To see the whole fleet at once we run a tiny Go service that fans out to every origin's /healthz concurrently and exposes a single rollup. Go's goroutines make the concurrent fan-out trivial and the binary deploys with no runtime dependencies, which matters on minimal proxy hosts.

package main

import (
    "encoding/json"
    "log"
    "net/http"
    "sync"
    "time"
)

var origins = map[string]string{
    "ap-sin": "http://ap-sin.internal:8080/healthz",
    "us-iad": "http://us-iad.internal:8080/healthz",
    "eu-fra": "http://eu-fra.internal:8080/healthz",
}

// One client shared by every probe, with a hard 2s cap per request.
var client = &http.Client{Timeout: 2 * time.Second}

type result struct {
    Healthy bool    `json:"healthy"`
    Millis  float64 `json:"millis"`
}

// check probes one /healthz URL; a transport error or any non-200 is unhealthy.
func check(url string) result {
    start := time.Now()
    resp, err := client.Get(url)
    if err != nil {
        return result{Healthy: false}
    }
    defer resp.Body.Close()
    return result{
        Healthy: resp.StatusCode == http.StatusOK,
        Millis:  float64(time.Since(start).Microseconds()) / 1000.0,
    }
}

// rollup fans out to every origin concurrently and returns one JSON object.
func rollup(w http.ResponseWriter, _ *http.Request) {
    var mu sync.Mutex
    var wg sync.WaitGroup
    out := make(map[string]result, len(origins))

    for name, url := range origins {
        wg.Add(1)
        go func(name, url string) {
            defer wg.Done()
            r := check(url)
            mu.Lock()
            out[name] = r
            mu.Unlock()
        }(name, url)
    }
    wg.Wait()

    w.Header().Set("Content-Type", "application/json")
    _ = json.NewEncoder(w).Encode(out)
}

func main() {
    http.HandleFunc("/fleet", rollup)
    // ListenAndServe only returns on failure (e.g. port in use); surface it.
    log.Fatal(http.ListenAndServe(":9090", nil))
}

We scrape /fleet into our dashboard and our alerting. When eu-fra flips unhealthy, the rollup shows it instantly across every edge, and because Caddy's own active checks already pulled it from rotation, the alert is informational rather than a fire drill. The two systems agree, which is the whole goal: the proxy acts on health autonomously, and the Go aggregator tells humans what the proxy already decided.
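The "informational rather than a fire drill" judgement can be made mechanical. Here is a sketch that assumes the /fleet JSON shape from the Go service and a hand-maintained copy of each region's ordered origin list (REGION_ORIGINS is hypothetical and would need to stay in sync with the Caddyfile). The rule is that an outage is only worth paging on when some region has no healthy origin left to fail over to.

```python
from typing import Mapping, Sequence

# Hypothetical copy of each region's ordered origins, mirroring the Caddyfile.
REGION_ORIGINS: dict[str, list[str]] = {
    "asia": ["ap-sin", "eu-fra"],
    "americas": ["us-iad", "eu-fra"],
    "eu": ["eu-fra", "us-iad"],
}


def alert_level(fleet: Mapping[str, Mapping], region_origins: Mapping[str, Sequence[str]] = REGION_ORIGINS) -> str:
    """Turn a decoded /fleet rollup into "ok", "info" or "page"."""
    # Origins absent from the rollup are treated as unhealthy.
    healthy = {name for name, r in fleet.items() if r.get("healthy")}
    all_origins = {name for origins in region_origins.values() for name in origins}
    if all_origins <= healthy:
        return "ok"
    for origins in region_origins.values():
        if not any(name in healthy for name in origins):
            return "page"  # a region has nowhere left to fail over to
    return "info"  # something is down, but every region still has a healthy origin
```

Fed with the decoded body of GET /fleet, a single unhealthy eu-fra yields "info", because every region still has a healthy origin; losing ap-sin as well yields "page", since Asian traffic would then have nowhere to go.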

What we measured

Numbers after two weeks in production, comparing the old single-list nginx config to region-aware Caddy 2:

p95 latency for Asian traffic dropped from 358ms to 96ms on cache misses — almost entirely the elimination of cross-ocean origin hops.
p95 for the Americas went from 290ms to 88ms.
EU traffic stayed flat (it was already hitting Frankfurt), which is exactly what we wanted — no regression for the region that was fine.
Failover time when an origin went unhealthy dropped from ~30s of intermittent errors to a single missed 3s health interval, with zero user-facing 5xx during the last two deploys.
Origin→edge bandwidth fell ~70% on JSON endpoints thanks to centralizing zstd/gzip at the proxy.
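The comparison above uses p95 without saying how it was computed. For anyone reproducing it, here is a sketch of the nearest-rank definition, which is one common choice (interpolating definitions give slightly different numbers on small samples). The sample latencies are made up.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Twenty made-up latency samples in ms: p95 is the 19th smallest.
latencies = [80 + i for i in range(19)] + [400]
print(percentile(latencies, 95))  # → 98
```

A single 400ms outlier moves p100 but not p95 here, which is why p95 on cache misses is a better yardstick for the routing change than the maximum.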

The config is also smaller and more honest than what it replaced. One Caddyfile, generated partly from the same PHP region map the application uses, with active health checks that understand what "healthy" means for a SQLite-backed origin.

Conclusion

Caddy 2 turned out to be the right tool not because it's faster than nginx in microbenchmarks — at our scale neither proxy is the bottleneck — but because region-aware routing, active health checks, and centralized compression are all config, not custom modules. The combination of lb_policy first with active probing gives you geographic affinity and automatic failover in two lines, and keeping the country-to-region map in one PHP file that feeds both the app and the proxy build is what stops the two from drifting apart over time.

If you're running a multi-region API and your reverse proxy still treats all origins as an undifferentiated pool, that's almost certainly where your tail latency is hiding. Map your regions, health-check them honestly, and let the closest healthy origin win. That single change did more for our worldwide p95 than any amount of query tuning on the origins themselves.

DEV Community