DEV Community

ahmet gedik

Posted on

Building a Multi-Tier Video Cache: Edge, Regional, and Origin Layers

Why Three Tiers Beat One

When we built DailyWatch for global English video discovery, single-tier caching kept biting us. A flat origin cache fronted by a CDN works until you have international traffic, regional content variations, and a long tail of cold videos. The fix isn't a bigger cache — it's the right cache at the right layer.

Our architecture splits caching into three concentric tiers:

  • Edge tier: CDN PoPs serving static manifests, thumbnails, and short-TTL HTML
  • Regional tier: Shared memory caches (Redis/Memcached) per geographic region
  • Origin tier: PHP file-based page cache plus database query cache

Each layer has different invalidation rules, different TTLs, and different failure modes. Treating them as one homogeneous "cache" is how you end up serving stale thumbnails to half the planet.
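To keep those differences explicit, it can help to write each tier's policy down in one place. The sketch below is a hypothetical Python summary: the TierPolicy type and policy() helper are illustrative, not part of DailyWatch's code. The numbers are the ones quoted later in this post.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TierPolicy:
    name: str
    store: str
    fresh_ttl_s: int            # age up to which an entry is served as-is
    hard_ttl_s: Optional[int]   # absolute age limit; None = no separate hard TTL
    invalidation: str           # how a targeted purge reaches this tier

# Values quoted in this post. Edge uses the CDN-Cache-Control TTL;
# the origin re-renders pages older than its TTL.
TIERS = (
    TierPolicy("edge", "CDN PoP", 12 * 3600, None, "tag-based purge"),
    TierPolicy("regional", "Redis per region", 30 * 60, 4 * 3600, "key-level delete"),
    TierPolicy("origin", "HTML files on disk", 3 * 3600, None, "unlink the file"),
)

def policy(name: str) -> TierPolicy:
    """Look up a tier's policy by name; raises KeyError for unknown tiers."""
    for tier in TIERS:
        if tier.name == name:
            return tier
    raise KeyError(name)
```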

Edge Tier: CDN With Honest Cache Headers

The edge tier is where most CDNs earn their keep. Cloudflare, Fastly, BunnyCDN — pick one, but give it accurate Cache-Control headers. The biggest mistake I see: setting max-age without stale-while-revalidate. You want users to get a fast (possibly stale) response while a background revalidation hits the next tier.
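The decision a cache makes under such a header can be sketched as a three-way classification that follows the stale-while-revalidate semantics of RFC 5861. This is a simplified model, and the Freshness enum and classify function are hypothetical names. Real caches also weigh other directives such as no-cache or must-revalidate.

```python
from enum import Enum

class Freshness(Enum):
    FRESH = "fresh"              # serve from cache, no upstream request
    STALE_REVALIDATE = "stale"   # serve the cached copy now, refresh in the background
    EXPIRED = "expired"          # fetch from the next tier before responding

def classify(age_s: int, max_age: int, swr: int) -> Freshness:
    """Classify an entry of age `age_s` seconds under
    Cache-Control: max-age=<max_age>, stale-while-revalidate=<swr>."""
    if age_s < max_age:
        return Freshness.FRESH
    if age_s < max_age + swr:
        return Freshness.STALE_REVALIDATE
    return Freshness.EXPIRED
```

With the browser header used on the watch page (max-age=21600, stale-while-revalidate=86400), a 1-hour-old entry is fresh, a 10-hour-old entry is served stale while a refresh runs, and anything 30 hours or older must be refetched.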

Here is a representative PHP response header set we use for the watch page:

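<?php
// Emit cache headers for the watch page and answer revalidation requests with 304.
function emitVideoPageHeaders(int $videoId, int $lastModified): void {
    $etag = '"v' . $videoId . '-' . $lastModified . '"';
    // Browsers: fresh for 6 h, then served stale for up to 24 h while revalidating.
    header('Cache-Control: public, max-age=21600, stale-while-revalidate=86400');
    header('ETag: ' . $etag);
    header('Vary: Accept-Encoding');
    // CDNs that honor CDN-Cache-Control keep the page for 12 h.
    header('CDN-Cache-Control: public, max-age=43200');

    // If-None-Match may carry several ETags, and a compressing proxy may have
    // weakened ours to W/"..."; If-None-Match uses weak comparison, so strip W/.
    $sent = $_SERVER['HTTP_IF_NONE_MATCH'] ?? '';
    foreach (array_map('trim', explode(',', $sent)) as $candidate) {
        if (preg_replace('/^W\//', '', $candidate) === $etag) {
            http_response_code(304);
            exit;
        }
    }
}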

Two things matter here:

  • CDN-Cache-Control (standardized in RFC 9213 and honored by CDNs that support it) lets the edge keep content longer than the browser does, which sharply reduces origin pulls.
  • Vary: Accept-Encoding only — never Vary: Cookie on cacheable routes, or you fragment the cache per session and watch hit ratio collapse.
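A toy model of how a shared cache builds its key shows why Vary: Cookie is so damaging. This is a simplified sketch with hypothetical function names. Real CDNs may normalize header values before keying, and their exact handling of Vary differs by provider.

```python
from typing import Iterable, Mapping, Sequence

def variant_key(url: str, headers: Mapping[str, str], vary: Sequence[str]) -> tuple:
    """Key a shared cache would use: the URL plus the value of every request
    header listed in Vary (header names compared case-insensitively)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return (url,) + tuple(lowered.get(h.lower(), "") for h in vary)

def distinct_variants(url: str, requests: Iterable[Mapping[str, str]],
                      vary: Sequence[str]) -> int:
    """How many separate cache entries the same URL would occupy."""
    return len({variant_key(url, r, vary) for r in requests})

# 1,000 visitors, same encoding, each with their own session cookie.
visitors = [{"Accept-Encoding": "gzip", "Cookie": f"session={i}"} for i in range(1000)]
print(distinct_variants("/watch/42", visitors, ["Accept-Encoding"]))            # 1
print(distinct_variants("/watch/42", visitors, ["Accept-Encoding", "Cookie"]))  # 1000
```

One cached object serves every visitor when only Accept-Encoding varies. Adding Cookie turns the same page into a thousand single-use entries.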

Regional Tier: The Layer Everyone Skips

Most teams jump from CDN edge straight to origin. That gap is where international latency hides. A user in Sydney on a cache miss reaches Frankfurt, waits 280ms, and now your "fast" page feels sluggish. A regional tier — shared memory caches deployed per geographic cluster — fills that gap.

We run regional caches as plain Redis instances in each region with consistent hashing for shard routing. Below is the Go logic that decides which tier answers a request:

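import "context"

// Tier identifies which cache layer answered a request.
type Tier int

const (
    TierEdge Tier = iota
    TierRegional
    TierOrigin
)

// ResolveVideo checks each tier in order and warms every tier above the one that answered.
// Video, edgeLookup, regionalLookup, originLookup, warmEdge and warmRegional are defined elsewhere.
func ResolveVideo(ctx context.Context, id, region string) (*Video, Tier, error) {
    if v, ok := edgeLookup(ctx, id); ok {
        return v, TierEdge, nil
    }
    // Background warm-ups must outlive the request, so detach them from its
    // cancellation (context.WithoutCancel, Go 1.21+) while keeping its values.
    bg := context.WithoutCancel(ctx)
    if v, ok := regionalLookup(ctx, region, id); ok {
        go warmEdge(bg, id, v)
        return v, TierRegional, nil
    }
    v, err := originLookup(ctx, id)
    if err != nil {
        return nil, TierOrigin, err
    }
    // Backward warming: refill both tiers above origin.
    go warmRegional(bg, region, id, v)
    go warmEdge(bg, id, v)
    return v, TierOrigin, nil
}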
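The post does not say which consistent-hashing scheme the regional tier uses, so the following is a generic sketch: a client-side hash ring with virtual nodes that maps a cache key to one of several plain Redis instances. The shard names are hypothetical. Redis Cluster, by contrast, routes keys with its own 16,384 hash slots, so this approach only applies to standalone instances.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring mapping cache keys to shard names.
    Each shard owns `vnodes` points on the ring to smooth the distribution."""

    def __init__(self, shards: list, vnodes: int = 100) -> None:
        self._points = []
        self._owners = {}
        for shard in shards:
            for i in range(vnodes):
                point = self._hash(f"{shard}#{i}")
                self._points.append(point)
                self._owners[point] = shard
        self._points.sort()

    @staticmethod
    def _hash(s: str) -> int:
        # First 8 bytes of MD5 as an integer: stable across processes,
        # unlike Python's built-in hash(), which is salted per process.
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owners[self._points[idx]]
```

Adding a fourth shard only moves the keys that now land on the new shard (roughly a quarter of them). A plain hash(key) % N would reshuffle most keys, and each moved key becomes a miss that falls through to origin.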

A few patterns we enforce:

  • Backward warming only. When a lower tier serves the response, asynchronously warm every tier above it. Never warm sideways across regions — that creates a thundering-herd amplifier on origin.
  • Region-aware keys. A video's metadata is global, but its trending rank is regional. Mixing them in one key causes cross-region pollution.
  • Soft TTL plus hard TTL. Regional entries have a 30-minute soft TTL (serve, revalidate async) and a 4-hour hard TTL (forced refetch).
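The soft/hard TTL rule and region-aware keys fit in a few lines. This sketch assumes the entry's write time is stored alongside the value. regional_key, Entry and lookup are hypothetical names, and the TTLs are the 30-minute and 4-hour values above.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional, Tuple

SOFT_TTL = 30 * 60   # after this, serve the entry but refresh it in the background
HARD_TTL = 4 * 3600  # after this, the entry must not be served

def regional_key(video_id: str, kind: str, region: Optional[str] = None) -> str:
    """Global data (metadata) gets one key; regional data (e.g. trending rank)
    is namespaced by region so regions never overwrite each other."""
    base = f"video:{video_id}:{kind}"
    return base if region is None else f"{region}:{base}"

@dataclass
class Entry:
    value: Any
    stored_at: float  # Unix time when the entry was written

def lookup(entry: Optional[Entry], now: Optional[float] = None) -> Tuple[Optional[Any], bool]:
    """Return (value_to_serve, needs_refresh).
    value_to_serve is None when the caller must fetch synchronously."""
    now = time.time() if now is None else now
    if entry is None:
        return None, True                    # miss
    age = now - entry.stored_at
    if age >= HARD_TTL:
        return None, True                    # hard-expired: forced refetch
    return entry.value, age >= SOFT_TTL      # soft-expired: serve, revalidate async
```

In Redis, one way to enforce the hard TTL without extra code is to write the key with an expiry equal to HARD_TTL (for example, SET with the EX option) and keep only the soft check in application code.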

Origin Tier: Cheap Tricks That Punch Above Their Weight

The origin tier is where I see most teams overspend. Before you reach for an in-memory store, try file-based page caches. Disk is cheap, OPcache is free, and a write-rarely, read-often workload like a video watch page is exactly what a flat-file cache eats for breakfast.

Our origin uses a Python pre-renderer for popular pages and a PHP runtime cache for everything else. Here is the pre-renderer's core serve-or-render step, simplified:

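import hashlib
import os
import pathlib
import tempfile
import time
from typing import Callable

CACHE_DIR = pathlib.Path("/var/cache/pagecache")
SOFT_TTL = 3 * 3600  # seconds; older files are re-rendered

def cache_key(url: str) -> pathlib.Path:
    # Shard by the first two hex digits of the digest (256 subdirectories).
    digest = hashlib.sha1(url.encode()).hexdigest()
    return CACHE_DIR / digest[:2] / f"{digest}.html"

def serve_or_render(url: str, render: Callable[[str], bytes]) -> bytes:
    path = cache_key(url)
    try:
        if time.time() - path.stat().st_mtime < SOFT_TTL:
            return path.read_bytes()
    except FileNotFoundError:
        pass  # cache miss, or the file was purged (unlinked) mid-check
    html = render(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file, then rename atomically, so readers never see a partial page.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(html)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600 files
    os.replace(tmp, path)
    return html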

Some hard-won rules for the origin tier:

  • Cache the rendered HTML, not just the data behind it. Skipping template render saves 30–60ms per request on PHP.
  • Shard the cache directory by hash prefix (digest[:2], which yields 256 subdirectories). A single flat folder with tens of thousands of files degrades directory operations and tooling on many filesystems.
  • Treat OPcache and the page cache as separate problems. OPcache helps your code path; the page cache helps your users.

Invalidation: The Only Hard Part

Caching is easy. Invalidation breaks teams. Our rules:

  • Publish-time invalidation, not request-time. When a video is published or metadata changes, push a targeted purge to all three tiers in parallel.
  • Tag-based purges at the edge. Modern CDNs support cache tags — use them. Purging by URL list at scale is brittle.
  • Origin purge is local. Just unlink the file. Do not overthink it.
  • Never blanket-purge regional caches. A full flush triggers a thundering herd on origin. Use surgical, key-level deletes.
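Publish-time invalidation across all three tiers is a small fan-out. The sketch below assumes each tier exposes a purge callable that takes a video ID and raises on failure. The callables and their implementations are hypothetical. Collecting failures instead of failing fast lets the publisher retry only the tier that missed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Each purger takes a video ID and raises on failure. Hypothetical examples:
#   edge:     purge the CDN cache tag for this video through the provider's API
#   regional: DEL the exact keys for this video in every region (never FLUSHALL)
#   origin:   unlink the rendered HTML file(s)
Purger = Callable[[str], None]

def purge_everywhere(video_id: str, purgers: Dict[str, Purger]) -> Dict[str, BaseException]:
    """Run every tier's purge for one video in parallel.
    Returns failures keyed by tier name so the caller can retry just those."""
    failures: Dict[str, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(purgers))) as pool:
        futures = {name: pool.submit(fn, video_id) for name, fn in purgers.items()}
        for name, fut in futures.items():
            exc = fut.exception()  # waits for this purge to finish
            if exc is not None:
                failures[name] = exc
    return failures
```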

What You Actually Gain

After moving from a flat CDN-plus-origin design to the three-tier model:

  • Average TTFB on cache miss dropped from 380ms to 95ms (regional tier absorbs the long-haul latency)
  • Origin CPU dropped roughly 60% during traffic spikes
  • Edge hit ratio climbed from 78% to 94% (because we stopped fragmenting on cookies)
  • 95th-percentile international latency is now bounded by the regional tier, not transatlantic links
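The hit-ratio change is easier to judge as misses. This is plain arithmetic on the figures above, not an extra measurement: the share of requests that miss the edge falls from 22% to 6%.

$$
1 - \frac{1 - 0.94}{1 - 0.78} = 1 - \frac{0.06}{0.22} \approx 0.73
$$

That works out to roughly 73% fewer requests falling through to the regional and origin tiers, or about 3.7 times fewer for the same traffic.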

If you are running a global content site and your cache strategy is one CDN plus one origin, the regional tier is the highest-leverage change you can ship this quarter.
