ahmet gedik

Posted on Jun 15

Caching Video Aggregator Pages with Varnish ESI and Edge Fragments

#varnish #caching #php #performance

Last quarter our "trending in Japan" landing page started timing out during the 8pm JST traffic spike. The page itself is cheap to render, but it is a mosaic: a stable masthead, a category nav that changes maybe once a day, a regional trending rail that refreshes every few minutes, and per-video view counters that tick constantly. Our LiteSpeed full-page cache treated the whole document as one object, so the fast-moving view counter dragged the effective TTL of the entire page down to 60 seconds. At peak that meant re-rendering the full page — 14 queries against SQLite FTS5 with the CJK tokenizer — tens of thousands of times a minute. The origin fell over not because rendering was slow, but because we were caching at the wrong granularity.

The fix was to stop caching pages and start caching fragments. This post walks through how we put Varnish in front of the origin and used Edge Side Includes (ESI) to assemble pages from independently-cached pieces, each with its own TTL and its own invalidation rules. If you run a read-heavy aggregator like TopVideoHub, this pattern can cut origin load by an order of magnitude without touching your application's data model.

Why full-page caching breaks down

The core problem is that a single HTML document mixes content with wildly different change rates, but a full-page cache can only assign one TTL to the whole blob. The slowest-to-change region of the page is held hostage by the fastest.

Break our trending page apart and the mismatch is obvious:

Masthead and footer — change on deploy, maybe weekly. Could be cached for hours.
Category navigation — driven by the category table, changes once or twice a day. Good for hours.
Regional trending rail — recomputed every cron cycle. Good for 2-3 minutes.
Live view counters — tick continuously. Good for 30 seconds at most.

With one TTL, you either set it to 30 seconds and re-render the expensive 95% of the page constantly, or you set it to an hour and serve stale view counts. Neither is acceptable. For a market like Asia-Pacific where we run separate trending lists for JP, KR, TW, HK, SG, TH and VN, the page count multiplies the waste: every region is its own full-page object, and every region is re-rendered on the same brutal 60-second cycle.

Fragment caching solves this by giving each piece its own lifecycle. The expensive shell is computed once and held for hours; only the cheap, fast-moving fragments are recomputed often. ESI is the mechanism that lets a cache do that assembly for you, at the edge, with zero client-side JavaScript.

The ESI mental model

ESI is a tiny markup language the cache understands. Your origin returns a shell document that contains placeholder tags like <esi:include src="/esi/trending/jp" />. When Varnish sees an ESI-enabled response, it parses those tags, fetches each src as if it were a separate client request, caches each fragment independently, and splices the results into the parent document before sending it downstream.

The important consequences:

Each fragment is a normal HTTP response with its own Cache-Control and TTL.
A fragment can be a cache hit while the page around it is being assembled, and vice versa.
Purging one fragment does not touch the others.
The client never sees the esi: tags — it receives fully-assembled HTML.

This is fundamentally different from client-side composition. The browser makes one request and gets one finished document. SEO crawlers see the complete page. There is no layout shift, no loading spinner, no hydration. The assembly happens at the edge, and when every fragment is already a cache hit it is a splice of objects held in RAM that never touches the origin.
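To make the splice concrete, here is a toy model of the assembly step in Python. It is a sketch of the idea only, not how Varnish is implemented: the `FragmentCache` class, the `assemble` function, and the injectable clock are all hypothetical names made up for this illustration.

```python
import re
import time

# Matches the self-closing include tags used in this post's shell document.
ESI_TAG = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


class FragmentCache:
    """Toy in-memory cache where every fragment URL has its own expiry."""

    def __init__(self, fetch, clock=time.monotonic):
        self.fetch = fetch        # callable: url -> (body, ttl_seconds)
        self.clock = clock        # injectable so the example can fake time
        self.store = {}           # url -> (body, expires_at)
        self.origin_hits = 0

    def get(self, url):
        entry = self.store.get(url)
        if entry is not None and entry[1] > self.clock():
            return entry[0]       # fresh: served without touching the origin
        body, ttl = self.fetch(url)
        self.origin_hits += 1
        self.store[url] = (body, self.clock() + ttl)
        return body


def assemble(shell, cache):
    """Replace each include tag with the (possibly cached) fragment body."""
    return ESI_TAG.sub(lambda m: cache.get(m.group(1)), shell)
```

Calling `assemble` twice a minute apart with a 30-second counter fragment and a 180-second trending fragment refetches only the counter; the trending fragment is still a hit.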

Putting Varnish in front of the origin

Here is the VCL we run. The two interesting subroutines are vcl_recv, where we normalize and authorize requests, and vcl_backend_response, where we enable ESI and assign per-fragment TTLs based on the URL prefix of the fragment being fetched.

vcl 4.1;

backend origin {
    .host = "127.0.0.1";
    .port = "8080";
}

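# Only the app server on loopback may send PURGE
acl purgers {
    "127.0.0.1";
    "::1";
}

sub vcl_recv {
    # Normalize so /jp and /jp/ hit the same object. The bare "/" is left
    # alone, and a slash in front of a query string is handled too.
    set req.url = regsub(req.url, "^(/[^?]+)/(\?.*)?$", "\1\2");

    # Strip tracking params before they fragment the cache key,
    # then tidy up any separators left behind
    if (req.url ~ "[?&](utm_[a-z_]+|fbclid|gclid)=") {
        set req.url = regsuball(req.url, "([?&])(utm_[a-z_]+|fbclid|gclid)=[^&]*", "\1");
        set req.url = regsuball(req.url, "&&+", "&");
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "[?&]$", "");
    }

    # PURGE is only accepted from the app server on loopback
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Not allowed"));
        }
        return (purge);
    }
}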

sub vcl_backend_response {
    # Turn on ESI parsing only when the origin explicitly asks for it
    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }

    # Fragment TTLs are driven by the fragment, not the page
    if (bereq.url ~ "^/esi/counter/") {
        set beresp.ttl = 30s;
        set beresp.grace = 10s;
    } elsif (bereq.url ~ "^/esi/trending/") {
        set beresp.ttl = 180s;
        set beresp.grace = 60s;
    } else {
        set beresp.ttl = 6h;
        set beresp.grace = 1h;
    }
}

A few things worth calling out. We normalize trailing slashes and strip tracking parameters in vcl_recv so that /jp, /jp/, and /jp?utm_source=x all collapse to one cache key; otherwise marketing links would shatter our hit rate. We gate PURGE to loopback only, because a VCL that returns purge unconditionally will purge for anyone who can reach the port. And we set grace on every object so that when a fragment expires, Varnish can serve the slightly-stale copy while it fetches a fresh one in the background. Grace mode is what keeps the origin alive during a thundering herd: only one request goes to the backend, everyone else gets the stale-but-valid fragment.
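If you want to sanity-check which URLs should collapse to the same cache key before touching VCL, a small Python model of the normalization is handy. This is an illustrative sketch using the standard library's URL parser rather than the regexes Varnish runs, and `cache_key` is a hypothetical helper name.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit

# Query parameters that only exist for marketing attribution.
TRACKING = re.compile(r'^(utm_.+|fbclid|gclid)$')


def cache_key(url: str) -> str:
    """Return the normalized path-plus-query we want the cache to key on."""
    parts = urlsplit(url)

    # Drop a trailing slash, but never reduce the root path to nothing.
    path = parts.path
    if len(path) > 1:
        path = path.rstrip('/') or '/'

    # Keep every parameter except the tracking ones, preserving order.
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not TRACKING.match(k)]
    query = urlencode(kept)
    return path + ('?' + query if query else '')
```

Under this model `/jp`, `/jp/`, and `/jp?utm_source=x` all map to `/jp`, while a real parameter such as `page=2` survives.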

Marking which responses get ESI processing

Notice that do_esi is only enabled when the backend response carries Surrogate-Control: content="ESI/1.0". This is deliberate and it matters for security. If you blanket-enable ESI parsing for every response, an attacker who can influence any reflected content could inject an <esi:include> pointing at an internal URL — that is a real server-side request forgery class of bug. By requiring the origin to opt in per-response, only documents we control as templates ever get parsed.

The Surrogate-Control header is hop-by-hop in spirit: Varnish consumes it and strips it (we unset it) so it never leaks downstream to Cloudflare or the browser. Only the shell document sends the content="ESI/1.0" token. The fragments never do, even when they send a Surrogate-Control header of their own for freshness, because we do not want nested ESI parsing inside fragments; they are leaf nodes.
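The opt-in rule is small enough to model in a few lines of Python. This is a sketch of the decision the VCL makes, with a hypothetical `esi_decision` helper; it does not parse anything itself, it only answers "should this response be parsed, and which headers go downstream?"

```python
from typing import Dict, Tuple


def esi_decision(headers: Dict[str, str]) -> Tuple[bool, Dict[str, str]]:
    """Return (do_esi, headers_to_send_downstream) for one backend response."""
    out = dict(headers)
    if 'ESI/1.0' in out.get('Surrogate-Control', ''):
        # The origin opted in: parse the body and consume the header.
        del out['Surrogate-Control']
        return True, out
    # No opt-in: the body is passed through untouched, even if it happens
    # to contain something that looks like an <esi:include> tag.
    return False, out
```

A response that merely reflects user input containing an include tag carries no opt-in header, so under this rule it is never parsed.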

The shell and the fragments in PHP

The shell is a normal PHP page, but it renders almost nothing itself. It emits the Surrogate-Control header to switch on ESI, sets a long Cache-Control, and drops esi:include tags where the dynamic pieces go. The expensive work has been pushed out into the included fragments.

<?php
// public/trending.php — the shell document, cached for hours at the edge.

header('Content-Type: text/html; charset=UTF-8');
header('Surrogate-Control: content="ESI/1.0"');
header('Cache-Control: public, max-age=21600');

$region = preg_replace('/[^a-z]/', '', $_GET['region'] ?? 'jp');

?>
<!doctype html>
<html lang="ja">
<head><meta charset="utf-8"><title>Trending</title></head>
<body>
  <esi:include src="/esi/nav" />
  <main>
    <h1>Trending now</h1>
    <esi:include src="/esi/trending/<?= $region ?>" />
  </main>
  <esi:include src="/esi/counter/home" />
</body>
</html>

The fragment endpoint is where the database work lives. It runs the FTS5-backed query, renders just its own slice of HTML, and declares its own freshness. Note that this response sets a short max-age — it is the fast-moving part — while the shell above is cached for six hours. The two TTLs are completely independent.

<?php
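// public/esi/trending.php: one cacheable fragment, with its own TTL.
// Assumption: the origin rewrites /esi/trending/{region} to this script
// with ?region={region}; the rewrite rule itself is not shown here.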

$region = preg_replace('/[^a-z]/', '', $_GET['region'] ?? 'jp');

$db = new PDO('sqlite:/var/www/data/videos.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $db->prepare(
    'SELECT v.id, v.title, v.views
     FROM trending t
     JOIN videos v ON v.id = t.video_id
     WHERE t.region = :r
     ORDER BY t.rank ASC
     LIMIT 20'
);
$stmt->execute([':r' => $region]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Advertise three minutes of freshness. The edge TTL itself is set in
// vcl_backend_response from the /esi/trending/ URL prefix, so keep the two in sync.
header('Surrogate-Control: max-age=180');
header('Cache-Control: public, max-age=180');

foreach ($rows as $r) {
    $title = htmlspecialchars($r['title'], ENT_QUOTES, 'UTF-8');
    echo '<article data-id="' . (int) $r['id'] . '"><h3>' . $title . '</h3></article>';
}

The htmlspecialchars call with explicit UTF-8 is not optional for us — CJK titles will mojibake instantly if any layer in the chain guesses the wrong charset. Set the encoding explicitly on the header, on the meta tag, on the escaping function, and in your database connection. ESI assembly is byte-level splicing; Varnish does not re-encode anything, so every fragment must already agree on UTF-8.
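The byte-level nature of the splice is easy to demonstrate. The Python sketch below is an illustration, not part of our stack: it splices a fragment into a shell as raw bytes, once encoded as UTF-8 and once as Shift_JIS, and checks whether the result still decodes. The `{FRAGMENT}` placeholder and the sample title are made up for the example.

```python
# The shell is UTF-8, like every response in the chain should be.
SHELL = '<main>{FRAGMENT}</main>'.encode('utf-8')
TITLE = '東京の人気動画'

good_fragment = f'<h3>{TITLE}</h3>'.encode('utf-8')
bad_fragment = f'<h3>{TITLE}</h3>'.encode('shift_jis')  # a layer guessed wrong


def splice(shell_bytes: bytes, fragment_bytes: bytes) -> bytes:
    """Insert the fragment as-is; nothing is re-encoded, as with ESI."""
    return shell_bytes.replace(b'{FRAGMENT}', fragment_bytes)


def decodes_cleanly(page_bytes: bytes) -> bool:
    """True if the assembled page is valid UTF-8 from start to finish."""
    try:
        page_bytes.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False
```

The UTF-8 fragment round-trips; the Shift_JIS one produces a page that is no longer valid UTF-8, which is what a browser renders as mojibake.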

A TTL budget per fragment

The whole point of this exercise is that TTLs become a per-fragment decision. Our current budget for the trending page looks like this:

/esi/nav — 6 hours. Driven by the category table; purged explicitly when an admin edits categories.
/esi/trending/{region} — 180 seconds. Recomputed every cron fetch cycle; purged after each refresh.
/esi/counter/{scope} — 30 seconds. The only thing that genuinely needs to be near-real-time.
Shell document — 6 hours. It has no data of its own, only structure.

The arithmetic is the payoff. Before fragmentation, the full page was recomputed roughly once per minute per region, so seven regions cost about 10,000 expensive renders a day (7 × 60 × 24 = 10,080). After fragmentation, the shell renders a handful of times per region per day, the trending rail renders 20 times an hour per region, and only the tiny counter fragment runs on a 30-second cadence, and that one query is trivial. The total database work dropped by more than 90% even though the user-visible freshness of the fast parts actually improved.
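The render counts can be worked out from the TTLs alone. One full-page render per minute per region is 7 × 60 = 420 an hour, or 10,080 a day. The Python sketch below does the same sum per fragment. It is an upper bound that assumes at most one origin render per TTL expiry per object and ignores purges and grace; the single counter scope and the single nav object are assumptions, since the post only shows `/esi/counter/home` and `/esi/nav`.

```python
REGIONS = 7
SECONDS_PER_DAY = 86_400


def renders_per_day(ttl_seconds: int, objects: int) -> int:
    """Upper bound: each object is re-rendered once every time its TTL lapses."""
    return SECONDS_PER_DAY // ttl_seconds * objects


# Before: one full-page object per region on a 60-second TTL.
before = renders_per_day(60, REGIONS)

# After: each piece on its own TTL from the budget above.
after = {
    'shell':    renders_per_day(6 * 3600, REGIONS),
    'nav':      renders_per_day(6 * 3600, 1),        # assumed: one shared nav
    'trending': renders_per_day(180, REGIONS),
    'counter':  renders_per_day(30, 1),              # assumed: one counter scope
}

print('full-page renders per day before:', before)
for name, count in after.items():
    print(f'{name} renders per day after: {count}')
```

The counts alone do not show the saving, because the renders are no longer equally expensive: a full-page render ran 14 queries, while a fragment render runs only its own slice.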

Invalidation without flushing the world

Long TTLs are only safe if you can invalidate precisely when the underlying data changes. The mistake is to flush everything on every write — that throws away the cache you worked so hard to build. Varnish gives you two tools: PURGE removes one exact object, and BAN invalidates everything matching a regex. We prefer targeted purges and reserve bans for deploys.

The app calls the edge over loopback whenever a fragment's data changes. After the cron job rewrites the trending table for a region, it purges exactly that region's fragment and nothing else.

<?php
// app/Cache/EdgePurge.php — call this when a fragment's data changes.

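final class EdgePurge
{
    public function __construct(
        private string $edge = 'http://127.0.0.1:6081',
        // Placeholder: must be the public hostname visitors use. Varnish's
        // built-in vcl_hash keys on URL + Host, so a PURGE sent with the
        // loopback Host would miss the objects real traffic created.
        private string $host = 'www.example.com'
    ) {}

    public function purge(string $path): bool
    {
        $ch = curl_init($this->edge . $path);
        curl_setopt_array($ch, [
            CURLOPT_CUSTOMREQUEST  => 'PURGE',
            CURLOPT_HTTPHEADER     => ['Host: ' . $this->host],
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 3,
        ]);
        curl_exec($ch);
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        // Varnish answers a successful purge with a synthetic 200.
        return $code === 200;
    }
}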

Wiring it in is one line at the end of the fetch job: after updating the trending table for Japan, call $purge->purge('/esi/trending/jp'). The nav fragment is purged from the admin category save handler. The counter fragment is never purged at all — its 30-second TTL is short enough that explicit invalidation would be pointless churn. Match the invalidation strategy to the change rate: purge the things that change on events, let TTL handle the things that change on a clock.
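The difference between the two invalidation tools is easiest to see on a toy cache. The Python sketch below is a model for intuition only: `ToyCache` is a hypothetical class, and real Varnish evaluates bans lazily against objects rather than deleting them eagerly as this does.

```python
import re


class ToyCache:
    """Dictionary-backed stand-in for the edge cache, keyed by URL."""

    def __init__(self):
        self.objects = {}

    def put(self, url: str, body: str) -> None:
        self.objects[url] = body

    def purge(self, url: str) -> bool:
        """Remove exactly one object; report whether it was present."""
        return self.objects.pop(url, None) is not None

    def ban(self, pattern: str) -> int:
        """Invalidate every object whose URL matches the regex."""
        rx = re.compile(pattern)
        doomed = [url for url in self.objects if rx.search(url)]
        for url in doomed:
            del self.objects[url]
        return len(doomed)
```

Purging `/esi/trending/jp` leaves `/esi/trending/kr` and `/esi/nav` untouched, which is the behavior we want after a single region refreshes. A ban on `^/esi/trending/` clears every region at once, which is why we keep it for deploys.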

Warming the cache after a refresh

A purge leaves a hole. The next visitor pays the full origin cost to refill it, and during a traffic spike that visitor might be one of thousands arriving simultaneously. Grace mode covers most of this, but right after a purge there is no stale copy to serve, so we proactively re-prime the hot fragments the instant the cron job finishes writing new data.

A tiny Python script hits each fragment once, concurrently, before real traffic arrives. Because the requests go through Varnish, the first one populates the cache and every subsequent visitor is a hit.

#!/usr/bin/env python3
# warm.py - re-prime hot fragments right after a content refresh.

import concurrent.futures as cf
import urllib.request

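EDGE = 'http://127.0.0.1:6081'
# Placeholder: the public hostname visitors use. Varnish's built-in hash
# includes Host, so warming under the loopback Host would fill the wrong objects.
HOST = 'www.example.com'
REGIONS = ['jp', 'kr', 'tw', 'hk', 'sg', 'th', 'vn']

def warm(path):
    req = urllib.request.Request(EDGE + path, headers={'Host': HOST, 'X-Warm': '1'})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return path, resp.status
    except OSError as exc:
        # Covers HTTP errors, refused connections and timeouts, so one
        # failed fragment does not abort warming the rest.
        return path, f'failed: {exc}'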

def main():
    paths = ['/esi/nav']
    paths += [f'/esi/trending/{r}' for r in REGIONS]

    with cf.ThreadPoolExecutor(max_workers=8) as pool:
        for path, status in pool.map(warm, paths):
            print(f'warmed {path} -> {status}')

if __name__ == '__main__':
    main()

The X-Warm header is a marker we use in logs to distinguish synthetic warming traffic from real visitors, and you can branch on it in VCL if you want warming requests to bypass grace. The ordering matters: purge first, then warm, then real traffic. We run the warmer as the last step of the cron pipeline that produces the trending data, so the cache is hot by the time the data is even visible to users.

How this sits with Cloudflare and LiteSpeed

We did not rip out our existing stack to add Varnish — we slid it in as a dedicated assembly layer. The request path is now Cloudflare, then Varnish, then the PHP origin behind LiteSpeed. Each layer does what it is good at:

Cloudflare terminates TLS and caches the fully-assembled HTML at its edge POPs. On the free plan it cannot do ESI, which is exactly why Varnish exists in the path — assembly has to happen somewhere we control.
Varnish does the ESI stitching and holds the fragment objects in RAM. This is the brain of the operation.
LiteSpeed and PHP render the shell and the fragments. PHP no longer worries about page-level caching at all; it just sets honest Cache-Control headers per response.

The key subtlety is that downstream caches see the assembled output, not the fragments. Once Varnish splices everything together, the document that leaves it is plain HTML with no ESI tags. So the Cache-Control on the final response, which Varnish derives from the shell, is what Cloudflare honors. That value has to be conservative, and the shell's six hours is not: if you let Cloudflare cache the assembled page for hours, your 30-second counter becomes a six-hour counter at the POP. Tune the public-facing TTL to your fastest visible fragment, or mark the assembled page as private and let Varnish, sitting closer to the origin, be the shared cache that keeps the fast fragments fresh.
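One way to reason about the public-facing TTL is to treat it as the minimum over everything visible on the page. The Python sketch below illustrates that rule with the TTL budget from this post; the function names are hypothetical, and how you actually set the outgoing header (in VCL or at the origin) is up to your setup.

```python
# TTLs in seconds for everything that ends up in the assembled trending page.
PAGE_TTLS = {
    'shell': 6 * 3600,
    '/esi/nav': 6 * 3600,
    '/esi/trending/jp': 180,
    '/esi/counter/home': 30,
}


def public_max_age(ttls: dict) -> int:
    """The assembled page is only as fresh as its fastest-moving fragment."""
    return min(ttls.values())


def downstream_cache_control(ttls: dict, shared: bool = True) -> str:
    """Header for the assembled page as it leaves the assembly layer.

    shared=True lets downstream shared caches keep the page for the
    fastest fragment's TTL; shared=False restricts it to the browser.
    """
    scope = 'public' if shared else 'private'
    return f'{scope}, max-age={public_max_age(ttls)}'
```

With the budget above the assembled page gets a 30-second public lifetime, regardless of how long the shell itself lives in Varnish.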

Gotchas worth knowing before you ship

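Fragments are separate requests. Each esi:include runs through vcl_recv and the cache lookup on its own, and a shared fragment is the same bytes for every visitor. In Varnish the subrequest starts from a copy of the parent request's headers, cookies included, and the built-in VCL passes any request that carries a Cookie or Authorization header, so strip those for fragments you expect to be cached. If a fragment needs to be personalized, you have just left the land of shared caching: keep personalized fragments out of ESI or render them client-side.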
Charset discipline is non-negotiable for CJK. UTF-8 on the header, the meta tag, the escaping call, and the DB connection. Byte-level splicing is unforgiving.
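Handle include failures. A failed esi:include can blank a region of the page or, depending on configuration, abort delivery of the document. Use onerror="continue" on non-critical includes so a single failing fragment does not take down the whole document. Varnish only honors that attribute in recent releases (7.3 and later, with the esi_include_onerror feature flag enabled), so check your version.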
Do not ESI everything. Fragments have overhead — each is a request and a cache object. Only split where TTLs genuinely differ. A footer and a masthead with the same TTL belong in the shell.
Lean on grace mode. It is the single most effective origin-protection setting Varnish has. Without it, every expiry is a potential stampede.
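The grace window is simple to model. The Python sketch below classifies a lookup by the object's age against its TTL and grace, using the counter fragment's 30-second TTL and 10-second grace from the VCL above. It is a simplified picture of the decision, and `lookup_state` is a hypothetical name.

```python
def lookup_state(age: float, ttl: float, grace: float) -> str:
    """Classify a cache lookup for an object that is `age` seconds old."""
    if age < ttl:
        return 'fresh'   # served from cache, no backend traffic
    if age < ttl + grace:
        return 'grace'   # stale copy served while one refresh runs in the background
    return 'miss'        # nothing usable left: the request waits on the origin


# The counter fragment from the VCL: ttl=30s, grace=10s.
for age in (5, 35, 45):
    print(age, lookup_state(age, ttl=30, grace=10))
```

The middle state is the one that protects the origin: visitors keep getting an answer immediately while a single fetch refreshes the object.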

Conclusion

The shift in thinking is small but total: stop asking "how long can I cache this page?" and start asking "how long can I cache each piece of this page?" Varnish plus ESI lets you answer that question independently for every fragment, assemble the document at the edge, and serve crawlers and users a single finished page. For us it turned a page that fell over at peak into one that barely registers on the origin, while making the live parts fresher than before. If you are running a read-heavy, multi-region aggregator on a modest origin, fragment-level edge caching is the highest-leverage change you can make without rewriting a line of your application logic.

DEV Community