Edge Caching Strategies to Cut Latency and Cost


Why edge caching changes the latency equation
Cache-Control and TTL patterns to make behavior predictable
Surrogate keys and targeted invalidation workflows
Measuring cache ROI and controlling cost
A practical checklist and runbook for edge cache policies
Sources

Edge caching is the fastest, cheapest lever you have to cut user-visible latency; misconfigured caching is the stealthiest source of both poor UX and runaway origin cost. I draw on experience running high-traffic edge platforms to give you concrete patterns for Cache-Control composition, sensible TTLs, stale-while-revalidate, and surrogate-key invalidation. Together they move latency off the critical path and shrink bills.

You see this in audits: spikes in P95/P99 latency that coincide with cache misses, dashboards that show rising origin RPS, teams purging entire CDNs after content updates, and exploding numbers of cache keys because headers and query strings vary unpredictably. Those symptoms are operational signals: cache exists, but it isn’t shaping application behavior, and the result is poor UX plus avoidable origin cost.

Why edge caching changes the latency equation

Edge caches collapse geographic and network distance. Serving an object from a nearby POP instead of the origin cuts round-trip time and, for cache hits, removes origin compute from the request path entirely. The proportion of requests served from edge caches, the cache hit ratio, directly controls origin load and therefore both tail latency and origin egress bills.

Cache-key design comes first: every header, cookie, or query parameter you include in the cache key fragments the cache and lowers the hit ratio. Shared-cache directives like s-maxage let you treat the CDN differently from the browser, which is how you get the best of both: long-lived edge responses with conservative browser revalidation.
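One way to keep the cache key explicit is to normalize each request URL before lookup: keep an allowlist of query parameters that actually change the response, sort them, and drop everything else (tracking parameters, cache busters). This is a minimal sketch assuming a hypothetical allowlist (`page`, `sort`, `color`); in production the same rule usually lives in the CDN's own cache-key configuration rather than in Python.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical allowlist: only these parameters change the response body.
ALLOWED_PARAMS = {"page", "sort", "color"}


def normalize_cache_key(url: str) -> str:
    """Build a cache key from lowercase host, path, and sorted allowlisted params."""
    parts = urlsplit(url)
    # Drop every parameter not on the allowlist (utm_*, session ids, busters).
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k in ALLOWED_PARAMS]
    # Sorting makes ?page=2&sort=price and ?sort=price&page=2 share one entry.
    kept.sort()
    query = urlencode(kept)
    base = f"{parts.netloc.lower()}{parts.path or '/'}"
    return f"{base}?{query}" if query else base
```

For example, `https://Shop.example.com/shoes?utm_source=mail&sort=price&page=2` normalizes to `shop.example.com/shoes?page=2&sort=price`, so the marketing parameter no longer creates a separate cache entry.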

Important: small, repeatable improvements in hit ratio compound. Moving from a 70% to an 85% edge hit ratio halves origin traffic (the miss ratio falls from 30% to 15%) and reduces tail latency for the user cohorts that matter most.

Measure and segment hit ratio by URL prefixes, by client region, and by device type so you know where fragmentation happens. Treat the cache key the way you treat authentication logic: explicit, reviewed, and instrumented.
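Segmenting hit ratio is a group-by over edge logs. This sketch assumes logs have already been parsed into records with a path, region, device class, and a cache status normalized to HIT or MISS; those field names are hypothetical, since every CDN's log schema differs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    path: str
    region: str
    device: str
    cache_status: str  # assumed already normalized to "HIT" or "MISS"


def hit_ratio_by_segment(records, prefix_depth=1):
    """Return {(url_prefix, region, device): hit_ratio} for the given records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        # "/products/123" with prefix_depth=1 becomes "/products".
        segments = [s for s in r.path.split("/") if s]
        prefix = "/" + "/".join(segments[:prefix_depth])
        key = (prefix, r.region, r.device)
        totals[key] += 1
        if r.cache_status == "HIT":
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}
```

Sorting the result by hit ratio (and weighting by request count) points straight at the prefixes and cohorts where the key is fragmenting.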

Cache-Control and TTL patterns to make behavior predictable

Get deliberate with Cache-Control. The directives you pick are your contract with every cache in the path:

max-age sets the freshness lifetime for every cache, including the browser; shared caches use it only when s-maxage is absent.
s-maxage overrides max-age for shared caches (CDNs), letting you decouple browser and edge lifetimes.
stale-while-revalidate and stale-if-error allow controlled staleness while hiding origin latency or failures. RFC 5861 defines stale-while-revalidate as permission to serve a stale response immediately while the cache revalidates it in the background.
immutable is useful for fingerprinted assets to tell caches that the response never changes until its URL does.
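The browser/edge split above can be expressed as a small function that reports how long a given cache may treat a response as fresh. This is a simplified sketch, not a full RFC 9111 implementation: it ignores Expires, heuristic freshness, and most other directives, and the function names are hypothetical.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control header into {directive: value or None}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def freshness_lifetime(header: str, shared: bool):
    """Seconds the response stays fresh in a shared (CDN) or private (browser) cache.

    Returns 0 when the cache must not store it, None when no lifetime is stated.
    """
    d = parse_cache_control(header)
    # no-store: nobody stores it; private: shared caches must not store it.
    if "no-store" in d or (shared and "private" in d):
        return 0
    # Shared caches prefer s-maxage; browsers ignore s-maxage entirely.
    for name in (("s-maxage", "max-age") if shared else ("max-age",)):
        value = d.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return None
```

For `public, max-age=0, s-maxage=60, stale-while-revalidate=30`, the CDN gets a 60-second lifetime while the browser gets 0 and revalidates every time.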

Practical header patterns (examples):

# Fingerprinted/static assets
Cache-Control: public, max-age=31536000, immutable

# HTML or SSR pages (edge-first, browser revalidate immediately)
Cache-Control: public, max-age=0, s-maxage=60, stale-while-revalidate=30

# API responses that tolerate short staleness
Cache-Control: public, max-age=5, s-maxage=30, stale-while-revalidate=10, stale-if-error=86400

Use s-maxage for edge-first behavior and max-age for what clients should keep locally. Use stale-while-revalidate to avoid blocking requests during revalidation windows: the cache returns the stale copy while a background revalidation runs. On CDNs that also coalesce concurrent misses into one origin request (request collapsing), this turns a burst of traffic into a single origin fetch.
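The choice a shared cache makes for a stored response can be sketched as a small decision function. This is a simplified reading of RFC 5861; the inclusive window boundaries are an assumption, and real CDNs differ in details such as whether every cache tier honors these directives.

```python
from enum import Enum


class Action(Enum):
    SERVE_FRESH = "serve from cache"
    SERVE_STALE_AND_REVALIDATE = "serve stale, revalidate in background"
    FETCH_FROM_ORIGIN = "block on origin before responding"
    SERVE_STALE_ON_ERROR = "origin failed, serve stale"
    RETURN_ERROR = "origin failed and stale-if-error window has passed"


def decide(age, lifetime, swr=0, sie=0, origin_failed=False):
    """Pick an action for a cached object `age` seconds old with a `lifetime` TTL."""
    if age < lifetime:
        return Action.SERVE_FRESH
    staleness = age - lifetime
    if staleness <= swr:
        # Revalidation happens in the background; its failure is not user-visible.
        return Action.SERVE_STALE_AND_REVALIDATE
    if not origin_failed:
        return Action.FETCH_FROM_ORIGIN
    # stale-if-error: stale copies may cover origin errors for `sie` seconds.
    return Action.SERVE_STALE_ON_ERROR if staleness <= sie else Action.RETURN_ERROR
```

With the API header above (s-maxage=30, stale-while-revalidate=10, stale-if-error=86400), an object aged 35 seconds is served stale while it revalidates; at 50 seconds the request waits on the origin, unless the origin is failing, in which case the stale copy is served.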

Contrarian operational insight: prefer a slightly longer shared-cache TTL with a short browser TTL and targeted invalidation, rather than short TTLs everywhere. Short TTLs shift cost and unpredictability back to your origin; targeted invalidation (surrogate keys / tags) preserves freshness without paying for constant origin traffic.
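A rough model makes this trade-off concrete. Assume a hot object is requested continuously at every POP, each POP refills it once per TTL expiry and once per purge, and concurrent misses are collapsed into one fetch. The POP count and purge rate are hypothetical, and counting expiries and purges separately gives an upper bound, because a purge restarts the TTL clock.

```python
SECONDS_PER_DAY = 86_400


def daily_origin_refills(pops, ttl_seconds, purges_per_day=0):
    """Upper-bound origin fetches per day for one hot object across all POPs."""
    expiries_per_day = SECONDS_PER_DAY / ttl_seconds
    return pops * (expiries_per_day + purges_per_day)


short_ttl = daily_origin_refills(pops=50, ttl_seconds=60)
long_ttl_with_purges = daily_origin_refills(pops=50, ttl_seconds=86_400, purges_per_day=4)
```

With a hypothetical 50 POPs, a 60-second TTL costs about 72,000 origin fetches per day for that one object, while a one-day edge TTL with four targeted purges costs about 250.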

Surrogate keys and targeted invalidation workflows

When you need freshness on updates, avoid “purge everything.” Tag related responses at the origin so you can invalidate narrowly. Two common implementations:

Fastly-style Surrogate-Key headers that index responses against keys at the edge; you purge by key via API.
Cloudflare-style Cache-Tag headers that let you purge by tag (or purge by prefix/host for other use cases).

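Example: the product page carries its own key plus its category key, and every listing page that renders the product should also emit product-62952, so a single purge by that key reaches all of them. Keep the browser TTL short: a CDN purge cannot reach copies already stored in browsers.

Cache-Control: public, max-age=0, s-maxage=86400
Surrogate-Key: product-62952 category-shoes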
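Render-time tagging can live in one helper that every template calls with the products it renders. This sketch assumes the key naming used in the example (product-&lt;id&gt;, category-&lt;slug&gt;) and emits either Fastly's space-separated Surrogate-Key or Cloudflare's comma-separated Cache-Tag; the function name and the `vendor` switch are hypothetical.

```python
def tag_headers(product_ids, category, vendor="fastly"):
    """Build the tagging header for a page that renders the given products.

    A product page passes one ID; a listing page passes every product it shows,
    so purging product-<id> invalidates both kinds of page.
    """
    keys = [f"product-{pid}" for pid in sorted(set(product_ids))]
    keys.append(f"category-{category}")
    if vendor == "fastly":
        # Fastly: keys separated by spaces in Surrogate-Key.
        return {"Surrogate-Key": " ".join(keys)}
    # Cloudflare: tags separated by commas in Cache-Tag.
    return {"Cache-Tag": ",".join(keys)}
```

Calling `tag_headers([62952], "shoes")` yields the Surrogate-Key line shown above, and a listing page that shows products 62952 and 62953 would carry both product keys.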

Purge-by-key examples (illustrative curl requests):

# Fastly - batch surrogate-key purge (JSON body)
curl -X POST "https://api.fastly.com/service/<SERVICE_ID>/purge" \
  -H "Fastly-Key: ${FASTLY_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"surrogate_keys":["product-62952","category-shoes"]}'

# Cloudflare - purge by tag
curl -X POST "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/purge_cache" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"tags":["product-62952","category-shoes"]}'

Operational considerations and limits: surrogate/tag headers have size limits and practical key-count limits; large, unbounded sets of tags cause header bloat and parsing problems. Fastly documents header-length limits and Cloudflare documents tag-size/aggregate limits—design keys to be short, stable, and namespaced.
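One way to keep tag headers bounded is to normalize and de-duplicate keys at emit time and enforce a byte budget, reporting anything that did not fit so the caller can fall back to a broader key. The limits are parameters on purpose: take the actual per-key and per-header maximums from your CDN's documentation. Lowercasing is a normalization choice that only works if the purge side applies the same rule; the function name is hypothetical.

```python
def build_key_header(keys, max_header_bytes, max_key_bytes):
    """Join normalized, unique keys into one space-separated header value.

    Returns (header_value, dropped_keys). Dropped keys mean some purges will
    not reach this response, so log them or add a broader fallback key.
    """
    seen = set()
    kept, dropped = [], []
    size = 0
    for key in keys:
        key = key.strip().lower()
        if not key or key in seen:
            continue
        seen.add(key)
        key_bytes = len(key.encode("utf-8"))
        extra = key_bytes + (1 if kept else 0)  # +1 for the separating space
        if key_bytes > max_key_bytes or size + extra > max_header_bytes:
            dropped.append(key)
            continue
        kept.append(key)
        size += extra
    return " ".join(kept), dropped
```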

Design rules that have worked repeatedly in large systems:

Use composite, normalized keys (e.g., product-62952 or product:62952) rather than embedding free text, and pick one separator convention for every key.
Tag both canonical URLs and the derived representations (e.g., mobile/desktop variants) so you can invalidate a single logical object.
Emit tags from the origin at render time to keep tagging consistent and avoid prerendering mistakes.
Batch and throttle purge API calls from CMS/webhooks to avoid rate-limit cliffs and origin storms.
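Batching can start as merging the keys from every CMS event received during a flush window, de-duplicating them, and splitting them into requests no larger than the vendor's per-request key limit. Both the window and the limit are assumptions in this sketch; check your CDN's purge API documentation for the actual cap. The function name is hypothetical.

```python
from typing import Iterable, List


def plan_purge_batches(events: Iterable[List[str]], max_keys_per_request: int) -> List[List[str]]:
    """Merge keys from many CMS events into de-duplicated, size-limited batches."""
    if max_keys_per_request < 1:
        raise ValueError("max_keys_per_request must be at least 1")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(key for keys in events for key in keys))
    return [unique[i:i + max_keys_per_request]
            for i in range(0, len(unique), max_keys_per_request)]
```

Three webhook events that touch product-1, product-2, and category-shoes (twice) collapse into one or two purge calls instead of five.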

Measuring cache ROI and controlling cost

Measurement is where caching goes from "hope" to "ROI." Track these baseline metrics with daily resolution: edge hit ratio, origin requests per second (RPS), origin egress (GB), average object size, and latency percentiles (P50/P95/P99).

Compute a simple monthly savings estimate:

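Baseline origin egress (GB) = total origin requests * average payload size (GB)
Fractional reduction in origin requests = (new hit ratio - old hit ratio) / (1 - old hit ratio), with hit ratio measured over all edge requests
Estimated saved egress = Baseline * fractional reduction in origin requests
Cost savings = Estimated saved egress * origin egress price per GB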

Example calculation (illustrative):

10 million monthly origin requests, average payload 50 KB → ~477 GB baseline (1 GB = 1024³ bytes)
Increase the hit ratio so origin requests fall by 20% → ~95 GB saved
At $0.09/GB, the monthly saving is ≈ $8.58; savings scale linearly with request volume and payload size, so they grow quickly for high-traffic or media-heavy services.
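The savings arithmetic translates directly into code. This sketch uses binary units (1 GB = 1024³ bytes) so its numbers line up with the example; the function names are hypothetical.

```python
def monthly_egress_savings(origin_requests, avg_payload_kb, origin_reduction, price_per_gb):
    """Return (baseline_gb, saved_gb, saved_dollars) for one month."""
    baseline_gb = origin_requests * avg_payload_kb * 1024 / 1024**3
    saved_gb = baseline_gb * origin_reduction
    return baseline_gb, saved_gb, saved_gb * price_per_gb


def origin_reduction_from_hit_ratio(old_hit, new_hit):
    """Fraction by which origin requests fall when the edge hit ratio rises."""
    return (new_hit - old_hit) / (1 - old_hit)
```

`monthly_egress_savings(10_000_000, 50, 0.20, 0.09)` gives roughly 477 GB baseline, 95 GB saved, and $8.58. `origin_reduction_from_hit_ratio(0.70, 0.85)` returns 0.5 (up to floating-point rounding): a 15-point hit-ratio gain from 70% halves origin requests.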

Also track business-impact metrics: conversion rate by geography and median time-to-first-byte for pages that are most visible to customers. Use these to prioritize which cache policies to tighten or which parts to tag.

Quick comparison table of TTL patterns and trade-offs:

Pattern	Typical use	Edge TTL example	Browser TTL example	Benefit	Risk
Fingerprinted static	JS/CSS/images with content-hash	`max-age=31536000`	`max-age=31536000, immutable`	Maximize cache efficiency	None if fingerprinting is correct
Edge-first HTML	Pages that tolerate short staleness	`s-maxage=60, stale-while-revalidate=30`	`max-age=0`	Low P95 latency; controlled freshness	Content up to ~90 s old; origin errors reach users after the stale window unless stale-if-error is set
API short-stale	Read-heavy APIs tolerant of slight staleness	`s-maxage=30, stale-while-revalidate=10, stale-if-error=86400`	`max-age=5`	Reduced origin RPS	Data up to ~40 s old at the edge; staleness must be acceptable
No-store/private	Authenticated or sensitive data	`no-store`	`no-store`	Keeps sensitive data out of every cache	Always origin-bound → higher latency/cost

CDN vendors such as Amazon CloudFront document the direct relationship between cache hit ratio and origin requests, and recommend policies like s-maxage + revalidation and features like Origin Shield to reduce origin fetches. Use those vendor signals to prioritize changes.

A practical checklist and runbook for edge cache policies

Checklist — audit and baseline (first 72 hours)

Collect the last 30 days of logs: edge hit ratio, origin RPS, the top 1,000 origin-requested URLs, and average payload size by URL.
Identify top contributors to origin traffic and rank by business impact (revenue, pageviews).
Classify content into buckets: fingerprinted static, semi-static (catalog pages), dynamic per-user, and APIs.
Map current Cache-Control settings and cache-key dimensions (query strings, headers, cookies).
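The bucketing step can start as a small rule set that maps each URL to a bucket and to the header pattern this article recommends for it. The content-hash regex, the /api/ prefix, and the session-cookie test are hypothetical heuristics; substitute the conventions your build and routing actually use.

```python
import re

# Hypothetical heuristic: filenames like app.3f9a1c2b.js carry a content hash.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{8,}\.(?:js|css|png|jpg|svg|woff2)$")

POLICIES = {
    "fingerprinted-static": "public, max-age=31536000, immutable",
    "semi-static": "public, max-age=0, s-maxage=60, stale-while-revalidate=30",
    "api": "public, max-age=5, s-maxage=30, stale-while-revalidate=10, stale-if-error=86400",
    "dynamic-per-user": "no-store",
}


def classify(path: str, has_session_cookie: bool) -> str:
    """Assign a URL to one of the four content buckets."""
    if FINGERPRINTED.search(path):
        return "fingerprinted-static"
    if has_session_cookie:
        # Anything rendered for a logged-in user stays out of shared caches.
        return "dynamic-per-user"
    if path.startswith("/api/"):
        return "api"
    return "semi-static"
```

Running the top origin-requested URLs through `classify` and comparing `POLICIES[bucket]` against the header each URL currently sends gives a first list of mismatches to fix.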

Checklist — policy rollout

For fingerprinted assets: deploy Cache-Control: public, max-age=31536000, immutable.
For semi-static pages: set s-maxage with stale-while-revalidate and tag responses with Surrogate-Key/Cache-Tag.
Implement purge-by-key hooks in the CMS or content pipeline; batch and rate-limit the purge calls.
Add monitoring: dashboards for hit ratio, origin RPS, egress GB, and latency. Set alerts for sudden drops in hit ratio or sharp increases in origin RPS.
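The alert rule can be as simple as comparing the current interval against a recent baseline. The thresholds (a 5-point hit-ratio drop, 1.5× baseline origin RPS) are placeholder assumptions to tune against your own traffic, and the function name is hypothetical.

```python
from statistics import mean


def check_alerts(hit_ratio_history, current_hit_ratio, rps_history, current_rps,
                 max_hit_drop=0.05, max_rps_growth=1.5):
    """Return alert codes when the current interval deviates from the baseline."""
    alerts = []
    # A hit-ratio drop usually means a cache key started fragmenting or a purge ran.
    if mean(hit_ratio_history) - current_hit_ratio > max_hit_drop:
        alerts.append("hit_ratio_drop")
    # Origin RPS growth well above baseline is the cost-side symptom of the same issue.
    if current_rps > mean(rps_history) * max_rps_growth:
        alerts.append("origin_rps_spike")
    return alerts
```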

Runbook — urgent invalidation (step-by-step)

Identify the minimal set of keys/tags affected by the change (product IDs, page slugs).
Issue a targeted purge-by-key or purge-by-tag call using the documented API (use batch where possible).
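Verify the purge by requesting representative URLs and examining the cache-status response headers (X-Cache on Fastly and CloudFront, CF-Cache-Status on Cloudflare): the first request should show MISS, and the next should show HIT once the object has re-filled. On Fastly, sending a Fastly-Debug: 1 request header adds further diagnostic headers.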
Monitor origin RPS and CPU. When origin traffic rises unexpectedly, pause non-critical purge batches and allow the cache to refill gradually.
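If you need to roll back or the origin is struggling, rely on stale-while-revalidate and stale-if-error to keep serving cached copies while revalidations stabilize. These directives only help if they were already on the cached responses, so enable them ahead of time for critical endpoints. Where the CDN supports soft purge (Fastly does), mark objects stale instead of evicting them so the stale directives still apply.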
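The verification step in this runbook can be automated by normalizing the vendor-specific cache-status headers and comparing two consecutive responses. This sketch takes header dictionaries and leaves fetching to the caller. It assumes Fastly lists one X-Cache value per cache node with the client-facing node last, and it treats CloudFront's RefreshHit as a hit; the function names are hypothetical.

```python
def cache_status(headers: dict) -> str:
    """Normalize vendor cache-status headers to "HIT", "MISS", or "OTHER"."""
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get("cf-cache-status") or lowered.get("x-cache", "")
    # Assumption: with multiple values ("MISS, HIT"), the last is the client-facing node.
    # CloudFront values look like "Miss from cloudfront" or "RefreshHit from cloudfront".
    token = raw.split(",")[-1].strip().upper()
    if token.startswith(("HIT", "REFRESHHIT")):
        return "HIT"
    if token.startswith("MISS"):
        return "MISS"
    return "OTHER"  # e.g. EXPIRED, BYPASS, DYNAMIC, or no header at all


def purge_verified(first_headers: dict, second_headers: dict) -> bool:
    """True when the first post-purge request missed and the next one hit."""
    return cache_status(first_headers) == "MISS" and cache_status(second_headers) == "HIT"
```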

Automations and safety nets

Implement a purge queue that enforces per-minute quotas and exponential backoff on repeated failures.
Emit purge audits (who triggered, keys, timestamp) to a centralized log for post-mortem and cost allocation.
Use feature flags or percentage rollouts when changing cache-key composition or a global TTL policy.
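A purge queue with these safety nets might look like the following sketch. `send` stands in for whatever client actually calls the purge API (for example, the Fastly or Cloudflare endpoints shown earlier); it is a hypothetical parameter, and treating every exception as retryable is a simplifying assumption. The clock and sleep functions are injectable so the quota and backoff logic can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class PurgeQueue:
    """Sends purges under a per-minute quota, retries with exponential backoff,
    and appends an audit entry for every attempt."""
    send: Callable[[Sequence[str]], None]  # hypothetical: raises on failure
    max_calls_per_minute: int
    max_retries: int = 4
    base_delay_s: float = 1.0
    clock: Callable[[], float] = time.monotonic
    sleep: Callable[[float], None] = time.sleep
    audit_log: List[dict] = field(default_factory=list)
    _call_times: List[float] = field(default_factory=list, init=False, repr=False)

    def _wait_for_quota(self) -> None:
        while True:
            now = self.clock()
            # Forget calls older than the 60-second quota window.
            self._call_times = [t for t in self._call_times if now - t < 60.0]
            if len(self._call_times) < self.max_calls_per_minute:
                return
            # Sleep until the oldest call in the window ages out.
            self.sleep(60.0 - (now - self._call_times[0]))

    def purge(self, keys: Sequence[str], triggered_by: str) -> bool:
        """Return True once a purge call succeeds, False after all retries fail."""
        for attempt in range(self.max_retries + 1):
            self._wait_for_quota()
            self._call_times.append(self.clock())
            error = None
            try:
                self.send(keys)
            except Exception as exc:  # simplification: every failure is retryable
                error = repr(exc)
            # Audit: who triggered it, which keys, when, and the outcome.
            self.audit_log.append({
                "triggered_by": triggered_by,
                "keys": list(keys),
                "timestamp": time.time(),
                "attempt": attempt,
                "ok": error is None,
                "error": error,
            })
            if error is None:
                return True
            if attempt < self.max_retries:
                # Exponential backoff: 1 s, 2 s, 4 s, ... with the default base delay.
                self.sleep(self.base_delay_s * 2 ** attempt)
        return False
```

Shipping `audit_log` entries to the centralized log gives the post-mortem and cost-allocation trail described above.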

Start with a short list of high-impact pages: get measurable hit-ratio improvement for those pages, observe origin egress change, then scale your policies. The work is incremental; measurable improvements come quickly when you stop fragmenting the cache and start invalidating surgically.

Sources

Cache-Control - HTTP | MDN Web Docs - Reference for Cache-Control, s-maxage, immutable, no-store, and practical examples of header composition.

RFC 5861 — HTTP Cache-Control Extensions for Stale Content - Formal specification of stale-while-revalidate and stale-if-error, with behavior expectations for caches.

Keeping things fresh with stale-while-revalidate | web.dev - Practical guidance and trade-offs for stale-while-revalidate on web applications.

Surrogate-Key | Fastly Documentation - Explanation of the Surrogate-Key header, indexing, purging by key, and header-size limits.

Purge cache by cache-tags · Cloudflare Cache (CDN) docs - Details on Cache-Tag usage, purge-by-tag workflow, limits, and API examples.

Increase the proportion of requests that are served directly from the CloudFront caches (cache hit ratio) - Amazon CloudFront Documentation - Definitions of cache hit ratio, advice on increasing hit ratio, and origin-cost reduction mechanisms.