Nacho González

Originally published at qrcodenova.com

Dynamic QR Code Redirect Architecture at the Edge

A dynamic QR code is a short URL baked into a static image. The "dynamic" part is a database record: a slug-to-destination mapping you can update without reprinting. The image never changes.

Everything interesting happens in the redirect layer. Every scan is a live HTTP request that has to resolve before a user moves on. What follows is what that layer looks like, how to build it right, and where it breaks.

What "dynamic" actually means

The QR image encodes a fixed string like go.yourcompany.com/r/abc123. What changes is the database record mapping abc123 to a destination URL.

The reliability of every printed QR code is exactly equal to the reliability of that redirect server. Nothing in the image acts as a fallback. Slow server = slow scan. Down server = failed scan. Cancelled account = error page. The infrastructure is the product.
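To make that concrete: on the Workers KV store the rest of this post builds on, the entire "dynamic" mechanism is a single write. A sketch, where the REDIRECTS binding name and URLs are illustrative:

// repoint every printed copy of this code: one KV write, no reprint
await env.REDIRECTS.put("abc123", "https://example.com/spring-sale");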

The redirect chain

Full sequence from camera tap to destination:

  1. Scan: device reads the QR pattern, decodes the short URL
  2. DNS resolution: resolves the redirect domain to nearest IP (edge) or one fixed IP (single-origin)
  3. TCP + TLS handshake: 50–150ms cold, depending on distance and network
  4. HTTP GET: GET /r/abc123 hits the redirect server
  5. Lookup: server checks cache or KV store for the destination URL mapped to that slug
  6. Async log: queues scan event (timestamp, IP, User-Agent) without blocking the response
  7. HTTP 302: returns Location: <destination>
  8. Browser follows the redirect
  9. Destination loads

Steps 3–7 are what the redirect infrastructure controls. That window is what edge computing compresses.
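On the wire, steps 4–7 reduce to one tiny exchange. An illustrative trace, with made-up values:

GET /r/abc123 HTTP/1.1
Host: go.yourcompany.com

HTTP/1.1 302 Found
Location: https://example.com/spring-sale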

Single-origin: how most platforms are built

Simplest implementation: one server in Virginia or Frankfurt. Every scan in the world hits that box.

Two problems surface at scale.

Latency first. Sydney to Virginia is about 180ms round-trip just to receive a 200-byte redirect response. On a slow mobile network where TCP setup already costs 100ms, you're at 300ms before the destination even starts loading.

Then there's the single point of failure. A bad deploy, a DDoS, or a data center incident takes every QR code on the platform offline at once. No second region to fail over to.

Most small QR platforms run exactly this. Fine at low volume. Problems show up at scale, and by then codes are already printed.

Edge-first architecture

Edge-first moves redirect logic to globally distributed nodes. Cloudflare Workers and Fastly Compute run JS/Wasm in 200+ cities. DNS resolves via anycast to the nearest edge node.

TTFB for Sydney to nearest edge: 5–20ms instead of 150–200ms.

At the node, the lookup reads from an in-memory cache or distributed KV. Cloudflare Workers KV replicates writes globally within ~60s, and hot keys read back in single-digit milliseconds from any node. One KV read, not a database query with joins.

Minimal Cloudflare Worker redirect handler:

export default {
  async fetch(request, env, ctx) {
    // path is /r/<slug>; slice(3) strips the leading "/r/"
    const slug = new URL(request.url).pathname.slice(3);
    // one KV read resolves the slug to its current destination
    const destination = await env.REDIRECTS.get(slug);

    if (!destination) {
      return new Response("Not found", { status: 404 });
    }

    // fire-and-forget, doesn't block the redirect
    ctx.waitUntil(logScan(request, slug, env));

    return Response.redirect(destination, 302);
  },
};

async function logScan(request, slug, env) {
  // lightweight raw event; enrichment happens downstream, off the hot path
  await env.SCAN_QUEUE.send({
    slug,
    ip: request.headers.get("CF-Connecting-IP"),
    country: request.headers.get("CF-IPCountry"),
    ua: request.headers.get("User-Agent"),
    ts: Date.now(),
  });
}

Numbers

| Metric | Single-origin | Edge |
| --- | --- | --- |
| TTFB (same region as server) | 20–40ms | 5–15ms |
| TTFB (opposite side of the world) | 150–250ms | 15–30ms |
| P99 under a traffic spike | 800ms–2s+ | 30–60ms |
| Regional outage | 100% of scans fail | auto-failover, zero user impact |
| Cost at scale | vertical scaling and over-provisioning | per-request pricing |

302, not 301

Always use 302 (temporary redirect), not 301 (permanent).

301 gets cached aggressively by browsers. If a user previously scanned and their browser cached the old destination, updating the database does nothing; they keep landing on the old URL until the cache expires. 302 responses aren't cached by default, so every scan hits the redirect server fresh.

This is the most common redirect architecture mistake. Invisible during testing, only shows up for repeat visitors in production.
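If you'd rather make the no-caching behavior explicit than rely on defaults, one option is to build the 302 by hand instead of using Response.redirect(). A sketch:

// explicit 302 with caching forbidden outright
return new Response(null, {
  status: 302,
  headers: {
    Location: destination,
    "Cache-Control": "no-store",
  },
});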

Caching at the edge

Not every destination changes frequently. Cache the slug-to-destination mapping at the edge to skip the KV read on hot slugs.

TTL of 30–120s covers most use cases. Fast enough for campaign changes, cheap at scale.

Stale-while-revalidate serves the cached destination immediately and refreshes in the background:

export default {
  async fetch(request, env, ctx) {
    const slug = new URL(request.url).pathname.slice(3);
    const cacheKey = `dest:${slug}`;

    const cached = await env.CACHE.get(cacheKey);
    if (cached) {
      // return immediately, refresh cache in background
      ctx.waitUntil(
        fetchFromOrigin(slug, env).then((dest) =>
          dest ? env.CACHE.put(cacheKey, dest, { expirationTtl: 60 }) : null
        )
      );
      ctx.waitUntil(logScan(request, slug, env)); // cache hits are scans too
      return Response.redirect(cached, 302);
    }

    const destination = await fetchFromOrigin(slug, env);
    if (!destination) return new Response("Not found", { status: 404 });

    ctx.waitUntil(env.CACHE.put(cacheKey, destination, { expirationTtl: 60 }));
    ctx.waitUntil(logScan(request, slug, env));

    return Response.redirect(destination, 302);
  },
};
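fetchFromOrigin is left undefined above. A minimal sketch, assuming the authoritative mapping lives in the same REDIRECTS KV namespace as before; in practice it could just as well be D1 or a central API:

async function fetchFromOrigin(slug, env) {
  // authoritative lookup, only hit on cache misses and background refreshes
  return env.REDIRECTS.get(slug);
}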

Once a node is warm, users never wait for a refresh: the cached destination is returned immediately and the origin fetch happens in the background. Only the first request after an entry expires pays the miss penalty.

Geo-routing at the edge

Some use cases need different destinations per location. A global brand might send US users to a US landing page and EU users to a different one with localized copy and compliance language. A restaurant chain might send users to the nearest location's menu.

Edge workers get geolocation headers on every request, no external API call needed. On Cloudflare, CF-IPCountry gives you the ISO country code. Store a slug:country key in KV and do the lookup in one read.

export default {
  async fetch(request, env, ctx) {
    const slug = new URL(request.url).pathname.slice(3);
    const country = request.headers.get("CF-IPCountry") ?? "XX";

    // try geo-specific destination first, fall back to default
    const destination =
      (await env.REDIRECTS.get(`${slug}:${country}`)) ??
      (await env.REDIRECTS.get(slug));

    if (!destination) {
      return new Response("Not found", { status: 404 });
    }

    // variant of logScan that also records the resolved country
    ctx.waitUntil(logScan(request, slug, country, env));
    return Response.redirect(destination, 302);
  },
};

KV key structure:

  • abc123 → default destination
  • abc123:US → US destination
  • abc123:DE → German destination

One KV read for geo-specific, two reads for fallback. Still single-digit millisecond latency, no round-trip to a central routing service.

It's more useful than it looks. Regional A/B tests, localized landing pages, GDPR compliance redirects: all of it becomes a KV write from the dashboard instead of a deploy.
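Concretely, launching or retiring a localized destination is one write against those keys. A sketch with illustrative values:

// German scans now get a localized page
await env.REDIRECTS.put("abc123:DE", "https://example.de/kampagne");
// remove the override; scans from DE fall back to the default key
await env.REDIRECTS.delete("abc123:DE");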

Analytics: async or you're doing it wrong

Writing to a database synchronously before returning the redirect is the worst architecture choice here. It puts a write operation (lock contention, index updates, network round-trip to a central DB) directly in the hot path of every single scan.

Two patterns that work:

Queue-based: the Worker sends a lightweight event to Cloudflare Queues or SQS and returns the redirect immediately. A separate consumer drains the queue, enriches events with geo-lookup and device parsing, and writes to the analytics store. If the pipeline backs up or errors, scans keep working.
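The consumer side on Cloudflare Queues is a separate handler that runs entirely outside the redirect hot path. A sketch, where writeToAnalyticsStore is a placeholder for whatever your store's batch insert looks like:

export default {
  // Queues delivers scan events in batches; a throw retries the batch,
  // and redirects keep working either way
  async queue(batch, env) {
    const events = batch.messages.map((msg) => msg.body);
    await writeToAnalyticsStore(events, env);
  },
};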

Edge streaming: log events go to Cloudflare Logpush, then object storage, then Kafka or Kinesis, then the analytics DB in batches. More infrastructure, but scales to millions of scans per day without per-event writes.

Same principle either way: redirect response latency stays fixed. Analytics write latency is irrelevant to users.

What to monitor in production

A redirect service has a short list of things worth tracking.

Redirect TTFB by region: track P95 and P99, not average. Average hides tail latency that shows up for users on slow mobile networks. If P99 in Asia spikes to 800ms, something is wrong with KV replication or a regional node.

Cache hit rate: a sudden drop means something invalidated the edge cache. Catch it before it becomes a latency regression.

Scan error rate: 404s are expected for deleted slugs. 500s, timeouts, and connection resets need alerting. One bad deploy should not silently fail millions of scans.

Queue depth: if the scan event queue is backing up, the analytics pipeline has a problem. The redirect still works, but lag accumulates. Left long enough, you'll either drop events or overwhelm the consumer when it catches up.

In Cloudflare Workers, Workers Analytics Engine is a time-series store built for this pattern. Push a data point per request, query with SQL-like syntax via the Analytics Engine API.

// inside fetch handler, non-blocking
ctx.waitUntil(
  env.ANALYTICS.writeDataPoint({
    blobs: [slug, country, request.headers.get("CF-Ray") ?? ""],
    doubles: [Date.now()],
    indexes: [slug],
  })
);

Query it later:

SELECT
  blob1 AS slug,
  blob2 AS country,
  count() AS scans
FROM SCAN_EVENTS
WHERE timestamp > NOW() - INTERVAL '1' HOUR
GROUP BY slug, country
ORDER BY scans DESC

One endpoint. No external observability stack in the hot path.

Failover

When a node goes unhealthy, anycast DNS shifts requests to the next-nearest healthy one automatically. Latency ticks up slightly; the service stays up. With single-origin, one failure is a complete outage.

The resilience pattern most people skip: stale-on-error. If the origin data store is unreachable, serve the last-cached destination instead of returning an error. Destination changes are rare; the cached value is almost always right. A slightly stale redirect beats an error page every time.
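A sketch of stale-on-error layered on the earlier CACHE binding, using a separate never-expiring "last known good" key (key naming is illustrative):

async function resolveWithStaleOnError(slug, env) {
  try {
    const dest = await fetchFromOrigin(slug, env);
    // keep a last-known-good copy with no TTL, reserved for outages
    if (dest) await env.CACHE.put(`last-good:${slug}`, dest);
    return dest;
  } catch (err) {
    // origin unreachable: a slightly stale redirect beats an error page
    return env.CACHE.get(`last-good:${slug}`);
  }
}

In the hot path you'd push that put into ctx.waitUntil so healthy requests don't pay for the bookkeeping.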

The part that's not a performance problem

The short URL encoded in a printed QR code is permanent. Once it's on packaging, signage, or business cards, you can't change it without reprinting.

If that URL is qrtiger.io/r/abc123, every printed code's operational status is permanently tied to QR Tiger. Cancel the subscription and you get error pages. Platform shuts down, error pages. Price increase, you negotiate from zero leverage.

Owning the redirect domain fixes this. go.yourcompany.com/r/abc123 can point at any infrastructure at any time. Update a DNS record to switch platforms or self-host. The printed codes never change; what's behind them does.
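The switch itself is a single DNS change, something like this illustrative zone entry:

go.yourcompany.com.  300  IN  CNAME  new-redirect-platform.example.net.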

Most QR platforms charge a premium for custom domains or don't support them at all. That's not incidental. A platform hosting your redirect domain has permanent leverage over materials that cost real money to replace.

Build vs. buy

A self-hosted redirect on Cloudflare Workers costs about $5/month for the Workers subscription plus under $0.50 per million redirect reads; at 10 million scans a month, that works out to roughly $10 all-in. The Worker is 50–100 lines. A few hours to deploy.

What a platform adds: destination management UI, analytics dashboard, QR generation tooling, reliability guarantees. For teams without dedicated infra engineers, that management layer is the actual product, which is why most teams are probably better off buying than building.

Either way: edge compute for the redirect hop, analytics out of the hot path, short-TTL caching with stale-while-revalidate, stale-on-error resilience, and a redirect domain you control.

The QR image is just the entry point. The redirect layer is the product.


I'm building QR Nova, a QR platform built on this architecture. Happy to answer questions in the comments.
