Precision Canary Deployments for Static Content: Beyond the "Big Bang" Release

Big frontend migrations rarely fail because of one obvious bug.

They fail because production is messy.

A move from a legacy React SPA to a Next.js static build, a routing-model change, or a full redesign can look straightforward on paper because the site is "just static files." In production, the real risk is the combination of browser state, CDN caching, routing behavior, and customer sessions already in flight.

That is where a precision canary earns its keep.

Instead of sending 50% of traffic to a new build and hoping for the best, you can start with 0.1% or even 0.01%, observe what happens under real conditions, and expand only when the evidence says you should.

For engineers, that cuts blast radius. For executives, it turns a risky migration into a controlled rollout with a fast rollback path.

Executive summary

A precision canary for static content does four important things:

  1. It keeps the initial risk very small.
  2. It makes the routing decision before the cache lookup, so stable and canary traffic stay separated.
  3. It avoids cookies by using an anonymous, deterministic routing key such as JA4, then JA3, with a final fallback when neither is available.
  4. It allows instant rollback by changing one edge decision instead of waiting for DNS behavior to settle.

Why static-site migrations still fail in production

Staging environments are useful, but they are not production. They do not contain months of real browser state, cached assets, half-complete sessions, or every unusual client combination that appears on the public internet.

Here are the common ways large static-site releases fail.

1. Hidden client state

Real users arrive with old service workers, outdated cached files, old localStorage data, and long-lived tabs. A new build can be technically correct and still break when it meets that real-world state.

2. Session fragmentation

If a user lands on the new version for one request and the old version for the next, state and routing assumptions can collide. The result is often a broken session, confusing UI behavior, or JavaScript errors that never appeared in testing.

3. Routing surprises

A migration from hash routing to history-based routing changes the rules. A page that works during client-side navigation can still fail on refresh if the origin or edge does not return the expected fallback document.

4. Business impact arrives before technical certainty

When a public site supports revenue, leads, sign-ins, or brand trust, the cost of a bad release is not just a higher bug count. It is lost conversion, increased support load, and rollback stress at the worst possible moment.

Why DNS weighting is not the best fit here

A common first idea is to split traffic at the DNS layer with Route 53 weighted records.

That can work in some scenarios, but it is a weak fit for high-stakes frontend migrations because it sits outside the request path, where the decision really needs to happen.

Route 53 weighting

Pros

  • Easy to understand
  • Native AWS capability
  • Fine for coarse traffic shifts

Cons

  • Rollback is slower because DNS behavior is not instantly consistent everywhere
  • It cannot easily support tester overrides with headers
  • It does not naturally keep a single viewer pinned to a deterministic version during the migration

Why CloudFront Functions is the better control point

With CloudFront Functions, the routing decision happens at the edge on the request itself.

That gives you better operational control:

  • Instant rollback by changing edge logic or configuration
  • Very fine rollout precision using basis points
  • QA and executive preview paths through explicit overrides
  • Better routing consistency without relying on cookies

Important trade-off: AWS notes that CloudFront Functions run on every viewer request, while Lambda@Edge origin-request logic runs only on cache misses. For highly cacheable workloads, Lambda@Edge can be more cost-efficient. For this article, the recommendation stays with CloudFront Functions because the main goal is fast, precise rollout control during a risky migration, not minimizing function invocations.

Architecture at a glance

Precision canary architecture

The pattern is straightforward:

  1. Keep one CloudFront distribution.
  2. Define two origins: stable and canary.
  3. Run a viewer-request CloudFront Function.
  4. Decide which origin to use before the cache lookup.
  5. Add a small header such as x-canary-origin so the cache key stays isolated between stable and canary objects.

This matters because without cache isolation, CloudFront can serve a cached canary object to a stable user, or vice versa.
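To make the isolation concrete, here is an illustrative sketch. CloudFront builds its real cache key internally, so this is not its actual format; it only shows why adding the header splits one URI into two separate cached objects:

```javascript
// Illustrative only: CloudFront composes the real cache key internally.
// Including x-canary-origin in the cache policy means the same URI
// maps to two distinct cached objects, one per build.
function illustrativeCacheKey(uri, canaryHeader) {
  return uri + '|x-canary-origin=' + canaryHeader;
}

illustrativeCacheKey('/index.html', 'old'); // → '/index.html|x-canary-origin=old'
illustrativeCacheKey('/index.html', 'new'); // → '/index.html|x-canary-origin=new'
```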

The routing principle: deterministic, anonymous, cookie-free

For this use case, the goal is not personal identity. The goal is consistent traffic assignment.

A good order of precedence is:

  1. Explicit header override for QA or controlled previews
  2. Explicit query override for one-off testing
  3. JA4 fingerprint when available
  4. JA3 fingerprint if JA4 is not available
  5. Fallback hash using user-agent + viewer.ip

That gives you a cookie-free strategy that is still stable enough for production rollout control.

This article assumes the site is always served over HTTPS. That matters because JA4 and JA3 are derived from the TLS handshake and are only available on HTTPS requests.

Practical decision flow

Decision flow for cookie-free canary routing

Why basis points matter

Percentages are often too coarse.

For a large migration, 1% can already represent a meaningful amount of real traffic. Using basis points (BPS) gives you a much finer dial:

  • 1 BPS = 0.01%
  • 10 BPS = 0.1%
  • 100 BPS = 1%

That lets you run a true smoke test in production before the rollout is visible to a meaningful share of customers.
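The arithmetic behind the dial is simple: hash each routing key into one of 10,000 buckets, and route to the canary only when the bucket falls below the configured BPS value. A minimal sketch (function names are illustrative, not part of any API):

```javascript
// Hedged sketch of the basis-point dial: 10,000 buckets, each worth 1 BPS.
function bpsToPercent(bps) {
  return bps / 100; // 10 BPS → 0.1 (percent)
}

function isCanary(hash32, canaryBps) {
  // Buckets 0 .. canaryBps-1 hit the canary; everything else stays stable
  return (hash32 % 10000) < canaryBps;
}

bpsToPercent(1);      // → 0.01 (%)
bpsToPercent(100);    // → 1 (%)
isCanary(123456, 10); // 123456 % 10000 = 3456 → false
```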

CloudFront Function example

The following function shows the core pattern:

  • manual overrides first
  • deterministic cookie-free routing second
  • cache isolation always
  • stable fallback when viewer fingerprints are missing

Important: the sample below assumes JavaScript runtime 2.0 because origin selection helper methods require it.

import cf from 'cloudfront';

const CONFIG = {
  OLD_ORIGIN_ID: 'origin-production',
  NEW_ORIGIN_ID: 'origin-canary',

  // Basis points: 1 = 0.01%, 10 = 0.1%, 100 = 1%
  CANARY_BPS: 10
};

function getHeaderValue(headers, name) {
  var header = headers[name.toLowerCase()];
  return header && header.value ? header.value : '';
}

// Deterministic FNV-1a 32-bit hash for bucketing
// (the shift-add sum is a 32-bit multiply by the FNV prime 16777619)
function stableHash(value) {
  var hash = 2166136261;
  for (var i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
  }
  return hash >>> 0;
}

function pickRoutingKey(request, viewer) {
  var headers = request.headers;

  // Prefer JA4, then JA3. Both are anonymous TLS-derived fingerprints.
  var ja4 = getHeaderValue(headers, 'cloudfront-viewer-ja4-fingerprint');
  if (ja4) return 'ja4:' + ja4;

  var ja3 = getHeaderValue(headers, 'cloudfront-viewer-ja3-fingerprint');
  if (ja3) return 'ja3:' + ja3;

  // Final fallback: still deterministic, but less stable than JA4/JA3
  var userAgent = getHeaderValue(headers, 'user-agent');
  var ip = (viewer && viewer.ip) ? viewer.ip : 'unknown';
  return 'uaip:' + userAgent + '|' + ip;
}

function applyManualOverride(request) {
  var headers = request.headers;
  var querystring = request.querystring || {};

  // 1) QA header override
  var qaVersion = getHeaderValue(headers, 'x-qa-version');
  if (qaVersion === 'canary') return true;
  if (qaVersion === 'stable') return false;

  // 2) One-off query override
  if (querystring.canary) {
    var forceCanary = querystring.canary.value === '1';

    // Remove the test flag so it does not accidentally affect cache behavior
    delete request.querystring.canary;

    return forceCanary;
  }

  return null;
}

function handler(event) {
  var request = event.request;
  var viewer = event.viewer || {};

  var useNewOrigin = applyManualOverride(request);

  if (useNewOrigin === null) {
    var routingKey = pickRoutingKey(request, viewer);
    var bucket = stableHash(routingKey) % 10000;
    useNewOrigin = bucket < CONFIG.CANARY_BPS;
  }

  // Keep stable and canary cache entries isolated
  request.headers['x-canary-origin'] = {
    value: useNewOrigin ? 'new' : 'old'
  };

  if (useNewOrigin) {
    cf.selectRequestOriginById(CONFIG.NEW_ORIGIN_ID);
  } else {
    cf.selectRequestOriginById(CONFIG.OLD_ORIGIN_ID);
  }

  return request;
}
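Before deploying, you can sanity-check the bucketing math locally in plain Node.js. This sketch (not deployed code) mirrors the stableHash function above and confirms that roughly CANARY_BPS out of every 10,000 simulated keys land on the canary:

```javascript
// Local sanity check (plain Node.js, not a deployed function):
// stableHash mirrors the hash in the CloudFront Function above.
function stableHash(value) {
  var hash = 2166136261;
  for (var i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
  }
  return hash >>> 0;
}

var CANARY_BPS = 10; // 0.1%
var total = 100000;
var canaryCount = 0;

for (var i = 0; i < total; i++) {
  // Simulated JA4-style routing keys; real keys come from viewer TLS handshakes
  if (stableHash('ja4:simulated-key-' + i) % 10000 < CANARY_BPS) {
    canaryCount++;
  }
}

// Expect a value close to CANARY_BPS (here, near 10)
console.log('observed canary share: ' + (canaryCount / total * 10000).toFixed(1) + ' BPS');
```

If the observed share drifts far from the configured BPS, the routing keys in your traffic are not distributing well through the hash, and that is worth knowing before stage 1.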

What this code gets right

This version is designed to align with the operational goals in this article.

It does not use cookies

That keeps the mechanism lightweight and avoids introducing a separate persistence layer just to keep users on the same version.

It prefers the most stable anonymous signal available

JA4 first, then JA3, gives you a better routing key than raw IP alone. When those fingerprints are unavailable, the function still behaves predictably.

Header-name note: AWS documentation presents these as CloudFront-Viewer-JA4-Fingerprint and CloudFront-Viewer-JA3-Fingerprint, but inside a CloudFront Functions event object you access them in lowercase as cloudfront-viewer-ja4-fingerprint and cloudfront-viewer-ja3-fingerprint.

It keeps cache entries separated

The x-canary-origin request header is the critical separator. Include this header in the cache key so stable and canary content never share the same cached object.

It supports controlled previews

Executives, QA, or launch managers can view the canary safely using a header or a one-off query switch without changing the default rollout.

Required distribution configuration

The code is only part of the solution. The distribution settings matter just as much.

1. Two origins

Define one origin for the current production build and one for the canary build.

2. Viewer-request CloudFront Function

Associate the function with the cache behavior that serves the site.

3. Enable the CloudFront-generated fingerprint headers

CloudFront needs to add the viewer fingerprint headers so the function can inspect them:

  • CloudFront-Viewer-JA4-Fingerprint
  • CloudFront-Viewer-JA3-Fingerprint

These headers are available to CloudFront Functions and Lambda@Edge, but only for HTTPS requests. AWS also notes that these TLS-related headers can be added to an origin request policy, but not to a cache policy.

4. Cache policy includes x-canary-origin

This is what keeps stable and canary cache objects isolated.
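As a sketch, a cache policy passed to `aws cloudfront create-cache-policy --cache-policy-config file://policy.json` might whitelist the header like this (the policy name and TTL values are illustrative, not recommendations):

```json
{
  "Name": "canary-static-site-policy",
  "MinTTL": 1,
  "DefaultTTL": 86400,
  "MaxTTL": 31536000,
  "ParametersInCacheKeyAndForwardedToOrigin": {
    "EnableAcceptEncodingGzip": true,
    "EnableAcceptEncodingBrotli": true,
    "HeadersConfig": {
      "HeaderBehavior": "whitelist",
      "Headers": { "Quantity": 1, "Items": ["x-canary-origin"] }
    },
    "CookiesConfig": { "CookieBehavior": "none" },
    "QueryStringsConfig": { "QueryStringBehavior": "none" }
  }
}
```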

5. Query-string handling is deliberate

If you allow the ?canary=1 switch for testing, either:

  • remove it in the function, as shown above, or
  • make sure it is not part of the cache key

6. SPA or app-style routing fallback is explicit

If the migration changes routing behavior, make sure refreshes and deep links return the correct document.
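A common way to handle this for history-based routing is a URI rewrite that serves the SPA shell for any path without a file extension. A minimal sketch, assuming that convention holds for your site (in practice this logic would be merged into the canary function above, since a cache behavior supports only one viewer-request function):

```javascript
// Minimal sketch: serve the SPA shell for extension-less deep links.
// Assumes paths with a dot in the last segment are static assets.
function rewriteForSpa(uri) {
  var lastSegment = uri.split('/').pop();
  return lastSegment.indexOf('.') === -1 ? '/index.html' : uri;
}

rewriteForSpa('/products/42');   // → '/index.html'
rewriteForSpa('/assets/app.js'); // → '/assets/app.js'
```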

7. Validate viewer fingerprint availability in your environment

JA3 and JA4 are excellent routing inputs when exposed to the function path, but you should validate their availability in your exact CloudFront setup before making them your only signal.
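One way to validate coverage is to record which signal produced the routing key during an early stage, for example by surfacing it in a temporary debug header or log. A hedged sketch, reusing the same header-reading helper shown in the main function:

```javascript
// Validation sketch: report which anonymous signal would drive routing,
// so early-stage traffic can confirm JA4/JA3 coverage in your setup.
function getHeaderValue(headers, name) {
  var header = headers[name.toLowerCase()];
  return header && header.value ? header.value : '';
}

function routingSignal(headers) {
  if (getHeaderValue(headers, 'cloudfront-viewer-ja4-fingerprint')) return 'ja4';
  if (getHeaderValue(headers, 'cloudfront-viewer-ja3-fingerprint')) return 'ja3';
  return 'uaip'; // user-agent + IP fallback
}

// No fingerprint headers present → falls back to user-agent + IP
routingSignal({ 'user-agent': { value: 'curl/8.0' } }); // → 'uaip'
```

If most traffic reports 'uaip', the fingerprint headers are not reaching the function path and should not be your primary signal.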

A rollout plan that works for both engineers and executives

Rollout stages for precision canary deployment

A practical rollout plan looks like this:

Stage 1: 0.1% smoke test

Purpose: prove routing, cache isolation, and basic stability.

Watch for:

  • unexpected 404s
  • JavaScript startup failures
  • broken asset loading
  • obvious support noise

Stage 2: 1% early validation

Purpose: confirm that the new build survives real traffic patterns.

Watch for:

  • client-side error rates
  • performance regressions
  • login or session complaints
  • routing edge cases

Stage 3: 5% confidence build

Purpose: compare business and operational metrics against the stable path.

Watch for:

  • conversion rate
  • bounce rate
  • task completion
  • error budgets
  • support tickets

Stage 4: 25% operational proof

Purpose: verify team readiness, dashboards, alerts, and rollback confidence under broader load.

Stage 5: 100% cutover

Promote only when the migration risk is low enough for the new path to become the default.

What to measure during the canary

For technical teams:

  • 404 rate by path
  • JavaScript error rate
  • asset load failures
  • Core Web Vitals or equivalent performance signals
  • origin error rate and CDN cache behavior

For business stakeholders:

  • conversion or lead generation
  • checkout or sign-in completion
  • bounce rate
  • support ticket volume
  • incident count during the rollout window

Important caveats

JA3 and JA4 are not identity

They are useful routing signals, not person-level identifiers. Their role here is to keep similar requests on the same side of the rollout decision, not to identify an individual user.

HTTPS matters

JA3 and JA4 come from the TLS Client Hello, so they are relevant only for HTTPS traffic.

The cache key is the safety line

If you remember only one implementation detail, remember this one: do not mix stable and canary objects in the same cache key.

Keep rollback boring

The best rollout is the one that can be reversed in seconds without debate. Edge-based selection gives you that option.

Final takeaway

A canary deployment for static content is not just a safer release tactic.

For major web migrations, it is a way to separate technical uncertainty from business risk.

By making the decision at the edge, using deterministic cookie-free routing, and isolating cache entries correctly, you can move from a fragile big-bang launch to a controlled rollout that both engineers and executives can support.

That is the real value: fewer surprises, faster rollback, and better evidence before full cutover.

Acknowledgments

Special thanks to @Moti Moskovich for his contribution to this article.
