Precision Canary Deployments for Static Content: Beyond the "Big Bang" Release
Big frontend migrations rarely fail because of one obvious bug.
They fail because production is messy.
A move from a legacy React SPA to a Next.js static build, a routing-model change, or a full redesign can look straightforward on paper because the site is "just static files." In production, the real risk is the combination of browser state, CDN caching, routing behavior, and customer sessions already in flight.
That is where a precision canary earns its keep.
Instead of sending 50% of traffic to a new build and hoping for the best, you can start with 0.1% or even 0.01%, observe what happens under real conditions, and expand only when the evidence says you should.
For engineers, that cuts blast radius. For executives, it turns a risky migration into a controlled rollout with a fast rollback path.
Executive summary
A precision canary for static content does four important things:
- It keeps the initial risk very small.
- It makes the routing decision before the cache lookup, so stable and canary traffic stay separated.
- It avoids cookies by using an anonymous, deterministic routing key such as JA4, then JA3, with a final fallback when neither is available.
- It allows instant rollback by changing one edge decision instead of waiting for DNS behavior to settle.
Why static-site migrations still fail in production
Staging environments are useful, but they are not production. They do not contain months of real browser state, cached assets, half-complete sessions, or every unusual client combination that appears on the public internet.
Here are the common ways large static-site releases fail.
1. Hidden client state
Real users arrive with old service workers, outdated cached files, old localStorage data, and long-lived tabs. A new build can be technically correct and still break when it meets that real-world state.
2. Session fragmentation
If a user lands on the new version for one request and the old version for the next, state and routing assumptions can collide. The result is often a broken session, confusing UI behavior, or JavaScript errors that never appeared in testing.
3. Routing surprises
A migration from hash routing to history-based routing changes the rules. A page that works during client-side navigation can still fail on refresh if the origin or edge does not return the expected fallback document.
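One common mitigation is an explicit fallback rewrite at the edge. The sketch below is illustrative, not the article's canary function: a viewer-request CloudFront Function that sends extensionless app routes to the SPA shell so deep links and refreshes return index.html instead of a 404. The dot-in-last-segment heuristic is an assumption that static assets always carry a file extension.

```javascript
// Minimal sketch: rewrite app routes (paths without a file extension)
// to the SPA shell so refreshes and deep links return index.html.
// Runs as a viewer-request CloudFront Function; the heuristic assumes
// real files always have an extension in the last path segment.
function handler(event) {
  var request = event.request;
  var lastSegment = request.uri.split('/').pop();

  // Requests for actual files (contain a dot) pass through untouched.
  if (lastSegment && lastSegment.indexOf('.') !== -1) {
    return request;
  }

  // Everything else is treated as an app route.
  request.uri = '/index.html';
  return request;
}
```

If the new build uses a different shell document, the rewrite target is the only line that changes.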
4. Business impact arrives before technical certainty
When a public site supports revenue, leads, sign-ins, or brand trust, the cost of a bad release is not just a higher bug count. It is lost conversion, increased support load, and rollback stress at the worst possible moment.
Why DNS weighting is not the best fit here
A common first idea is to split traffic at the DNS layer with Route 53 weighted records.
That can work in some scenarios, but it is a weak fit for high-stakes frontend migrations because it sits outside the request path, where the decision really needs to happen.
Route 53 weighting
Pros
- Easy to understand
- Native AWS capability
- Fine for coarse traffic shifts
Cons
- Rollback is slower because DNS behavior is not instantly consistent everywhere
- It cannot easily support tester overrides with headers
- It does not naturally keep a single viewer pinned to a deterministic version during the migration
Why CloudFront Functions is the better control point
With CloudFront Functions, the routing decision happens at the edge on the request itself.
That gives you better operational control:
- Instant rollback by changing edge logic or configuration
- Very fine rollout precision using basis points
- QA and executive preview paths through explicit overrides
- Better routing consistency without relying on cookies
Important trade-off: AWS notes that CloudFront Functions run on every viewer request, while Lambda@Edge origin-request logic runs only on cache misses. For highly cacheable workloads, Lambda@Edge can be more cost-efficient. For this article, the recommendation stays with CloudFront Functions because the main goal is fast, precise rollout control during a risky migration, not minimizing function invocations.
Architecture at a glance
The pattern is straightforward:
- Keep one CloudFront distribution.
- Define two origins: stable and canary.
- Run a viewer-request CloudFront Function.
- Decide which origin to use before the cache lookup.
- Add a small header such as x-canary-origin so the cache key stays isolated between stable and canary objects.
This matters because without cache isolation, CloudFront can serve a cached canary object to a stable user, or vice versa.
The routing principle: deterministic, anonymous, cookie-free
For this use case, the goal is not personal identity. The goal is consistent traffic assignment.
A good order of precedence is:
- Explicit header override for QA or controlled previews
- Explicit query override for one-off testing
- JA4 fingerprint when available
- JA3 fingerprint if JA4 is not available
- Fallback hash using user-agent + viewer.ip
That gives you a cookie-free strategy that is still stable enough for production rollout control.
This article assumes the site is always served over HTTPS. That matters because JA4 and JA3 are derived from the TLS handshake and are only available on HTTPS requests.
Practical decision flow
In short: apply any explicit override first; otherwise hash the best available fingerprint into a bucket and compare that bucket against the configured basis points.
Why basis points matter
Percentages are often too coarse.
For a large migration, 1% can already represent a meaningful amount of real traffic. Using basis points (BPS) gives you a much finer dial:
- 1 BPS = 0.01%
- 10 BPS = 0.1%
- 100 BPS = 1%
That lets you run a true smoke test in production before the rollout is visible to a meaningful share of customers.
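To make the dial concrete, the bucket arithmetic the rollout relies on can be sketched in a few lines (the function names here are illustrative, not part of any AWS API):

```javascript
// Bucketing sketch: the rollout dial has 10,000 positions, one per basis point.
// A request joins the canary when its bucket index falls below the BPS value.
function inCanary(bucket, bps) {
  return bucket < bps;
}

// Expected long-run share of traffic for a given BPS setting.
// For example, 10 BPS means roughly 1 request in 1,000 reaches the canary.
function expectedShare(bps) {
  return bps / 10000;
}
```

This is why the function below computes `hash % 10000`: each routing key lands in exactly one of the 10,000 buckets.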
CloudFront Function example
The following function shows the core pattern:
- manual overrides first
- deterministic cookie-free routing second
- cache isolation always
- stable fallback when viewer fingerprints are missing
Important: the sample below assumes CloudFront Functions JavaScript runtime 2.0, because the cloudfront module import and the origin-selection helper it provides require that runtime.
import cf from 'cloudfront';

const CONFIG = {
  OLD_ORIGIN_ID: 'origin-production',
  NEW_ORIGIN_ID: 'origin-canary',
  // Basis points: 1 = 0.01%, 10 = 0.1%, 100 = 1%
  CANARY_BPS: 10
};

function getHeaderValue(headers, name) {
  var header = headers[name.toLowerCase()];
  return header && header.value ? header.value : '';
}

// ES5-safe deterministic 32-bit hash for bucketing
function stableHash(value) {
  var hash = 2166136261;
  for (var i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
  }
  return hash >>> 0;
}

function pickRoutingKey(request, viewer) {
  var headers = request.headers;

  // Prefer JA4, then JA3. Both are anonymous TLS-derived fingerprints.
  var ja4 = getHeaderValue(headers, 'cloudfront-viewer-ja4-fingerprint');
  if (ja4) return 'ja4:' + ja4;

  var ja3 = getHeaderValue(headers, 'cloudfront-viewer-ja3-fingerprint');
  if (ja3) return 'ja3:' + ja3;

  // Final fallback: still deterministic, but less stable than JA4/JA3
  var userAgent = getHeaderValue(headers, 'user-agent');
  var ip = (viewer && viewer.ip) ? viewer.ip : 'unknown';
  return 'uaip:' + userAgent + '|' + ip;
}

function applyManualOverride(request) {
  var headers = request.headers;
  var querystring = request.querystring || {};

  // 1) QA header override
  var qaVersion = getHeaderValue(headers, 'x-qa-version');
  if (qaVersion === 'canary') return true;
  if (qaVersion === 'stable') return false;

  // 2) One-off query override
  if (querystring.canary) {
    var forceCanary = querystring.canary.value === '1';
    // Remove the test flag so it does not accidentally affect cache behavior
    delete request.querystring.canary;
    return forceCanary;
  }

  return null;
}

function handler(event) {
  var request = event.request;
  var viewer = event.viewer || {};

  var useNewOrigin = applyManualOverride(request);

  if (useNewOrigin === null) {
    var routingKey = pickRoutingKey(request, viewer);
    var bucket = stableHash(routingKey) % 10000;
    useNewOrigin = bucket < CONFIG.CANARY_BPS;
  }

  // Keep stable and canary cache entries isolated
  request.headers['x-canary-origin'] = {
    value: useNewOrigin ? 'new' : 'old'
  };

  if (useNewOrigin) {
    cf.selectRequestOriginById(CONFIG.NEW_ORIGIN_ID);
  } else {
    cf.selectRequestOriginById(CONFIG.OLD_ORIGIN_ID);
  }

  return request;
}
What this code gets right
This version is designed to align with the operational goals in this article.
It does not use cookies
That keeps the mechanism lightweight and avoids introducing a separate persistence layer just to keep users on the same version.
It prefers the most stable anonymous signal available
JA4 first, then JA3, gives you a better routing key than raw IP alone. When those fingerprints are unavailable, the function still behaves predictably.
Header-name note: AWS documentation presents these as CloudFront-Viewer-JA4-Fingerprint and CloudFront-Viewer-JA3-Fingerprint, but inside a CloudFront Functions event object you access them in lowercase as cloudfront-viewer-ja4-fingerprint and cloudfront-viewer-ja3-fingerprint.
It keeps cache entries separated
The x-canary-origin request header is the critical separator. Include this header in the cache key so stable and canary content never share the same cached object.
It supports controlled previews
Executives, QA, or launch managers can view the canary safely using a header or a one-off query switch without changing the default rollout.
Required distribution configuration
The code is only part of the solution. The distribution settings matter just as much.
1. Two origins
Define one origin for the current production build and one for the canary build.
2. Viewer-request CloudFront Function
Associate the function with the cache behavior that serves the site.
3. Enable the CloudFront-generated fingerprint headers
CloudFront needs to add the viewer fingerprint headers so the function can inspect them:
- CloudFront-Viewer-JA4-Fingerprint
- CloudFront-Viewer-JA3-Fingerprint
These headers are available to CloudFront Functions and Lambda@Edge, but only for HTTPS requests. AWS also notes that these TLS-related headers can be added to an origin request policy, but not to a cache policy.
4. Cache policy includes x-canary-origin
This is what keeps stable and canary cache objects isolated.
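As a sketch, the relevant fragment of a cache policy in the shape the CloudFront API expects (the policy name and TTL are placeholder values) whitelists the header in the cache key and, per the query-string advice below, keeps query strings out of it:

```javascript
// Illustrative cache policy fragment (CloudFront CreateCachePolicy shape).
// The key point: x-canary-origin is part of the cache key, so stable and
// canary objects can never collide. Name and TTLs are placeholders.
var cachePolicyConfig = {
  Name: 'static-site-canary-policy', // illustrative name
  MinTTL: 0,
  ParametersInCacheKeyAndForwardedToOrigin: {
    EnableAcceptEncodingGzip: true,
    EnableAcceptEncodingBrotli: true,
    HeadersConfig: {
      HeaderBehavior: 'whitelist',
      Headers: { Quantity: 1, Items: ['x-canary-origin'] }
    },
    CookiesConfig: { CookieBehavior: 'none' },
    QueryStringsConfig: { QueryStringBehavior: 'none' }
  }
};
```

If your site legitimately varies on query strings, whitelist only the ones you need rather than switching to 'all', so a stray ?canary=1 can never fragment the cache.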
5. Query-string handling is deliberate
If you allow the ?canary=1 switch for testing, either:
- remove it in the function, as shown above, or
- make sure it is not part of the cache key
6. SPA or app-style routing fallback is explicit
If the migration changes routing behavior, make sure refreshes and deep links return the correct document.
7. Validate viewer fingerprint availability in your environment
JA3 and JA4 are excellent routing inputs when exposed to the function path, but you should validate their availability in your exact CloudFront setup before making them your only signal.
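One lightweight way to run that validation is a temporary viewer-request function on a test behavior that simply reports which fingerprint headers CloudFront attached to the request. The sketch below generates the response directly at the edge; attaching it to a dedicated test path (the path itself is up to you) avoids touching production behavior.

```javascript
// Temporary diagnostic: respond from the edge with the fingerprint headers
// CloudFront attached to this request, if any. Associate it with a test
// behavior before trusting JA4/JA3 as your only routing inputs.
function handler(event) {
  var headers = event.request.headers;

  var report = {
    ja4: headers['cloudfront-viewer-ja4-fingerprint']
      ? headers['cloudfront-viewer-ja4-fingerprint'].value
      : null,
    ja3: headers['cloudfront-viewer-ja3-fingerprint']
      ? headers['cloudfront-viewer-ja3-fingerprint'].value
      : null
  };

  // Generate the response at the edge; the request never reaches an origin.
  return {
    statusCode: 200,
    statusDescription: 'OK',
    headers: { 'content-type': { value: 'application/json' } },
    body: JSON.stringify(report)
  };
}
```

If the report comes back with both fields null over HTTPS, the fingerprint headers are not enabled for that behavior and the routing function would be falling back to the user-agent + IP hash for every request.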
A rollout plan that works for both engineers and executives
A practical rollout plan looks like this:
Stage 1: 0.1% smoke test
Purpose: prove routing, cache isolation, and basic stability.
Watch for:
- unexpected 404s
- JavaScript startup failures
- broken asset loading
- obvious support noise
Stage 2: 1% early validation
Purpose: confirm that the new build survives real traffic patterns.
Watch for:
- client-side error rates
- performance regressions
- login or session complaints
- routing edge cases
Stage 3: 5% confidence build
Purpose: compare business and operational metrics against the stable path.
Watch for:
- conversion rate
- bounce rate
- task completion
- error budgets
- support tickets
Stage 4: 25% operational proof
Purpose: verify team readiness, dashboards, alerts, and rollback confidence under broader load.
Stage 5: 100% cutover
Promote only when the migration risk is low enough for the new path to become the default.
What to measure during the canary
For technical teams:
- 404 rate by path
- JavaScript error rate
- asset load failures
- Core Web Vitals or equivalent performance signals
- origin error rate and CDN cache behavior
For business stakeholders:
- conversion or lead generation
- checkout or sign-in completion
- bounce rate
- support ticket volume
- incident count during the rollout window
Important caveats
JA3 and JA4 are not identity
They are useful routing signals, not person-level identifiers. Their role here is to keep similar requests on the same side of the rollout decision, not to identify an individual user.
HTTPS matters
JA3 and JA4 come from the TLS Client Hello, so they are relevant only for HTTPS traffic.
The cache key is the safety line
If you remember only one implementation detail, remember this one: do not mix stable and canary objects in the same cache key.
Keep rollback boring
The best rollout is the one that can be reversed in seconds without debate. Edge-based selection gives you that option.
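One way to keep it boring is to read CANARY_BPS from CloudFront KeyValueStore instead of hardcoding it, so rollback becomes a data update (set the value to 0) rather than a function redeploy. The sketch below injects the store handle so the fallback logic can be exercised locally; in the real function you would pass cf.kvs() (runtime 2.0), and the key name 'canary-bps' is an assumption of this example.

```javascript
// Sketch: resolve the canary share from a key-value store so rollback is a
// data change, not a redeploy. `store` stands in for cf.kvs(); it is injected
// here so the parsing and fail-closed behavior can be tested offline.
async function getCanaryBps(store) {
  try {
    var raw = await store.get('canary-bps'); // key name is illustrative
    var bps = parseInt(raw, 10);

    // Reject garbage and clamp to the valid basis-point range.
    if (isNaN(bps) || bps < 0) return 0;
    return Math.min(bps, 10000);
  } catch (err) {
    // Missing key or store error: fail closed, all traffic stays on stable.
    return 0;
  }
}
```

Failing closed is the deliberate design choice here: any ambiguity about the dial's value should route viewers to the proven stable origin, never to the canary.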
Final takeaway
A canary deployment for static content is not just a safer release tactic.
For major web migrations, it is a way to separate technical uncertainty from business risk.
By making the decision at the edge, using deterministic cookie-free routing, and isolating cache entries correctly, you can move from a fragile big-bang launch to a controlled rollout that both engineers and executives can support.
That is the real value: fewer surprises, faster rollback, and better evidence before full cutover.
Acknowledgments
Special thanks to @Moti Moskovich for his contribution to this article.


