우병수

Posted on Jun 3 • Originally published at techdigestor.com

CloudFront + Vercel + Lambda@Edge: A Debugging Journal from Someone Who's Been Paged at 2am




What's in this article

The Setup That Looks Simple Until It Isn't
The Basic Wiring — Get This Right First
The Host Header Problem (This Will Bite You)
Lambda@Edge: The Constraints Nobody Warns You About
Debugging Lambda@Edge Logs (They're Not Where You Expect)
The Replication Delay That Makes You Think Your Deploy Didn't Work
Common Error Messages and What They Actually Mean
Vercel-Specific Gotchas When Sitting Behind a Proxy
A Working Lambda@Edge Auth Pattern (Viewer-Request)

The Setup That Looks Simple Until It Isn't

The first time I wired CloudFront in front of Vercel, I thought it'd take an afternoon. It took three days, two support tickets, and one very humbling Lambda@Edge timeout at 2am. The architecture looks trivial on a whiteboard: CloudFront distribution → origin pointing at your Vercel deployment URL → Lambda@Edge functions handling rewrites, auth headers, or request manipulation → response lands back at your user. That's it. The diagram fits on a napkin. The edge cases do not.

So why bother? A few real scenarios where this actually earns its complexity cost: your company's security team mandates AWS WAF rules on all external traffic and Vercel's firewall isn't on the approved vendor list; you need to route /api/* to one Vercel project and /app/* to another without burning through Vercel's rewrite limits; you want CloudFront's aggressive cache policies for static assets with TTLs and invalidation logic that Vercel's edge cache doesn't give you fine-grained control over; or you're multi-tenant and need to inject per-tenant headers before requests ever reach your Next.js app. These aren't contrived edge cases — they come up constantly in larger orgs or any setup that predates your Vercel migration.

The architecture in one sentence: CloudFront receives the request, a Lambda@Edge function runs at the origin-request or viewer-request stage to rewrite URLs, strip or inject headers, or enforce auth, and the modified request hits your Vercel deployment URL as the origin. Simple sentence, brutal implementation. The thing that catches everyone is that Lambda@Edge is not regular Lambda. Viewer triggers run with a fixed 128MB of memory, a 5-second timeout, and a 1MB package, and a viewer-request function can generate a response of at most 40KB; origin triggers get up to 10GB of memory, 30 seconds, a 50MB package, and 1MB generated responses. There are no environment variables (you use SSM or hardcode), and cold starts happen in whichever AWS region serves the viewer, where you have very little visibility. Your normal Lambda debugging muscle memory doesn't transfer cleanly.

This guide is specifically about the failures. Not the happy path where everything resolves cleanly and your CloudFront distribution serves Vercel content in 200ms. I'm talking about: cryptic "502 ERROR: The request could not be satisfied" pages that tell you nothing useful, Lambda@Edge logs that appear in different regions than where you deployed the function, Vercel rejecting requests because the wrong Host header reaches it, cache keys that leave out the auth cookie and bleed personalized responses between users, and the SNI mismatch that only shows up in production because your staging domain is on a different certificate. If you're using AI tools to help debug this stack, here's our guide on Best AI Coding Tools in 2026. Copilot and Cursor can actually help parse Lambda@Edge logs faster than you'd think, especially when you're grepping CloudWatch across six regions at midnight.

One thing to establish early: Vercel routes every request by its Host header. Branch URLs follow the pattern your-project-git-branch-orgname.vercel.app, and Vercel only serves your project when the Host matches one of its vercel.app hostnames or a custom domain you've added to it. CloudFront sends Host: your-project.vercel.app on its own, but only as long as the cache behavior doesn't forward the viewer's Host header. The managed AllViewer origin request policy and the legacy "forward all headers" setting both forward it, and then Vercel sees your CloudFront or custom domain instead. The result is a 404 or a redirect loop that looks completely unrelated to headers. That single issue accounts for probably half the "why isn't this working" questions I've seen on this setup.

The Basic Wiring — Get This Right First

The most counterintuitive thing about this setup is the origin URL. Your instinct is to point CloudFront at your production domain (yourapp.com), but that creates a routing loop if CloudFront is your production domain. Point it at the Vercel hostname directly: the stable your-project.vercel.app alias, or a deployment URL like your-project-abc123.vercel.app if you deliberately want to pin one immutable deployment. Not your custom domain. Requests to the vercel.app hostname still pass through Vercel's edge network, but they reach your project without depending on your custom domain's DNS, which by now points at CloudFront.

Origin Configuration That Won't Bite You Later

In your CloudFront distribution's origin settings, the protocol policy must be HTTPS only. Vercel redirects plain HTTP to HTTPS with a 308, so if you set "HTTP only" or "Match viewer," you'll spend 20 minutes debugging redirect loops in curl before realizing the cause. Set the origin port to 443. Then, and this is the part that causes most 404s, you need to make sure the Host header that reaches Vercel is your Vercel hostname.

CloudFront sets Host to the origin's domain name unless the cache behavior tells it to forward the viewer's Host header. The managed AllViewer origin request policy forwards it, and so does the legacy "forward all headers" setting; in that case Vercel receives your CloudFront domain (d1abc123.cloudfront.net) or your custom domain. Vercel uses the Host header to figure out which project to serve. When it sees a domain it doesn't recognize, it returns a bare 404 with no useful explanation. Fix this on the cache behavior, not the origin:

# In CloudFront → Behaviors → Edit → Cache key and origin requests
Origin request policy:  AllViewerExceptHostHeader (AWS managed)
Cache policy:           any policy that does not include Host

If you're using Terraform or CloudFormation, this is the origin request policy ID on the cache behavior (origin_request_policy_id in Terraform, OriginRequestPolicyId in CloudFormation). Don't try to add Host under the origin's custom headers instead; it's on the list of headers CloudFront refuses to add to origin requests. Get this wrong and you'll be debugging what looks like a routing problem but is actually just Vercel not knowing what project you're asking for.

Lambda@Edge Event Type: Viewer-Request Almost Always Wins

There are four event types: viewer-request, origin-request, origin-response, viewer-response. For auth — JWT validation, cookie checking, redirect-to-login logic — attach your function to viewer-request. Here's why origin-request is the wrong choice for auth: CloudFront's cache can serve a response without ever firing an origin-request trigger. A logged-out user hits a cached page, CloudFront serves it from cache, your auth Lambda never runs, and they see content they shouldn't. Viewer-request fires on every single request, cached or not.

Origin-request is genuinely useful when you need to rewrite the URL before it hits Vercel, or add a header that Vercel needs to see. But don't put authentication there. The mental model is: viewer-request = security gate, origin-request = request transformation before the backend sees it.

# CloudFormation (YAML), inside DistributionConfig → DefaultCacheBehavior
LambdaFunctionAssociations:
  - EventType: viewer-request   # not origin-request
    # Must be a numbered version ARN, never $LATEST
    LambdaFunctionARN: arn:aws:lambda:us-east-1:123456789012:function:my-auth:5
    IncludeBody: false
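For reference, here's roughly what the viewer-request gate itself can look like. This is a minimal sketch under stated assumptions: the session lives in a cookie I'm calling `session` and unauthenticated users go to `/login` (both hypothetical names), and it only checks that the cookie exists. A real gate has to verify the token's signature, not just its presence.

```javascript
// viewer-request gate (sketch). The cookie name "session" and the /login path
// are hypothetical; swap in your own.
const LOGIN_PATH = '/login';
const SESSION_COOKIE = 'session';

// CloudFront may deliver several Cookie header entries; scan all of them.
function getCookie(headers, name) {
  for (const { value } of headers.cookie || []) {
    for (const pair of value.split(';')) {
      const [key, ...rest] = pair.trim().split('=');
      if (key === name) return rest.join('=');
    }
  }
  return undefined;
}

function authGate(request) {
  // Presence check only: a real gate must verify the token here.
  if (request.uri === LOGIN_PATH || getCookie(request.headers, SESSION_COOKIE)) {
    return request; // continue to the cache lookup, then the origin
  }
  // Short-circuit with a redirect in the shape CloudFront validates.
  return {
    status: '302',
    statusDescription: 'Found',
    headers: {
      location: [{ key: 'Location', value: LOGIN_PATH }],
      'cache-control': [{ key: 'Cache-Control', value: 'no-store' }],
    },
  };
}

exports.handler = async (event) => authGate(event.Records[0].cf.request);
```

Because it runs on viewer-request, the gate fires on cache hits as well as misses, which is exactly why it belongs there.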

One hard constraint: Lambda@Edge functions must be deployed in us-east-1, full stop. It doesn't matter where your users are or where the rest of your stack lives. If you try to associate a function from another region, the API will reject it with a confusing error message. Deploy to us-east-1 and CloudFront replicates the function to its edge regions from there.

The IAM Role That Actually Works

Lambda@Edge needs a trust policy that includes both lambda.amazonaws.com and edgelambda.amazonaws.com. Most examples online only show one. If you only have lambda.amazonaws.com, you can deploy the function but you can't associate it with a CloudFront behavior — the console gives you a permissions error that doesn't clearly say "fix your trust policy."

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "edgelambda.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

For the permissions policy attached to that role, the minimum you need for a viewer-request auth function that only reads the request and either allows or redirects:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}

That wildcard on the logs resource isn't laziness — Lambda@Edge executes in whichever AWS region is closest to the user, and the CloudWatch logs land in that region. If you scope the logs ARN to us-east-1, your function will work but you'll have zero visibility into what it's doing in production. I've seen teams spend hours on this wondering why their logs were empty.

The Host Header Problem (This Will Bite You)

The error that wastes the most time in this setup isn't a permissions issue or a misconfigured Lambda. It's a single header carrying the wrong value, which makes Vercel return {"error":{"code":"DEPLOYMENT_NOT_FOUND","message":"No deployment found for URL your-custom-domain.com"}} with a 404. You stare at it for an hour because the origin domain in CloudFront looks completely correct. The URL resolves. The deployment exists. Everything seems fine.

Here's what's actually happening: if your cache behavior forwards the viewer's Host header (the managed AllViewer origin request policy does, and so does the legacy "forward all headers" setting), CloudFront passes the original client's Host straight through to your origin. So when someone hits www.yourdomain.com, Vercel receives Host: www.yourdomain.com. Vercel's routing layer uses the Host header, not the connection IP and not the path, to look up which project to serve. It has no idea what www.yourdomain.com maps to unless you've explicitly added it as a custom domain in your Vercel project settings. If you haven't, or if you're proxying through CloudFront specifically to avoid doing that, you get the deployment not found error. The fix is to make sure the request leaves CloudFront with Host: your-project.vercel.app.

You might think you can handle this in a Lambda@Edge viewer-request function by modifying event.Records[0].cf.request.headers.host. You can't: Host is read-only in viewer-request events, and a function that changes it fails CloudFront's validation. Host only becomes writable at origin-request, just before the request leaves CloudFront's infrastructure. But honestly, for just fixing the Host header, you don't need Lambda@Edge at all. Stop forwarding the viewer's Host and CloudFront fills in the origin's domain name by itself, which is simpler, cheaper (no Lambda invocation cost), and easier to audit.

In the CloudFront console, go to your distribution → Behaviors tab → select the behavior that routes to Vercel → Edit. Under Cache key and origin requests, set the origin request policy to AllViewerExceptHostHeader (an AWS-managed policy that forwards every viewer header except Host), and make sure your cache policy doesn't include Host either, since headers in the cache key are forwarded to the origin too. Save and wait for the distribution to deploy. That's it. CloudFront will send Host: your-project.vercel.app on every request to the origin. Don't try to add Host under the origin's custom headers instead: it's on the list of headers CloudFront can't add to origin requests, so the console rejects it. In CloudFormation, the setting lives on the cache behavior:

Origins:
  - Id: vercel-origin
    DomainName: your-project.vercel.app
    CustomOriginConfig:
      HTTPSPort: 443
      OriginProtocolPolicy: https-only
      OriginSSLProtocols:
        - TLSv1.2
DefaultCacheBehavior:
  TargetOriginId: vercel-origin
  ViewerProtocolPolicy: redirect-to-https
  # AWS-managed CachingDisabled; swap in your own cache policy ID
  CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad
  # The critical part: AWS-managed AllViewerExceptHostHeader. With AllViewer
  # instead, Vercel gets Host: www.yourdomain.com
  OriginRequestPolicyId: b689b0a8-53d0-40ab-baf2-68738e2966ac

One gotcha: if you also have a Lambda@Edge origin-request function attached, and that function touches the host header, it overrides whatever Host CloudFront chose. I've seen this cause confusion when someone inherits a stack where a previous dev added Lambda@Edge to handle redirects: the policy gets fixed, then the Lambda clobbers Host before the request leaves. Check your origin-request function first if the fix above doesn't immediately work. The order is: CloudFront builds the origin request (its policies decide which viewer headers are forwarded) → origin-request Lambda fires → request goes to Vercel. Lambda always gets the last word.
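If you do keep an origin-request function around, one way to make it harmless to Host is to derive the value from the origin the request is actually headed to, rather than hardcoding a string. A minimal sketch, assuming a custom (non-S3) origin; the function name `alignHostWithOrigin` is mine, not an AWS API:

```javascript
// origin-request (sketch): set Host from the origin CloudFront is about to
// contact, so it stays correct even if earlier logic switched origins.
function alignHostWithOrigin(request) {
  const domainName = request.origin?.custom?.domainName;
  if (domainName) {
    // Host is writable in origin-request events (it's read-only in viewer-request).
    request.headers.host = [{ key: 'Host', value: domainName }];
  }
  return request; // S3 or unknown origins pass through untouched
}

exports.handler = async (event) => alignHostWithOrigin(event.Records[0].cf.request);
```

Run any origin-switching logic before this call so Host follows the final choice.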

Lambda@Edge: The Constraints Nobody Warns You About

The thing that caught me completely off guard on my first Lambda@Edge deployment was opening the function configuration in the AWS console and finding no environment variables tab. Not a disabled tab. Not a greyed-out field. Just... gone. Lambda@Edge functions have zero support for environment variables at the platform level, and AWS buries this in the docs in a way that makes it easy to miss until you've already written your function assuming process.env.API_KEY will work. It won't. Your two real options: fetch secrets from SSM Parameter Store at cold start and cache them in the module scope, or bake them into the deployment package itself — which means a new deployment every time a secret rotates.

The SSM approach is what I use in production. Here's the actual pattern:

// Runs once per cold start, cached in module scope.
// The client is pinned to us-east-1 (where the parameter lives), so replicas
// in other regions make a cross-region SSM call on each cold start.
// The execution role also needs ssm:GetParameter on this parameter.
import { SSMClient, GetParameterCommand } from "@aws-sdk/client-ssm";
import type { CloudFrontRequestEvent } from "aws-lambda"; // types from @types/aws-lambda

const ssm = new SSMClient({ region: "us-east-1" });
let cachedSecret: string | null = null;

async function getSecret(): Promise<string> {
  if (cachedSecret) return cachedSecret;
  const result = await ssm.send(new GetParameterCommand({
    Name: "/myapp/cloudfront/api-key",
    WithDecryption: true,
  }));
  cachedSecret = result.Parameter!.Value!;
  return cachedSecret;
}

export const handler = async (event: CloudFrontRequestEvent) => {
  const secret = await getSecret();
  // ... rest of your logic
};

The catch: that SSM call happens during cold start, and cold start time counts against your execution budget. Viewer-facing events (viewer-request, viewer-response) get 5 seconds total. Origin-facing events (origin-request, origin-response) get 30 seconds. A cold start with an SSM fetch on a viewer-request function will regularly eat 1-2 seconds before your actual logic runs. I moved all the heavy lifting to origin-request specifically because of this — you get 30 seconds and the cold start penalty hurts much less.

Package size is the other thing that will humiliate you. The limits are 1MB compressed for viewer-facing events and 50MB for origin-facing events. That sounds fine until you add aws-sdk (v2 ships at ~8MB uncompressed), jsonwebtoken, and one JWT verification library, and suddenly you're over the viewer limit before your function has a single line of business logic. I use esbuild to bundle and tree-shake everything for viewer functions. The AWS SDK v3 is modular — only import the clients you actually need, not the whole SDK. For origin functions I'm less aggressive, but I still bundle and check the zip size explicitly in CI:

# Fails the build if the zip exceeds 45MB (buffer before the 50MB hard limit)
zip -r function.zip . -x "*.test.*"
SIZE=$(du -k function.zip | cut -f1)
if [ "$SIZE" -gt 46080 ]; then
  echo "Lambda@Edge package too large: ${SIZE}KB"
  exit 1
fi
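That shell check only guards the 50MB origin limit. If you also ship viewer functions, a small Node helper that picks the limit by trigger type catches the much tighter 1MB viewer ceiling too. A sketch; the function names and the 10% headroom are my own choices, and it treats 1MB as 1,048,576 bytes:

```javascript
const fs = require('fs');

// Lambda@Edge package limits by trigger type (size of the zip).
const LIMIT_BYTES = {
  'viewer-request': 1 * 1024 * 1024,
  'viewer-response': 1 * 1024 * 1024,
  'origin-request': 50 * 1024 * 1024,
  'origin-response': 50 * 1024 * 1024,
};

// Returns { ok, limit, budget }; budget keeps `headroom` (10% by default) spare.
function checkPackageSize(sizeBytes, eventType, headroom = 0.1) {
  const limit = LIMIT_BYTES[eventType];
  if (limit === undefined) throw new Error(`unknown event type: ${eventType}`);
  const budget = Math.floor(limit * (1 - headroom));
  return { ok: sizeBytes <= budget, limit, budget };
}

// Convenience wrapper for CI: stat the zip on disk and check it.
function checkZip(zipPath, eventType) {
  return checkPackageSize(fs.statSync(zipPath).size, eventType);
}
```

In CI you'd call something like `checkZip('function.zip', 'viewer-request')` and fail the build when `ok` is false. With the default headroom, the origin budget works out to the same 45MB the shell script uses.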

The us-east-1 requirement deserves its own moment of frustration. Lambda@Edge functions must be deployed to us-east-1, full stop. If your entire infrastructure lives in us-west-2 or eu-west-1, you still need a separate Terraform workspace or CloudFormation stack pointed at us-east-1 just for these functions. AWS then replicates them to edge locations automatically, but the source of truth has to be N. Virginia. I've seen teams burn an afternoon debugging why their Terraform apply keeps failing — it's because the aws_lambda_function resource is using a provider aliased to the wrong region. Your CDK or Terraform config needs something like:

# Terraform: explicit us-east-1 provider for Lambda@Edge
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

resource "aws_lambda_function" "edge_auth" {
  provider      = aws.us_east_1
  function_name = "cloudfront-auth-edge"
  # publish = true is REQUIRED for Lambda@Edge association
  publish       = true
  ...
}

The VPC restriction is the one that forces real architecture changes. Lambda@Edge runs on CloudFront's edge infrastructure, which means it has no access to your VPC — no private subnets, no security groups, nothing. If your auth service, token introspection endpoint, or user database is VPC-internal only, you cannot call it from a Lambda@Edge function. Your options are: put a public-facing ALB or API Gateway in front of your auth service (and lock it down with SigV4 or an API key header, since it's now internet-reachable), move the verification logic into the Lambda itself using a shared secret or a public key for JWT verification, or drop down to a CloudFront Function for simple request manipulation that doesn't need a network call at all. I ended up using asymmetric JWTs — the Lambda@Edge function only needs the public key to verify tokens, which I bake into the package at deploy time, so no VPC calls needed.
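Here's a minimal sketch of that asymmetric approach using only Node's built-in crypto module (Node 16 or later, for base64url support), so nothing extra counts against the package limit. It assumes RS256 tokens and a PEM public key you embed at build time. It checks the signature and exp and nothing else, so add iss/aud checks before trusting it; the function name `verifyRs256Jwt` is illustrative.

```javascript
const crypto = require('crypto');

// Decode one base64url JWT segment into a Buffer.
function b64urlToBuffer(segment) {
  return Buffer.from(segment, 'base64url');
}

// Returns the payload for a valid, unexpired RS256 JWT, otherwise null.
// Checks signature and exp only; add iss/aud checks for real use.
function verifyRs256Jwt(token, publicKeyPem, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts;

  let header;
  let payload;
  try {
    header = JSON.parse(b64urlToBuffer(headerB64).toString('utf8'));
    payload = JSON.parse(b64urlToBuffer(payloadB64).toString('utf8'));
  } catch {
    return null; // segments that aren't JSON: reject
  }
  if (!header || !payload) return null;

  // Pin the algorithm instead of trusting the token's header.
  if (header.alg !== 'RS256') return null;

  // RS256 = RSASSA-PKCS1-v1_5 with SHA-256, the default padding for RSA keys.
  const signatureValid = crypto.verify(
    'sha256',
    Buffer.from(`${headerB64}.${payloadB64}`),
    publicKeyPem,
    b64urlToBuffer(signatureB64),
  );
  if (!signatureValid) return null;

  // Require an exp claim and reject expired tokens.
  if (typeof payload.exp !== 'number' || payload.exp <= nowSeconds) return null;
  return payload;
}
```

The design choice that matters is pinning `alg`: letting the token pick its own algorithm is the classic JWT verification hole.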

Debugging Lambda@Edge Logs (They're Not Where You Expect)

The first time I deployed a Lambda@Edge function and it silently failed, I spent 45 minutes staring at CloudWatch in us-east-1 wondering why there were zero logs. The function existed there, I deployed it there, so obviously that's where the logs would be — except no. Lambda@Edge writes logs to the CloudWatch region closest to the edge location that handled the request. If you're testing from London, your logs are in eu-west-1. Testing from Tokyo? ap-northeast-1. This isn't documented prominently enough and it trips up every developer the first time.

The fastest way to find which region actually received your request is to check CloudFront's access logs or response headers first, then query that specific region. But if you want to brute-force it, use the AWS CLI to scan candidate regions. The log group name follows a specific pattern — it includes the deployment region (us-east-1) even though the logs are written elsewhere:

# Check if your function logged anything in eu-west-1
aws logs describe-log-groups \
  --region eu-west-1 \
  --log-group-name-prefix "/aws/lambda/us-east-1.your-function-name"

# If that returns a log group, pull recent streams
aws logs describe-log-streams \
  --region eu-west-1 \
  --log-group-name "/aws/lambda/us-east-1.your-function-name" \
  --order-by LastEventTime \
  --descending \
  --max-items 5

For a request that could have hit any edge location, keep in mind that CloudWatch Logs Insights queries are scoped to a single region, so you run the same query in each candidate region, either by switching regions in the console or by scripting it. Here's the Insights query I use to track down a specific request using the CloudFront request ID:

fields @timestamp, @message
| filter @message like /YOUR-CF-REQUEST-ID/
| sort @timestamp desc
| limit 20

You get the CloudFront request ID from the x-amz-cf-id response header. Add that header to your curl command or check it in browser DevTools under the response headers for any CloudFront-served request. Once you have that ID, it shows up in your Lambda@Edge logs if you're logging the event — which brings me to the thing I now do on every first deploy without exception:

exports.handler = async (event, context) => {
  // Log the full event on first deploy — remove before production
  // You CANNOT know the exact shape of cf.request until you see it live
  console.log(JSON.stringify(event, null, 2));

  const request = event.Records[0].cf.request;
  // ... rest of your handler
};

The reason this matters: the event shape for a CloudFront origin-request trigger is genuinely surprising. Headers are not a flat object; they're { "host": [{ "key": "Host", "value": "example.com" }] }. The URI is already decoded. The query string is a raw string, not parsed. Origin config lives nested inside the event and is mutable, which is how you rewrite the origin dynamically. If you skip this step and try to write the proxy logic from the docs alone, you'll spend hours chasing undefined errors that a single JSON.stringify(event) log entry would have solved in two minutes. Nuke that log line before you hit production, though: full event dumps in a high-traffic function will balloon your CloudWatch costs fast, and they write cookies and authorization headers straight into your logs.
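Once you've seen that shape, it's worth wrapping it in a few tiny helpers so handler code stops indexing `[0].value` everywhere. A sketch; the helper names are mine, not part of any AWS library:

```javascript
// Helpers for CloudFront's Lambda@Edge request shape (names are illustrative).
// Headers: { 'lowercase-name': [{ key: 'Original-Case', value: '...' }, ...] }

function getHeader(request, name) {
  const entries = request.headers[name.toLowerCase()];
  return entries && entries.length > 0 ? entries[0].value : undefined;
}

// Replaces every existing value for the header with a single one.
function setHeader(request, name, value) {
  request.headers[name.toLowerCase()] = [{ key: name, value }];
}

// querystring arrives as a raw string without the leading "?".
function getQueryParam(request, name) {
  return new URLSearchParams(request.querystring).get(name);
}
```

For example, `setHeader(request, 'X-Tenant-Id', 'acme')` writes the lowercase object key CloudFront expects while keeping your preferred casing in `key`.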

The Replication Delay That Makes You Think Your Deploy Didn't Work

The first time I deployed a Lambda@Edge fix and refreshed the browser expecting it to work, nothing changed. I deployed again. Still nothing. I checked the Lambda console — the function updated fine. I checked CloudFront — the behavior was pointing at the right ARN. Twenty minutes later, suddenly it worked. That's not a bug in your deployment pipeline. That's just how Lambda@Edge propagates.

AWS documentation says propagation takes "up to several minutes." My experience across multiple projects: budget 15-20 minutes minimum, and on bad days I've sat on a broken state for close to 30. The replication is asynchronous across every edge location (PoP) globally, and there's no signal in the console that tells you when it's done. The function version can be fully deployed and the ARN updated in your CloudFront distribution, while 40% of the PoPs are still running your old code. If your test requests happen to hit one of those stale PoPs — and they will — you'll spend the next hour convinced your fix is wrong.

The diagnostic I always run first now is checking the x-amz-cf-pop response header. That header tells you which PoP handled the request (something like IAD89-P1 or LHR3-C2), which narrows down the region your function ran in. Grab the x-amz-cf-id header from the same response, switch CloudWatch to that region, and query the function's log group, /aws/lambda/us-east-1.your-function-name. The request ID only shows up if your function logs the event (or at least its config.requestId). The /aws/cloudfront/LambdaEdge/YOUR_DISTRIBUTION_ID log group that CloudFront creates holds its own validation and execution error records, not your function's console output. Lambda log stream names include the function version in brackets, so @logStream tells you which version handled the request:

fields @timestamp, @logStream, @message
| filter @message like /YOUR-CF-REQUEST-ID/
| sort @timestamp desc
| limit 20

If the logs show your old function version executing, you're confirmed stale. If there are no logs at all for that PoP yet, the function hasn't replicated there. That distinction matters — "no logs" means wait longer, "old version in logs" might mean something went wrong with the deployment association.

During development I started pinning requests to a specific edge using curl's --resolve flag combined with a known PoP IP. You can find PoP IPs by resolving your CloudFront domain from different regions (use a tool like dig from a VPS in that region, or use the dnschecker.org global lookup). Once you have an IP that corresponds to a PoP you know has replicated your latest version based on CloudWatch evidence:

# Force all requests to a specific PoP IP without DNS resolution
# Replace 13.226.x.x with the actual PoP IP you've verified
curl -v \
  --resolve "your-distribution.cloudfront.net:443:13.226.x.x" \
  "https://your-distribution.cloudfront.net/api/test-route"

# Confirm you're hitting the right PoP:
# Look for "x-amz-cf-pop" in the response headers

This isn't a permanent fix — you can't control which PoP real users hit — but for debugging during a deploy window it lets you confirm your logic is actually correct before you start second-guessing the code itself.

The hard lesson: never push a Lambda@Edge change inside a 30-minute window before anything time-sensitive. Demo, deadline, customer call — it doesn't matter. I've seen engineers deploy a "quick fix" ten minutes before a demo and then spend the entire demo explaining why the site is broken for "some users" while the propagation catches up. If you need a Lambda@Edge change to be live at a specific time, deploy it 45 minutes early, monitor the CloudWatch logs across at least 3-4 different PoP identifiers in the x-amz-cf-pop header, and only declare it done when you see consistent behavior across regions. There's no shortcut here — invalidating the CloudFront cache does not speed up Lambda function replication. Those are two completely separate systems.

Common Error Messages and What They Actually Mean

The error that burned me the worst on my first Lambda@Edge + Vercel setup was "The Lambda function result failed validation" — and it's infuriating because CloudFront gives you zero context about what failed. The issue is almost always a malformed response object shape. CloudFront is strict: for a viewer-request function, the response you return must look exactly like this:

// This is the ONLY shape CloudFront accepts for viewer-request responses
return {
  status: '200',           // string, not integer — yes, really
  statusDescription: 'OK',
  headers: {
    'content-type': [{
      key: 'Content-Type', // wire casing; the object key above must be lowercase
      value: 'text/html'
    }]
  },
  body: '<html>...</html>'
};

Notice status is a string, not a number. The headers object is keyed by the lowercase header name, and each value is an array of objects with key and value, not a flat key-value map. Every single time I've seen this error in production, someone returned status: 200 or passed headers as { 'content-type': 'text/html' }. CloudFront rejects both with that useless validation error. Keep generated responses small, too: a viewer-request function can return at most 40KB, headers and body included. If you're forwarding the request through rather than short-circuiting it, return the request object you received (modified in place if needed); don't reconstruct it from scratch.
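Since both mistakes come from hand-building that structure, I'd generate it instead. A sketch (the helper names are mine) that converts a flat header map into CloudFront's format and forces status to a string:

```javascript
// Convert { 'Content-Type': 'text/html' } into CloudFront's header format.
function toCloudFrontHeaders(flatHeaders) {
  const headers = {};
  for (const [name, value] of Object.entries(flatHeaders)) {
    headers[name.toLowerCase()] = [{ key: name, value: String(value) }];
  }
  return headers;
}

// Build a generated response in the shape CloudFront validates.
function buildResponse(status, statusDescription, body, flatHeaders = {}) {
  return {
    status: String(status), // '200', not 200
    statusDescription,
    headers: toCloudFrontHeaders(flatHeaders),
    body,
  };
}
```

Usage inside a handler looks like `return buildResponse(403, 'Forbidden', 'Nope', { 'Content-Type': 'text/plain' });`.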

A 5xx after wiring up Lambda@Edge is almost always an uncaught exception in your function. Lambda@Edge doesn't have the same timeout and error behavior as a regular Lambda: if your function throws or times out, CloudFront has no result to work with and returns an error straight to the client (a 503, logged as a Lambda execution error; a malformed return value shows up as a 502 validation error instead). The fix is mechanical but non-negotiable: wrap everything in a try/catch and return a valid response from the catch block. Don't re-throw.

exports.handler = async (event) => {
  try {
    const request = event.Records[0].cf.request;
    // your actual logic here
    return request;
  } catch (err) {
    // Log to CloudWatch, but ALWAYS return a valid response.
    // Async handlers should return values, not mix in the callback API.
    console.error('Lambda@Edge error:', err);
    return {
      status: '500',
      statusDescription: 'Internal Server Error',
      headers: {
        'content-type': [{ key: 'Content-Type', value: 'text/plain' }]
      },
      body: 'Internal Server Error'
    };
  }
};

The Vercel 308 redirect loop is a different beast. Vercel redirects HTTP to HTTPS and also redirects bare domain traffic to its canonical hostname, which, if you haven't configured your custom domain correctly in the Vercel dashboard, defaults to *.vercel.app. CloudFront doesn't follow redirects; it passes the 308 straight to the browser, so your users end up on the Vercel domain rather than yours. Fix: add your custom domain in Vercel under Project Settings → Domains, and make sure "Redirect to" is set to your actual domain, not the Vercel subdomain. Also check that the domain you're using as CloudFront's origin matches exactly what Vercel has configured: protocol, subdomain, everything. Vercel uses SNI to route traffic, so a mismatch means it falls back to a default redirect you don't want.

InvalidLambdaFunctionAssociation during CloudFormation deploys is a one-liner fix once you know it: you cannot associate $LATEST with a CloudFront distribution. The function ARN must include a version number. In CloudFormation or CDK, this means you need to publish a version explicitly:

# CloudFormation snippet — note the !Ref vs !GetAtt difference
ViewerRequestFunction:
  Type: AWS::Lambda::Function
  Properties:
    FunctionName: my-edge-fn
    Runtime: nodejs20.x
    # ...

ViewerRequestVersion:
  Type: AWS::Lambda::Version
  Properties:
    FunctionName: !GetAtt ViewerRequestFunction.Arn

# Then reference the version ARN in your CloudFront distribution:
# !Ref ViewerRequestVersion  ← this gives you the versioned ARN

The CORS-only-through-CloudFront issue is subtle and shows up after everything else looks fine. By default, CloudFront's cache behavior doesn't forward the Origin header to your Vercel origin, which means Vercel never sees a cross-origin request, never sends back Access-Control-Allow-Origin, and CloudFront hands the browser a response with no CORS headers. Two things need to happen: first, forward Origin to Vercel by including it in the behavior's origin request policy. Second, if you're caching responses, you must also add Origin to the cache key (the cache policy); otherwise CloudFront serves whichever response it cached first to every client regardless of their origin. Miss the second part and whether a cross-origin request gets CORS headers depends on which request happened to populate the cache, which is the most confusing kind of intermittent bug to chase down.

Vercel-Specific Gotchas When Sitting Behind a Proxy

The thing that caught me off guard first was Vercel silently dropping requests. No 4xx, no logs, just timeouts. Turns out Vercel's DDoS protection was flagging the proxy traffic. Lambda@Edge doesn't make the origin request, CloudFront does, so every request reaches Vercel from CloudFront's origin-facing servers, which sit in AWS IP ranges that look like datacenter traffic (because they are). If you're on Vercel Pro or Enterprise, the fix is the Trusted IPs feature under your project's Security settings. Add CloudFront's origin-facing CIDR blocks: pull the current AWS IP ranges from https://ip-ranges.amazonaws.com/ip-ranges.json and filter for "service": "CLOUDFRONT_ORIGIN_FACING". On free/hobby plans there's no whitelist option, so you'll hit this wall hard and have no clean solution.
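Pulling those ranges by hand gets old fast, since AWS updates the file regularly. A sketch of the filtering step, assuming Node 18+ for the global fetch; the function names are mine, and `CLOUDFRONT_ORIGIN_FACING` is the service value ip-ranges.json uses for the CloudFront servers that connect to origins:

```javascript
const IP_RANGES_URL = 'https://ip-ranges.amazonaws.com/ip-ranges.json';

// Extract IPv4 and IPv6 CIDRs for one service from a parsed ip-ranges.json.
function cidrsForService(ipRanges, service = 'CLOUDFRONT_ORIGIN_FACING') {
  const v4 = (ipRanges.prefixes || [])
    .filter((p) => p.service === service)
    .map((p) => p.ip_prefix);
  const v6 = (ipRanges.ipv6_prefixes || [])
    .filter((p) => p.service === service)
    .map((p) => p.ipv6_prefix);
  return { v4, v6 };
}

// Fetch the live file (global fetch needs Node 18+) and filter it.
async function fetchOriginFacingCidrs() {
  const res = await fetch(IP_RANGES_URL);
  if (!res.ok) throw new Error(`ip-ranges.json fetch failed: ${res.status}`);
  return cidrsForService(await res.json());
}
```

Something like `fetchOriginFacingCidrs().then(({ v4 }) => console.log(v4.join('\n')))` prints the IPv4 blocks, one per line, ready to paste.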

The client-IP problem is subtle and will corrupt your analytics if you ignore it. The connection into Vercel comes from CloudFront, so Vercel treats a CloudFront server as the client. Your Vercel Analytics dashboard fills up with AWS datacenter IPs, and any geo-targeting logic you have server-side breaks completely. CloudFront already appends the viewer's IP to X-Forwarded-For on its own, but Vercel overwrites X-Forwarded-For on incoming requests to prevent IP spoofing, so that value never reaches your code. Carry the IP in a header of your own instead, from your Lambda@Edge origin-request handler:

// origin-request Lambda@Edge
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  // clientIp is the viewer's IP address as CloudFront saw it
  const viewerIp = request.clientIp;

  // Don't set X-Real-IP: it's on Lambda@Edge's disallowed-header list and
  // fails validation. The header name below is arbitrary; pick your own.
  request.headers['x-viewer-ip'] = [{ key: 'X-Viewer-IP', value: viewerIp }];

  return request;
};

Vercel's request.ip in Edge Functions and API routes will keep reporting a CloudFront IP either way, so read x-viewer-ip in your app instead. Because the function overwrites that header on every request, a client can't spoof it by sending its own. If you'd rather not run a function at all, adding the CloudFront-Viewer-Address header to your origin request policy gets you the viewer's IP and port with no Lambda involved.

Caching conflicts between CloudFront and Vercel's edge network are the most common source of stale-content bugs in this setup. Vercel caches aggressively by default — static assets get long TTLs, and even some API routes get edge-cached if your response headers don't explicitly opt out. If CloudFront is your cache layer (which it should be if you're doing this architecture), you need Vercel to be a dumb origin. Set this response header in your Vercel config or in code:

{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "Cache-Control", "value": "no-store" }
      ]
    }
  ]
}

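Then manage your actual TTLs entirely in CloudFront cache behaviors: set different TTLs per path pattern (/api/* → 0s, /_next/static/* → 31536000s, etc.). One catch: with a minimum TTL of 0, CloudFront honors the origin's no-store and won't cache anything either, so on behaviors that should cache, raise the cache policy's minimum TTL to the TTL you actually want. The other gotcha is that no-store also disables revalidation, which is fine since CloudFront's own TTL+invalidation workflow replaces that. Don't use no-cache instead; Vercel's edge will still store and revalidate on that directive, defeating the purpose.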

Preview deployments broke my CloudFront setup for two weeks before I figured out the routing pattern. Vercel generates URLs like my-app-git-feature-branch-myteam.vercel.app and you probably want to be able to route preview.yourdomain.com or branch-specific subdomains through CloudFront for testing with real Lambda@Edge behavior. The right approach is origin selection inside a Lambda@Edge origin-request function (viewer-request functions can't change the origin), keyed off a custom header you set in CloudFront:

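// In CloudFront, add an origin custom header: X-Branch-Target
// Then in Lambda@Edge origin-request:
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  // Origin custom headers arrive on request.origin, not in request.headers.
  // Fall back to a request header, e.g. one a viewer-request function derived from a cookie.
  const branch =
    request.origin?.custom?.customHeaders?.['x-branch-target']?.[0]?.value ??
    request.headers['x-branch-target']?.[0]?.value;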

  if (branch && branch !== 'main') {
    // Route to the branch-specific Vercel preview URL
    const previewHost = `my-app-git-${branch}-myteam.vercel.app`;
    request.origin = {
      custom: {
        domainName: previewHost,
        port: 443,
        protocol: 'https',
        path: '', // origin path prefix; include it (empty for none) when overriding the origin
        sslProtocols: ['TLSv1.2'],
        readTimeout: 30,
        keepaliveTimeout: 5,
        customHeaders: {}
      }
    };
    request.headers['host'] = [{ key: 'Host', value: previewHost }];
  }

  return request;
};

The critical detail: you must update the Host header to match the new origin domain, or Vercel returns a 404, because its routing is keyed on the Host header. Also keep in mind where this code runs: you author Lambda@Edge functions in us-east-1, but CloudFront replicates them and runs each invocation in the AWS Region nearest the edge location serving the user, so latency to Vercel's preview infrastructure varies. For the X-Branch-Target value itself, inject it from your CI pipeline via CloudFront's origin custom headers, or use a signed cookie if you need per-user branch routing.
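If the branch value can come from a cookie, it's user-controlled, and whatever you splice into the origin domainName decides where CloudFront connects. A minimal guard, assuming the my-app/myteam naming from the example; previewHostForBranch is a hypothetical helper, and the 63-character check is the DNS label limit, not a documented Vercel rule.

```javascript
// Hypothetical guard for the X-Branch-Target value before it becomes an origin hostname.
// Vercel's real slugging of branch names (slashes, very long names) may differ,
// so this only accepts the simple case.
const MAX_DNS_LABEL = 63; // RFC 1035 limit for a single hostname label

function previewHostForBranch(branch, project = "my-app", team = "myteam") {
  if (typeof branch !== "string") return null;
  const slug = branch.toLowerCase();
  // Letters, digits, and inner hyphens only: no dots, slashes, or ports can sneak in
  if (!/^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/.test(slug)) return null;
  const label = `${project}-git-${slug}-${team}`;
  // Vercel shortens long preview hostnames its own way; decline rather than guess
  if (label.length > MAX_DNS_LABEL) return null;
  return `${label}.vercel.app`;
}
```

In the origin-request handler you'd call it where the branch is read and skip the origin override whenever it returns null, so a bad value falls through to the default origin.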

A Working Lambda@Edge Auth Pattern (Viewer-Request)

The Pattern That Actually Works: JWT Verification Before Your Origin Ever Sees the Request

Most tutorials show Lambda@Edge auth at the origin-request stage. The catch: origin-request only runs on cache misses, so once a response is cached, CloudFront serves it to later viewers without running your check at all, and every miss still pays the function's latency before the request reaches Vercel. Move JWT verification to viewer-request and it runs on every request, rejecting bad tokens at the edge before CloudFront even checks its cache or routes to Vercel. The trade-off is that viewer-request functions have a 1MB deployment package limit, a 128MB memory cap, and a 5-second timeout, so every dependency counts. Use the jose package (zero runtime dependencies) instead of jsonwebtoken, which pulls in jws, ms, semver, and a string of lodash.* modules.

// viewer-request/index.mjs
// Cold start: fetch JWKS once, cache on module scope
// This runs OUTSIDE the handler — executes once per container lifecycle

import { GetParameterCommand, SSMClient } from "@aws-sdk/client-ssm";
import { createRemoteJWKSet, jwtVerify } from "jose";

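// Pin the Region where the parameter lives: at the edge, the SDK's default Region
// is whichever Region is executing the function, not necessarily us-east-1
const ssm = new SSMClient({ region: "us-east-1" });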

let JWKS; // module-scoped cache — survives warm invocations

async function getJWKS() {
  if (JWKS) return JWKS; // skip SSM call on warm containers

  const cmd = new GetParameterCommand({
    Name: "/myapp/prod/jwks-uri",
    WithDecryption: false, // a plain String parameter: the JWKS URI itself isn't secret
  });

  const { Parameter } = await ssm.send(cmd);
  const jwksUri = new URL(Parameter.Value);

  // createRemoteJWKSet fetches and caches the key set internally
  // it also handles key rotation automatically via kid matching
  JWKS = createRemoteJWKSet(jwksUri);
  return JWKS;
}

export const handler = async (event) => {
  const request = event.Records[0].cf.request;
  const authHeader = request.headers["authorization"]?.[0]?.value ?? "";

  // Skip auth for public health-check paths — adjust to your needs
  if (request.uri === "/api/health") return request;

  if (!authHeader.startsWith("Bearer ")) {
    return {
      status: "401",
      statusDescription: "Unauthorized",
      headers: {
        "www-authenticate": [{ key: "WWW-Authenticate", value: "Bearer" }],
        "content-type": [{ key: "Content-Type", value: "application/json" }],
      },
      body: JSON.stringify({ error: "missing_token" }),
    };
  }

  const token = authHeader.slice(7);

  try {
    const jwks = await getJWKS();

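    // Lambda@Edge has no environment variables, so process.env.JWT_* would be undefined,
    // and jwtVerify silently skips the aud/iss checks when those options are undefined.
    // Bake the values in at build time (or load them from SSM alongside the JWKS URI).
    await jwtVerify(token, jwks, {
      audience: "https://api.myapp.com",       // replace with your API identifier
      issuer: "https://myapp.us.auth0.com/",   // replace with your issuer URL
    });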

    // Token is valid — forward the request to Vercel unchanged
    // DO NOT strip the Authorization header here; your origin might need it
    return request;

  } catch (err) {
    // JWTExpired, JWSSignatureVerificationFailed, etc. all land here
    return {
      status: "401",
      statusDescription: "Unauthorized",
      headers: {
        "content-type": [{ key: "Content-Type", value: "application/json" }],
      },
      body: JSON.stringify({ error: "invalid_token", detail: err.code }),
    };
  }
};

The Stale Key Problem Nobody Mentions Until They Get Paged

Storing the JWKS URI in SSM and caching JWKS at module scope is the right call for latency: warm invocations skip the extra 80–120ms SSM round trip, which would add up fast across hundreds of edge locations. But the risk is real. If your auth provider rotates signing keys (manually or on a schedule, depending on the provider), containers holding stale JWKS objects can reject valid tokens or keep trusting retired keys. The jose library's createRemoteJWKSet mitigates this: it re-fetches the key set when it sees an unknown kid, which covers rotation, and recent versions also expire the cached set after the cacheMaxAge option (10 minutes by default), which covers key removal. Older versions may only do the former, so check yours. The module-level cache is a separate concern, since it pins the JWKS URI read from SSM. My recommendation: add a TTL check on it and force a re-fetch every 6 hours. Lambda@Edge containers don't live forever, but they can persist surprisingly long under steady traffic.

let JWKS;
let jwksFetchedAt = 0;
const JWKS_TTL_MS = 6 * 60 * 60 * 1000; // 6 hours

async function getJWKS() {
  const now = Date.now();
  if (JWKS && now - jwksFetchedAt < JWKS_TTL_MS) return JWKS;

  // re-fetch from SSM and reinitialize
  const cmd = new GetParameterCommand({ Name: "/myapp/prod/jwks-uri" });
  const { Parameter } = await ssm.send(cmd);
  JWKS = createRemoteJWKSet(new URL(Parameter.Value));
  jwksFetchedAt = now;
  return JWKS;
}
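The TTL branch is the part most worth testing, and it's awkward to exercise against real SSM. A sketch under assumptions: createTtlCache is a hypothetical factory, and the loader and clock are injected so a test can control both. A Lambda container handles one request at a time, so it doesn't bother deduplicating concurrent loads.

```javascript
// Hypothetical generalization of getJWKS above: cache whatever `loader` returns
// for `ttlMs`, using an injectable clock so expiry can be tested deterministically.
function createTtlCache(loader, ttlMs, now = Date.now) {
  let value;
  let loaded = false;
  let fetchedAt = 0;

  return async function get() {
    if (loaded && now() - fetchedAt < ttlMs) return value; // warm and fresh
    // Cold or expired: reload. If loader throws, the old value and timestamp stay
    // untouched, so the next call simply retries.
    value = await loader();
    fetchedAt = now();
    loaded = true;
    return value;
  };
}
```

In the handler module you'd pass the SSM read plus the createRemoteJWKSet call as the loader and JWKS_TTL_MS as the TTL; the test then swaps in a counter and a fake clock.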

The IAM Permission That Bites You at Deploy Time

Lambda@Edge functions assume an IAM role during execution — and that role needs to be assumable by both lambda.amazonaws.com and edgelambda.amazonaws.com. Missing the second one gives you a cryptic "The function execution role must be assumable by the edgelambda.amazonaws.com service principal" error when you associate the trigger. The SSM permission itself is straightforward:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:GetParameter",
      "Resource": "arn:aws:ssm:us-east-1:123456789012:parameter/myapp/prod/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:123456789012:log-group:/aws/lambda/*"
    }
  ]
}

Note the wildcarded Region in the logs statement (IAM policies don't allow comments, so the explanation lives here): Lambda@Edge writes logs to the Region where the request was served, not us-east-1, so pinning the Region leaves you with missing logs.

The trust policy for the role needs both principals. Skip lambda.amazonaws.com and Lambda rejects the role when you create the function; skip edgelambda.amazonaws.com and CloudFront rejects the trigger association with the error quoted above:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": [
        "lambda.amazonaws.com",
        "edgelambda.amazonaws.com"
      ]
    },
    "Action": "sts:AssumeRole"
  }]
}
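Since a missing principal only surfaces at function creation or trigger association, a cheap guard is to lint the trust policy in CI before deploying. A sketch, assuming the policy is available as parsed JSON; missingEdgePrincipals is a hypothetical helper that only looks at Allow statements and ignores Condition blocks and wildcard principals.

```javascript
// Hypothetical CI check: which principals Lambda@Edge needs are missing from this trust policy?
const REQUIRED_PRINCIPALS = ["lambda.amazonaws.com", "edgelambda.amazonaws.com"];

function missingEdgePrincipals(trustPolicy) {
  const allowed = new Set();
  // IAM allows Statement, Action, and Principal.Service to be a single value or an array
  for (const stmt of [].concat(trustPolicy.Statement || [])) {
    if (stmt.Effect !== "Allow") continue;
    if (![].concat(stmt.Action || []).includes("sts:AssumeRole")) continue;
    for (const svc of [].concat((stmt.Principal && stmt.Principal.Service) || [])) {
      allowed.add(svc);
    }
  }
  return REQUIRED_PRINCIPALS.filter((p) => !allowed.has(p));
}
```

A non-empty result fails the build with the exact principal to add, which beats reading the association error after a deploy.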

Test the Event Shape Locally Before You Touch a Deployment

The replication wait after updating a Lambda@Edge function can run several minutes per deploy, and there's no shortcut. The fastest feedback loop is mocking CloudFront viewer-request events locally, using the same shape CloudFront actually sends. The @aws-sdk/client-cloudfront package doesn't include event mocks, but AWS publishes the exact event structure in their docs. I keep the event in __fixtures__/viewer-request.json (inlined below for brevity) and run the handler directly from a small .mjs script:

// test/local-invoke.mjs
import { handler } from "../viewer-request/index.mjs";

const mockEvent = {
  Records: [{
    cf: {
      request: {
        method: "GET",
        uri: "/api/protected-resource",
        headers: {
          // CloudFront sends headers as arrays of { key, value } objects
          "authorization": [{
            key: "Authorization",
            value: "Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6InRlc3QifQ..." // a real test JWT
          }],
          "host": [{ key: "Host", value: "api.myapp.com" }]
        },
        querystring: "",
        body: null
      }
    }
  }]
};

const result = await handler(mockEvent);
console.log("Status:", result.status ?? "forwarded");
console.log("Response:", JSON.stringify(result, null, 2));

Run it with node test/local-invoke.mjs and you'll catch signature verification mismatches, wrong audience strings, and header shape bugs before you ever wait on CloudFront replication. The local run still calls SSM, so your shell needs AWS credentials with ssm:GetParameter. One gotcha: don't reach for environment variables for JWT_AUDIENCE and JWT_ISSUER. Lambda@Edge doesn't support them, and CloudFront refuses to associate a function version that has any configured. You have to bake values in at build time, read them from SSM at cold start, or bundle a constants file, which is why I put both the JWKS URI and the audience/issuer strings into SSM parameters.
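A real test JWT pasted into the fixture works until it expires. A minimal sketch, assuming Node 16 or newer, that mints a short-lived RS256 token and the matching JWKS with nothing but node:crypto; mintTestJwt and its claim defaults are hypothetical. To use it end to end you'd verify against the returned key set locally (jose's createLocalJWKSet accepts a JWKS object) instead of the remote one, which means adding a small test-only seam to the handler yourself.

```javascript
// Hypothetical fixture helper: mint an RS256 JWT plus the matching JWKS using only node:crypto.
const { generateKeyPairSync, sign, randomUUID } = require("node:crypto");

const b64urlJson = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");

function mintTestJwt({ audience, issuer, subject = "test-user", ttlSeconds = 300 }) {
  const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
  const kid = randomUUID();
  const now = Math.floor(Date.now() / 1000);

  const header = { alg: "RS256", typ: "JWT", kid };
  const payload = { aud: audience, iss: issuer, sub: subject, iat: now, exp: now + ttlSeconds };
  const signingInput = `${b64urlJson(header)}.${b64urlJson(payload)}`;

  // RS256 is RSASSA-PKCS1-v1_5 with SHA-256, Node's default padding for RSA keys
  const signature = sign("sha256", Buffer.from(signingInput), privateKey).toString("base64url");

  // Public half as a JWK, tagged with the same kid the token header carries
  const jwk = { ...publicKey.export({ format: "jwk" }), kid, alg: "RS256", use: "sig" };
  return { token: `${signingInput}.${signature}`, jwks: { keys: [jwk] } };
}
```

Minting with the same audience and issuer the handler checks lets you exercise both the happy path and deliberate failures (wrong aud, ttlSeconds set negative for an expired token) without touching your auth provider.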

When This Architecture Is the Wrong Call

The setup I've been describing is genuinely useful, but I've watched teams adopt it when a far simpler option would have done the job. Before you sink time into this, check whether you actually need it.

You just want a WAF

If the goal is blocking bad traffic or applying rate limiting, you don't need Lambda@Edge at all. AWS WAF attaches directly to a CloudFront distribution — no function deployment, no cold start budget, no multi-region headache. You create a Web ACL, attach it to your distribution, and you're done. Deploy time goes from "wait for the function to replicate across 400+ edge nodes" to about 60 seconds. I've seen engineers reach for Lambda@Edge here purely because the WAF docs are buried under 15 layers of AWS console navigation. Don't let discoverability drive architecture decisions.

CloudFront Functions handle 80% of header work for 1/6th the cost

Lambda@Edge costs $0.60 per million requests, plus duration charges. CloudFront Functions cost $0.10 per million, with no separate duration charge. If you're doing header rewrites, adding security headers, or simple URL redirects, CloudFront Functions run in sub-millisecond time with zero cold starts and deploy in under 30 seconds. The tradeoff is real though: you get 10KB of code max, no network access, no file system, and only viewer-request/viewer-response triggers. I use this mental model: if the logic fits in a switch statement and doesn't need to call anything external, it's a CloudFront Function. If it needs to fetch a secret, talk to DynamoDB, or validate JWTs against a remote JWKS, it's Lambda@Edge.

// CloudFront Function (viewer-response trigger): security headers fit fine here
function handler(event) {
  var response = event.response;
  var headers = response.headers;

  // Enforce HTTPS and prevent clickjacking — no Lambda needed for this
  headers['strict-transport-security'] = { value: 'max-age=63072000; includeSubdomains; preload' };
  headers['x-frame-options'] = { value: 'DENY' };
  headers['x-content-type-options'] = { value: 'nosniff' };

  return response;
}
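The "simple URL redirects" case fits the same runtime. A sketch of a viewer-request CloudFront Function with a hypothetical redirect map; note the event shape differs from Lambda@Edge: statusCode is a number and headers are { value } objects, not arrays of { key, value }.

```javascript
// CloudFront Function (viewer-request trigger): permanent redirects for legacy paths.
// `var` keeps it compatible with the older cloudfront-js-1.0 runtime.
var REDIRECTS = {
  '/old-pricing': '/pricing', // hypothetical mappings
  '/docs/v1': '/docs'
};

function handler(event) {
  var request = event.request;
  var target = REDIRECTS[request.uri];

  if (target) {
    // Returning a response object short-circuits the request at the edge
    return {
      statusCode: 301,
      statusDescription: 'Moved Permanently',
      headers: { location: { value: target } }
    };
  }
  // No match: pass the request through to the origin unchanged
  return request;
}
```

A lookup table like this stays well under the 10KB limit for a few hundred entries; once the map needs to live in a database, you're back in Lambda@Edge territory.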

Verify your Vercel plan before committing

Several of the workarounds in this setup — custom trusted IPs, bypassing DDoS protection for specific CIDR ranges, advanced routing rules — require Vercel Pro ($20/month per member) or Enterprise. The free hobby tier will silently break things in ways that are genuinely confusing to debug: you'll see 403s from Vercel's edge that look exactly like misconfigured Lambda@Edge headers, and you'll spend hours in the wrong place. Before you build this pipeline, log into your Vercel dashboard and confirm the features under "Security" and "Advanced" are actually available to you. The Vercel plans page lists what's gated, but it changes; check it fresh rather than relying on a 6-month-old blog post (including this one).

Multi-region AWS debugging has a real operational cost

Lambda@Edge functions execute in the AWS Region closest to the edge location serving the user, which is not necessarily us-east-1 or wherever you deployed from. CloudWatch logs for those executions land in that Region, in a log group named /aws/lambda/us-east-1.<function-name>. If a user in Tokyo hits an error, the logs are likely in ap-northeast-1, not your home region. I've watched teams spend 45 minutes on a bug that took 3 minutes to fix once they found the right log group. If your team isn't already comfortable switching regions in the AWS console mid-incident and correlating logs across region boundaries, factor that learning curve into your estimate. This isn't a knock on the architecture; it's a real operational cost that doesn't show up in any "getting started" guide.

FAQ

Why is CloudFront passing the wrong Host header to my Vercel origin?

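This is the most common issue I see, and it trips people up because Vercel routes strictly on the Host header. If your origin request policy forwards the viewer's Host header (the managed AllViewer policy does), Vercel receives your distribution's domain (d1abc123.cloudfront.net or your alternate domain name) and responds with a 404 or a redirect loop, because that hostname isn't assigned to your project. When Host isn't forwarded, CloudFront sets it to the origin's domain name, so the simplest fix is the managed AllViewerExceptHostHeader origin request policy. A CloudFront origin custom header won't help here, since Host is one of the headers CloudFront refuses to add that way. If you need to control the value explicitly, handle it in Lambda@Edge: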

// origin-request Lambda@Edge
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;

  // Vercel rejects requests where Host doesn't match your deployment
  request.headers['host'] = [{
    key: 'Host',
    value: 'your-project.vercel.app' // or your custom domain assigned in Vercel
  }];

  return request;
};

If you're using a custom domain in Vercel (the right move for production), set that as the Host value, not the .vercel.app address. The .vercel.app hostname works, but Vercel rate-limits it aggressively in ways they don't document publicly.

Lambda@Edge keeps getting "The Lambda function result failed validation" — what does that mean?

This error surfaces when your Lambda returns a response object that violates CloudFront's strict schema requirements. The things that silently break it: header values must be arrays of objects ([{ key: 'X-Header', value: 'foo' }]), not plain strings. Header keys must be lowercase. The status field must be a string, not a number. I've burned time on all three of these. CloudFront doesn't tell you which field is wrong — you get the generic validation error and have to bisect your response object manually.

// WRONG: fails validation (viewers get a 502 error)
return {
  status: 200,                          // must be "200"
  headers: {
    'Content-Type': 'text/html'         // must be array of objects
  }
};

// CORRECT
return {
  status: '200',
  headers: {
    'content-type': [{ key: 'Content-Type', value: 'text/html' }]
  },
  body: '<h1>ok</h1>'
};
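Since CloudFront won't say which field failed, a quick local lint before deploying saves the bisecting. A sketch that checks only the three mistakes listed above, not CloudFront's full schema; lintEdgeResponse is a hypothetical helper.

```javascript
// Hypothetical pre-deploy lint for Lambda@Edge generated responses.
function lintEdgeResponse(res) {
  const problems = [];
  if (typeof res.status !== "string") {
    problems.push(`status must be a string, got ${typeof res.status}`);
  }
  for (const [name, value] of Object.entries(res.headers || {})) {
    if (name !== name.toLowerCase()) {
      problems.push(`header key "${name}" must be lowercase`);
    }
    // Each header must be an array of { key, value } objects with string fields
    const isArrayOfPairs =
      Array.isArray(value) &&
      value.every((h) => h && typeof h.key === "string" && typeof h.value === "string");
    if (!isArrayOfPairs) {
      problems.push(`header "${name}" must be an array of { key, value } objects`);
    }
  }
  return problems;
}
```

Run it against every response object your handler can return (the 401s, redirects, and error bodies) in the same local-invoke script, and fail loudly on a non-empty list.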

Why is my Lambda@Edge function deploying fine but changes aren't taking effect?

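Lambda@Edge has a propagation delay that's separate from CloudFront's cache invalidation. When you update a Lambda@Edge function, publish a new version, and point the distribution's trigger at it, CloudFront takes 5–15 minutes to roll that version out to all edge nodes. That middle step is the other common culprit: triggers reference a specific numbered version ARN ($LATEST and aliases aren't allowed), so publishing a version does nothing until you update the association. There's no progress indicator beyond the distribution status flipping back to Deployed; you just wait. What makes this worse: if you're testing in a browser, you might also be hitting a cached CloudFront response that masks whether the Lambda change landed at all. Use curl with a cache-busting query string (this only works if your cache policy includes query strings in the cache key) and check the X-Cache response header to confirm whether you're hitting the origin or a cached edge response.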

curl -I "https://your-distribution.cloudfront.net/test?bust=$(date +%s)" \
  -H "Cache-Control: no-cache"
# Look for: X-Cache: Miss from cloudfront (means origin was hit)
# vs:       X-Cache: Hit from cloudfront (means you're seeing cached response)

Vercel is returning 308 redirects that create an infinite loop through CloudFront — how do I stop it?

Vercel automatically redirects non-www to www (or vice versa) and HTTP to HTTPS based on your project's domain config. CloudFront doesn't follow origin redirects: it passes those 308s straight to the client (and may cache them), and if the Location points back at a hostname served by the same distribution, you get a loop. The fix is two-pronged: redirect HTTP to HTTPS at the CloudFront layer (the viewer protocol policy) and set the origin protocol policy to HTTPS only, so Vercel never sees plain HTTP; then remove any conflicting redirect rules in your Vercel project settings under Domains. Also check that you're not forwarding the X-Forwarded-Proto header inconsistently, since Vercel uses it to decide whether to issue an HTTPS redirect.

Why does my Lambda@Edge function work in us-east-1 but fail at the edge?

Lambda@Edge functions are replicated from us-east-1 to Regions around the world, and the execution context there is more constrained. The limits that catch people off guard: 128MB memory and a 1MB package for viewer-facing events, a 40KB cap on the response a viewer-request function generates, and (the one that actually burned me) no environment variables. Lambda@Edge doesn't support them at all: CloudFront won't associate a function version that has any configured, so config you'd normally read from process.env has to be baked into the function code itself or fetched from SSM/Secrets Manager at cold start, which adds latency.

# Lambda@Edge hard limits (as of 2024; verify in the AWS quotas docs for your event type)
# Viewer request/response: 128MB memory, 5s timeout, 1MB package, 40KB generated response (viewer request)
# Origin request/response: up to 10,240MB memory, 30s timeout, 50MB package, 1MB generated response (origin request)
# No environment variables. No Lambda layers. No VPC access.

CloudFront is caching Vercel's error pages and now all my users see a stale 500 — how do I prevent this?

CloudFront can cache error responses from Vercel, including 4xx and 5xx, if you haven't explicitly told it not to. In the distribution's Error Pages settings, create a custom error response for the relevant codes and set its Error Caching Minimum TTL to something short (I use 5 seconds; the default is 10). More importantly, configure your origin to return Cache-Control: no-store on error responses; Vercel doesn't do this by default for serverless function errors. You can also intercept error responses in a Lambda@Edge origin-response handler and strip or rewrite cache headers before CloudFront stores them.

// origin-response Lambda@Edge — prevent caching of error responses
exports.handler = async (event) => {
  const response = event.Records[0].cf.response;
  const status = parseInt(response.status, 10);

  if (status >= 400) {
    response.headers['cache-control'] = [{
      key: 'Cache-Control',
      value: 'no-store, max-age=0'
    }];
  }

  return response;
};

Disclaimer: This article is for informational purposes only. The views and opinions expressed are those of the author(s) and do not necessarily reflect the official policy or position of Sonic Rocket or its affiliates. Always consult with a certified professional before making any financial or technical decisions based on this content.
