DEV Community

James Lee

Traffic Routing in AWS Lambda: Canary Deployments, Weighted Aliases & Blue/Green

Deploying a new version of a Lambda function sounds simple — upload code, done. But in production, you never want 100% of traffic hitting an untested version simultaneously.

How does Lambda route traffic between versions? How do you do a canary release that shifts 5% of traffic to a new version and automatically rolls back on errors? How does async traffic flow differently from sync traffic?

This article covers Lambda's traffic routing model from the inside out.


Lambda Versions and Aliases: The Foundation

Before traffic routing makes sense, you need to understand Lambda's versioning model.

Versions

Every time you publish a Lambda function, AWS creates an immutable version — a snapshot of your code and configuration at that point in time.

$LATEST  →  always points to the latest unpublished code (mutable)
:1       →  first published version (immutable)
:2       →  second published version (immutable)
:3       →  third published version (immutable)
# Publish a new version via boto3
import boto3

lambda_client = boto3.client('lambda')

response = lambda_client.publish_version(
    FunctionName='brand-api',
    Description='v2.1.0 — faster logo lookup with DynamoDB cache'
)

version_arn = response['FunctionArn']
version_number = response['Version']
print(f'Published version {version_number}: {version_arn}')
# → Published version 42: arn:aws:lambda:us-east-1:123:function:brand-api:42

Versions are immutable — you cannot change the code of :42 after it's published. This is the foundation of safe deployments.
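
Because published versions are immutable and numbered in increasing order, deployment tooling often needs to know the newest version and the one before it. The following is a minimal sketch, assuming the input has the shape of the `Versions` list returned by boto3's `list_versions_by_function`; the helper names and the sample data are illustrative, not part of the original article.

```python
def published_versions(versions):
    """Return published version numbers as sorted ints, skipping $LATEST."""
    numbers = [int(v['Version']) for v in versions if v['Version'] != '$LATEST']
    return sorted(numbers)


def latest_and_previous(versions):
    """Return (latest, previous) published versions; previous is None if only one exists."""
    numbers = published_versions(versions)
    if not numbers:
        raise ValueError('function has no published versions')
    previous = numbers[-2] if len(numbers) > 1 else None
    return numbers[-1], previous


# Illustrative data shaped like the 'Versions' list from list_versions_by_function
sample = [{'Version': '$LATEST'}, {'Version': '41'}, {'Version': '42'}, {'Version': '9'}]
print(latest_and_previous(sample))
```

Sorting numerically matters here: version numbers arrive as strings, and a plain string sort would place '9' after '42'.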

Aliases

An alias is a named pointer to a specific version. Your API Gateway, EventBridge rules, and other triggers should always point to an alias — never to $LATEST or a version number directly.

brand-api:prod   →  points to :42  (production traffic)
brand-api:staging →  points to :43  (staging traffic)
brand-api:canary  →  points to :42 (95%) + :43 (5%)  ← weighted routing
# Create or update an alias
lambda_client.create_alias(
    FunctionName='brand-api',
    Name='prod',
    FunctionVersion='42',
    Description='Production alias'
)

# Update alias to point to new version
lambda_client.update_alias(
    FunctionName='brand-api',
    Name='prod',
    FunctionVersion='43'
)
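
One way to enforce the "always point to an alias" rule is to check trigger configuration before deploying. The sketch below classifies a Lambda function ARN by its qualifier; the function name `classify_lambda_target` is hypothetical, and it assumes the standard `arn:aws:lambda:region:account:function:name[:qualifier]` layout.

```python
def classify_lambda_target(arn: str) -> str:
    """Classify a Lambda function ARN as 'alias', 'version', 'latest', or 'unqualified'."""
    parts = arn.split(':')
    if len(parts) < 7 or parts[2] != 'lambda' or parts[5] != 'function':
        raise ValueError(f'not a Lambda function ARN: {arn}')
    if len(parts) == 7:
        return 'unqualified'  # an unqualified ARN invokes $LATEST
    qualifier = parts[7]
    if qualifier == '$LATEST':
        return 'latest'
    if qualifier.isdigit():
        return 'version'
    return 'alias'


for target in (
    'arn:aws:lambda:us-east-1:123:function:brand-api:prod',
    'arn:aws:lambda:us-east-1:123:function:brand-api:42',
    'arn:aws:lambda:us-east-1:123:function:brand-api',
):
    print(target, '->', classify_lambda_target(target))
```

A deployment script could refuse to proceed when any trigger target is not classified as 'alias'.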

Traffic Splitting: Canary Deployments with Weighted Aliases

The most powerful traffic routing feature in Lambda is weighted aliases — you can split traffic between two versions with any percentage split.

brand-api:prod
├── version :42  →  95% of traffic
└── version :43  →  5% of traffic  ← canary

This is Lambda's equivalent of what Knative achieves with Istio VirtualService traffic splitting — but built natively into the Lambda service.
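
To build intuition for what a 95/5 split means per request, here is a small simulation. It is only a model: Lambda's own selection logic is internal to the service, and `pick_version` is a hypothetical helper. It assumes each request is routed independently at random, with the additional version chosen at its weight and the primary version receiving the remainder.

```python
import random
from collections import Counter


def pick_version(primary, additional_weights, rng):
    """Pick a version: each additional version at its weight, the primary with the remainder."""
    total = sum(additional_weights.values())
    if not 0.0 <= total <= 1.0:
        raise ValueError('additional weights must sum to a value between 0 and 1')
    roll = rng.random()
    cumulative = 0.0
    for version, weight in additional_weights.items():
        cumulative += weight
        if roll < cumulative:
            return version
    return primary


rng = random.Random(7)  # fixed seed so the run is repeatable
counts = Counter(pick_version('42', {'43': 0.05}, rng) for _ in range(100_000))
canary_share = counts['43'] / sum(counts.values())
print(f'canary share: {canary_share:.3f}')
```

Because routing is per request, a low-traffic function may see no canary invocations at all for a while, which is worth keeping in mind when choosing how long to observe a canary.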

Implementing a Canary Deployment

# deploy_canary.py
import boto3
import time

lambda_client = boto3.client('lambda')
cloudwatch = boto3.client('cloudwatch')

def deploy_canary(function_name: str, new_version: str, canary_percent: int = 5):
    """
    Deploy a new Lambda version as a canary.
    Routes canary_percent% of traffic to new version.
    """
    # Get current prod alias
    alias = lambda_client.get_alias(
        FunctionName=function_name,
        Name='prod'
    )
    current_version = alias['FunctionVersion']

    print(f'Current prod version: {current_version}')
    print(f'Deploying canary: version {new_version} at {canary_percent}%')

    # Update alias with weighted routing
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=current_version,       # stable version gets majority
        RoutingConfig={
            'AdditionalVersionWeights': {
                new_version: canary_percent / 100  # e.g., 0.05 = 5%
            }
        }
    )

    print(f'Canary deployed: {100 - canary_percent}% → v{current_version}, '
          f'{canary_percent}% → v{new_version}')


def promote_canary(function_name: str, new_version: str):
    """Promote canary to 100% — full deployment"""
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=new_version,
        RoutingConfig={
            'AdditionalVersionWeights': {}  # clear weighted routing
        }
    )
    print(f'Canary promoted: 100% traffic now on version {new_version}')


def rollback_canary(function_name: str, stable_version: str):
    """Roll back — remove canary, restore 100% to stable version"""
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=stable_version,
        RoutingConfig={
            'AdditionalVersionWeights': {}  # clear canary
        }
    )
    print(f'Rolled back: 100% traffic restored to version {stable_version}')


# Usage
deploy_canary('brand-api', new_version='43', canary_percent=5)

Automated Canary with CloudWatch Alarms (CodeDeploy)

Manually managing canary percentages is error-prone. AWS CodeDeploy integrates with Lambda to automate the shift — and automatically roll back if CloudWatch alarms fire.

# serverless.yml: automated canary deployment
# deploymentSettings is provided by the serverless-plugin-canary-deployments plugin
plugins:
  - serverless-plugin-canary-deployments

provider:
  name: aws
  deploymentMethod: direct

functions:
  brandApi:
    handler: handler.handler
    deploymentSettings:
      type: Canary10Percent5Minutes   # shift 10% now, 100% after 5 minutes
      alias: prod
      alarms:
        - BrandApiErrorRateAlarm      # rollback if this alarm fires
        - BrandApiLatencyAlarm
# CloudFormation — define the rollback alarms
resources:
  Resources:
    BrandApiErrorRateAlarm:
      Type: AWS::CloudWatch::Alarm
      Properties:
        AlarmName: brand-api-error-rate-canary
        MetricName: Errors
        Namespace: AWS/Lambda
        Dimensions:
          - Name: FunctionName
            Value: brand-api
          - Name: Resource
            Value: brand-api:prod   # monitor the alias, not a specific version
        Statistic: Sum
        Period: 60
        EvaluationPeriods: 2
        Threshold: 5                 # alarm if >5 errors in each of 2 consecutive 1-minute periods
        ComparisonOperator: GreaterThanThreshold

    BrandApiLatencyAlarm:
      Type: AWS::CloudWatch::Alarm
      Properties:
        AlarmName: brand-api-p99-latency-canary
        MetricName: Duration
        Namespace: AWS/Lambda
        Dimensions:
          - Name: FunctionName
            Value: brand-api
          - Name: Resource
            Value: brand-api:prod
        ExtendedStatistic: p99
        Period: 60
        EvaluationPeriods: 2
        Threshold: 2000              # rollback if P99 > 2000ms
        ComparisonOperator: GreaterThanThreshold
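
If you run the manual canary script instead of CodeDeploy, the same alarms can drive the promote-or-rollback decision. This is a sketch under assumptions: `firing_alarms` and `decide` are hypothetical helpers, `cloudwatch` is expected to behave like `boto3.client('cloudwatch')`, and `FakeCloudWatch` is a stand-in so the example runs offline.

```python
def firing_alarms(cloudwatch, alarm_names):
    """Return the names of alarms currently in the ALARM state."""
    response = cloudwatch.describe_alarms(AlarmNames=list(alarm_names))
    return [
        alarm['AlarmName']
        for alarm in response.get('MetricAlarms', [])
        if alarm['StateValue'] == 'ALARM'
    ]


def decide(cloudwatch, alarm_names):
    """Return 'rollback' if any alarm is firing, otherwise 'continue'."""
    return 'rollback' if firing_alarms(cloudwatch, alarm_names) else 'continue'


class FakeCloudWatch:
    """Stand-in for boto3.client('cloudwatch') so the sketch runs without AWS access."""

    def __init__(self, states):
        self.states = states

    def describe_alarms(self, AlarmNames):
        return {'MetricAlarms': [
            {'AlarmName': name, 'StateValue': self.states[name]} for name in AlarmNames
        ]}


names = ['brand-api-error-rate-canary', 'brand-api-p99-latency-canary']
healthy = FakeCloudWatch({names[0]: 'OK', names[1]: 'OK'})
degraded = FakeCloudWatch({names[0]: 'ALARM', names[1]: 'OK'})
print(decide(healthy, names), decide(degraded, names))
```

In a real script, `decide` would be called on a timer between `deploy_canary` and `promote_canary`, with `rollback_canary` invoked as soon as it returns 'rollback'.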

CodeDeploy deployment types for Lambda:

Type                           Behavior
AllAtOnce                      100% of traffic shifts immediately (no canary)
Canary10Percent5Minutes        10% for 5 min, then 100%
Canary10Percent10Minutes       10% for 10 min, then 100%
Canary10Percent15Minutes       10% for 15 min, then 100%
Linear10PercentEvery1Minute    +10% every minute until 100%
Linear10PercentEvery2Minutes   +10% every 2 minutes until 100%
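
The difference between the canary and linear types is easier to see as a timeline. The sketch below generates (minute, percent on new version) pairs for each style; the function names are hypothetical, and it assumes the first increment is applied at the start of the deployment.

```python
def canary_schedule(percent, minutes):
    """Canary: shift `percent` at minute 0, then everything after `minutes`."""
    return [(0, percent), (minutes, 100)]


def linear_schedule(step_percent, interval_minutes):
    """Linear: add `step_percent` every `interval_minutes` until 100% is reached."""
    schedule = []
    shifted, minute = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


print('Canary10Percent5Minutes:    ', canary_schedule(10, 5))
print('Linear10PercentEvery2Minutes:', linear_schedule(10, 2))
```

Under this model a linear rollout with 10% steps takes ten steps to complete, so it exposes problems more gradually than a canary but takes longer to finish.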

How Traffic Flows: Sync vs Async

Traffic routing in Lambda isn't just about version weights — the entire flow differs between synchronous and asynchronous invocations.

Synchronous Traffic Flow (API Gateway)

Client Request
     │
     ▼
API Gateway
     │  (points to alias: brand-api:prod)
     ▼
Lambda Service (weighted routing)
     ├── 95% → Execution Environment running v42
     └──  5% → Execution Environment running v43
     │
     ▼
Response returned to API Gateway → Client

Key characteristics:

  • Direct path: client waits for the response
  • No buffering: if Lambda throttles the invocation (429 TooManyRequestsException), the request fails right away and the client gets an error response instead of being queued
  • Version routing: Lambda's weighted alias determines which version handles each request
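# handler.py: log which version is handling each request
import json


def get_brand(brand_id):
    # Placeholder for the article's business logic (not shown in the original)
    return {'brandId': brand_id}


def handler(event, context):
    # context.function_version is the version that actually ran,
    # even when the request came in through a weighted alias
    print(f'Handled by version: {context.function_version}')

    brand_id = event['pathParameters']['brandId']
    # API Gateway proxy integrations expect statusCode plus a string body
    return {'statusCode': 200, 'body': json.dumps(get_brand(brand_id))}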
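
Since the synchronous path has no buffer, retrying a throttled request is the caller's job. Here is a minimal sketch of exponential backoff with jitter on the client side. `Throttled` and `call_with_backoff` are hypothetical names; in practice the exception would be whatever your HTTP client or SDK raises for a throttled call.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical marker for a throttled call (for example an HTTP 429)."""


def call_with_backoff(call, max_attempts=4, base_delay=0.2, sleep=time.sleep, rng=random.random):
    """Retry `call` on Throttled, waiting a random delay that grows with each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            sleep(rng() * base_delay * (2 ** attempt))


attempts = {'n': 0}


def flaky():
    """Simulated endpoint that is throttled twice and then succeeds."""
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise Throttled()
    return 'ok'


# sleep is stubbed out here so the demo finishes instantly
print(call_with_backoff(flaky, sleep=lambda seconds: None))
```

The jitter spreads retries from many clients over time, so a burst that caused throttling does not come back as a synchronized second burst.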

Asynchronous Traffic Flow (S3 / EventBridge)

Async traffic introduces a buffer layer between the event source and Lambda execution. This is the key architectural difference.

Event Source (S3 upload / EventBridge rule)
     │
     ▼
Lambda Internal Queue  ← traffic is buffered here
     │
     ▼  (Lambda polls the queue)
Lambda Service (weighted routing)
     ├── 95% → Execution Environment running v42
     └──  5% → Execution Environment running v43
     │
     ▼
Result → CloudWatch Logs
      → Success destination (SNS/SQS/EventBridge/Lambda)
      → Failure destination (DLQ) on repeated failures

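The buffer is critical: it decouples the event producer from Lambda's availability. If Lambda is throttled or scaling out, events wait in the queue and are processed when capacity is available. They are not kept forever, though: an event that outlives the maximum event age (6 hours by default) or exhausts its retries is discarded unless a failure destination or DLQ is configured.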
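
The lifecycle of a buffered event can be summarized as a small decision function. This is a simplified model, not Lambda's actual implementation: `next_action` is a hypothetical helper, and the defaults assume 2 retries and a 6-hour maximum event age.

```python
def next_action(attempts_made, age_seconds, max_retries=2, max_age_seconds=21_600):
    """Decide what happens to an async event after a failed attempt.

    attempts_made counts invocations so far (the first try plus any retries).
    """
    if age_seconds >= max_age_seconds:
        return 'failure_destination_or_discard'  # too old to keep in the queue
    if attempts_made > max_retries:
        return 'failure_destination_or_discard'  # first try plus all retries failed
    return 'retry'


print(next_action(attempts_made=1, age_seconds=30))      # first failure, still fresh
print(next_action(attempts_made=3, age_seconds=300))     # retries exhausted
print(next_action(attempts_made=1, age_seconds=22_000))  # older than the maximum age
```

Whether the final outcome is a failure destination or a silent discard depends entirely on configuration, which is why a failure destination or DLQ is worth setting up for any async function that matters.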

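# handler.py: async handler with destination routing
from urllib.parse import unquote_plus


def process_brand_asset(bucket, key):
    # Placeholder for the real processing step (not shown in the original)
    return {'bucket': bucket, 'key': key}


def handler(event, context):
    """
    Async handler: processes S3 upload events.
    On success: result routed to the success-destination SQS queue.
    On failure: after 2 retries, routed to the failure destination (DLQ).
    """
    processed = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record['s3']['object']['key'])

        try:
            result = process_brand_asset(bucket, key)
            print(f'Successfully processed: {key}')
            processed.append({'processed': key, 'result': result})
        except Exception as e:
            print(f'Failed to process {key}: {e}')
            raise  # re-raise to trigger Lambda retry + eventual DLQ routing

    # Return once every record is handled, not after the first one
    return processed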
# serverless.yml — configure async destinations
functions:
  processBrandAsset:
    handler: handler.handler
    destinations:
      onSuccess: arn:aws:sqs:us-east-1:123:brand-asset-success
      onFailure: arn:aws:sqs:us-east-1:123:brand-asset-dlq
    maximumRetryAttempts: 2
    events:
      - s3:
          bucket: brand-assets
          event: s3:ObjectCreated:*
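
The same retry and destination settings can be applied with boto3 through `put_function_event_invoke_config`. The sketch below only builds and validates the keyword arguments so it can run offline; `event_invoke_config` is a hypothetical helper, and the 1-hour event age is an arbitrary example value.

```python
def event_invoke_config(function_name, qualifier, on_success_arn, on_failure_arn,
                        max_retries=2, max_age_seconds=3600):
    """Build keyword arguments for lambda_client.put_function_event_invoke_config."""
    if not 0 <= max_retries <= 2:
        raise ValueError('MaximumRetryAttempts must be 0, 1, or 2')
    if not 60 <= max_age_seconds <= 21_600:
        raise ValueError('MaximumEventAgeInSeconds must be between 60 and 21600')
    return {
        'FunctionName': function_name,
        'Qualifier': qualifier,  # alias or version the settings apply to
        'MaximumRetryAttempts': max_retries,
        'MaximumEventAgeInSeconds': max_age_seconds,
        'DestinationConfig': {
            'OnSuccess': {'Destination': on_success_arn},
            'OnFailure': {'Destination': on_failure_arn},
        },
    }


kwargs = event_invoke_config(
    'processBrandAsset', 'prod',
    'arn:aws:sqs:us-east-1:123:brand-asset-success',
    'arn:aws:sqs:us-east-1:123:brand-asset-dlq',
)
print(kwargs['DestinationConfig'])
# With AWS credentials available, this would be applied as:
# boto3.client('lambda').put_function_event_invoke_config(**kwargs)
```

Scoping the configuration to the `prod` qualifier keeps the async settings attached to the alias, so they carry over when the alias moves to a new version.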

Concurrency Control at the Traffic Layer

In Knative's model, the queue-proxy sidecar acts as a per-pod concurrency limiter — it queues excess requests locally before forwarding to the user container, and reports metrics to the autoscaler.

AWS Lambda implements an equivalent mechanism natively, without requiring a sidecar:

Per-Function Concurrency Limiting

# Set maximum concurrency — Lambda queues excess async requests
lambda_client.put_function_concurrency(
    FunctionName='brand-logo-processor',
    ReservedConcurrentExecutions=50  # max 50 simultaneous executions
)

For synchronous invocations: requests beyond the concurrency limit are immediately throttled (429).

For asynchronous invocations: requests beyond the concurrency limit are queued in Lambda's internal event queue (up to 6 hours) and retried as capacity becomes available.
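
The synchronous case can be pictured as a counter of in-flight executions. The toy model below is not how Lambda is implemented; `ConcurrencyGate` is a hypothetical class that only illustrates the accept-or-throttle behavior at the limit.

```python
class ConcurrencyGate:
    """Toy model of a reserved-concurrency limit for synchronous invocations."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0

    def try_start(self):
        """Return 200 if an execution slot is free, otherwise 429 (throttled)."""
        if self.in_flight >= self.limit:
            return 429
        self.in_flight += 1
        return 200

    def finish(self):
        """Release a slot when an execution completes."""
        self.in_flight = max(0, self.in_flight - 1)


gate = ConcurrencyGate(limit=2)
statuses = [gate.try_start() for _ in range(3)]  # third request exceeds the limit
gate.finish()                                    # one execution completes
statuses.append(gate.try_start())                # a slot is free again
print(statuses)  # → [200, 200, 429, 200]
```

For asynchronous invocations, the rejected request would go back to the internal queue instead of being returned to the caller as an error.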

Per-Alias Concurrency (Provisioned Concurrency on Aliases)

You can apply Provisioned Concurrency specifically to an alias, ensuring the production alias always has warm environments while the canary alias uses on-demand scaling:

# Apply provisioned concurrency to prod alias only
lambda_client.put_provisioned_concurrency_config(
    FunctionName='brand-api',
    Qualifier='prod',          # the alias name
    ProvisionedConcurrentExecutions=20
)

# Canary alias uses on-demand (may cold start, but that's acceptable for 5% traffic)
# No provisioned concurrency set on 'canary' alias
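
Provisioned Concurrency is allocated asynchronously, so a deployment script may want to wait until the alias reports it as ready before shifting traffic. A minimal polling sketch follows, assuming `lambda_client` behaves like `boto3.client('lambda')`; `wait_until_ready` and `FakeLambda` are hypothetical names, and the fake client exists only so the example runs offline.

```python
import time


def wait_until_ready(lambda_client, function_name, qualifier,
                     timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll the provisioned concurrency config until its Status is READY."""
    waited = 0
    while True:
        config = lambda_client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier)
        status = config['Status']
        if status == 'READY':
            return config
        if status == 'FAILED':
            raise RuntimeError(config.get('StatusReason', 'provisioning failed'))
        if waited >= timeout_s:
            raise TimeoutError(f'{function_name}:{qualifier} not ready after {timeout_s}s')
        sleep(poll_s)
        waited += poll_s


class FakeLambda:
    """Stand-in client that reports IN_PROGRESS twice, then READY."""

    def __init__(self):
        self.statuses = ['IN_PROGRESS', 'IN_PROGRESS', 'READY']

    def get_provisioned_concurrency_config(self, FunctionName, Qualifier):
        return {'Status': self.statuses.pop(0)}


# sleep is stubbed out so the demo does not actually wait
result = wait_until_ready(FakeLambda(), 'brand-api', 'prod', sleep=lambda seconds: None)
print(result['Status'])
```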

Blue/Green Deployment Pattern

For changes that are too risky for gradual canary (e.g., breaking schema changes), use a full blue/green deployment:

Blue environment:  brand-api:prod  → version :42  (100% traffic)
Green environment: brand-api:green → version :43  (0% traffic, fully tested)

After validation:
Blue environment:  brand-api:prod  → version :43  (100% traffic, instant cutover)
Green environment: brand-api:green → version :42  (kept for instant rollback)
# blue_green_deploy.py
import boto3

lambda_client = boto3.client('lambda')

def blue_green_cutover(function_name: str, new_version: str):
    """
    Instant traffic cutover from current prod version to new version.
    Previous version kept on 'previous' alias for instant rollback.
    """
    # Get current prod version (this becomes 'blue' / previous)
    current = lambda_client.get_alias(
        FunctionName=function_name,
        Name='prod'
    )
    current_version = current['FunctionVersion']

    # Preserve current version on 'previous' alias for rollback
    try:
        lambda_client.update_alias(
            FunctionName=function_name,
            Name='previous',
            FunctionVersion=current_version
        )
    except lambda_client.exceptions.ResourceNotFoundException:
        lambda_client.create_alias(
            FunctionName=function_name,
            Name='previous',
            FunctionVersion=current_version
        )

    # Cut over prod to new version (instant, no gradual shift)
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=new_version,
        RoutingConfig={'AdditionalVersionWeights': {}}
    )

    print(f'Cutover complete: prod now on v{new_version}')
    print(f'Rollback available: run instant_rollback() to restore v{current_version}')


def instant_rollback(function_name: str):
    """Roll back to previous version instantly"""
    previous = lambda_client.get_alias(
        FunctionName=function_name,
        Name='previous'
    )
    previous_version = previous['FunctionVersion']

    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=previous_version,
        RoutingConfig={'AdditionalVersionWeights': {}}
    )

    print(f'Rolled back: prod restored to v{previous_version}')
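
The "after validation" step can be as simple as invoking the new version directly through its own alias before the cutover. The sketch below assumes `lambda_client` behaves like `boto3.client('lambda')`; `smoke_test`, `FakeLambda`, and the test payload are hypothetical, with the fake client standing in so the example runs offline.

```python
import io
import json


def smoke_test(lambda_client, function_name, qualifier, payload):
    """Invoke one alias or version synchronously and return its decoded response."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        Qualifier=qualifier,  # e.g. the 'green' alias, which receives no production traffic
        InvocationType='RequestResponse',
        Payload=json.dumps(payload).encode('utf-8'),
    )
    body = response['Payload'].read()
    if response.get('FunctionError'):
        raise RuntimeError(f'{function_name}:{qualifier} failed: {body!r}')
    return json.loads(body)


class FakeLambda:
    """Stand-in client that echoes back which qualifier was invoked."""

    def invoke(self, FunctionName, Qualifier, InvocationType, Payload):
        reply = json.dumps({'statusCode': 200, 'qualifier': Qualifier}).encode('utf-8')
        return {'StatusCode': 200, 'Payload': io.BytesIO(reply)}


result = smoke_test(FakeLambda(), 'brand-api', 'green', {'pathParameters': {'brandId': 'b-1'}})
print(result)
```

Running this against the green alias and only then calling `blue_green_cutover` keeps untested code away from production traffic.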

Deployment Strategy Decision Guide

How risky is this deployment?
│
├── Low risk (config change, minor bug fix)
│   └── AllAtOnce — deploy directly to 100%
│
├── Medium risk (new feature, refactor)
│   └── Canary — start at 5–10%, monitor errors/latency,
│       auto-promote or rollback via CodeDeploy alarms
│
├── High risk (breaking change, new external dependency)
│   └── Blue/Green — full parallel environment,
│       instant cutover after validation, instant rollback
│
└── Schema/data migration (irreversible changes)
    └── Feature flags in code + gradual rollout
        (decouple deployment from feature activation)
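
The guide above can also be encoded as a lookup so that a pipeline picks the strategy from a declared risk level. This is only an illustration: the risk labels and the `choose_strategy` helper are assumptions, not part of any AWS API.

```python
# Risk labels are hypothetical; the strategies mirror the decision guide
STRATEGIES = {
    'low': 'AllAtOnce',
    'medium': 'Canary',
    'high': 'Blue/Green',
    'irreversible': 'Feature flags + gradual rollout',
}


def choose_strategy(risk):
    """Map a declared risk level to the deployment strategy from the guide."""
    try:
        return STRATEGIES[risk]
    except KeyError:
        raise ValueError(f'unknown risk level: {risk!r}') from None


print(choose_strategy('medium'))
```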

Summary

Concept                   AWS Lambda Implementation
Traffic splitting         Weighted aliases (e.g., 95% v42 / 5% v43)
Canary deployment         CodeDeploy + Lambda aliases + CloudWatch alarms
Blue/Green                Two aliases pointing to different versions, instant cutover
Async traffic buffering   Lambda internal event queue (up to 6 hours)
Concurrency control       Reserved concurrency + Provisioned Concurrency per alias
Automatic rollback        CodeDeploy monitors alarms, rolls back if threshold breached

The key insight: Lambda's alias + versioning system is its traffic routing layer. Every production Lambda function should be invoked via an alias — never via $LATEST. This single practice unlocks canary deployments, blue/green releases, and instant rollbacks.


Next in this series: **Part 5 — Event-Driven Automation: Building a Serverless Maintenance Bot with Lambda & EventBridge**
