DEV Community

Matia Rašetina
The Complete Guide to AWS Lambda Aliases, Versions, and Canary Deployments (With CDK Examples)

Deploying new Lambda code to your AWS environment shouldn’t be stressful, but for some teams, it is. If you have ever pushed a quick fix to production and gone straight to the CloudWatch logs to see whether the Lambda is failing, you know the feeling. A Lambda code update takes effect instantly, so if there is a bug in the new code, every user feels it straight away.

That’s where Lambda Versions, Aliases and Canary Deployments come in. Used correctly, they let you roll out code changes gradually, observe the impact and automatically roll back if something goes wrong.

No downtime. No fire drills. No late-night debugging sessions.

In this guide, you'll learn:

  • What Lambda versions and aliases actually do
  • How traffic routing works
  • How CodeDeploy turns aliases into safe, automated canary releases
  • How to implement everything with AWS CDK
  • The pitfalls and best practices you should know

This blog post is a continuation of a previous post about CRUD Lambdas with PynamoDB. Here, we are going to enable versioning, an alias and a canary deployment for CreatePropertyLambda; the full code for all CRUD Lambdas is available at this link.

Why Safe Lambda Deployments Matter

Most Lambda deployments follow the most basic flow:

  1. a developer deploys the code via CDK, SAM or any other tool
  2. AWS takes a new snapshot of the incoming code
  3. AWS switches all traffic to the newly deployed Lambda

It’s fast, but it is risky for the stability of the application, because any error that went unnoticed during testing reaches all users at once.

With aliases and canary routing, you shift from “everything at once” to:

  • Send 10% of traffic to the new version
  • Watch for errors
  • Promote or roll back automatically

This gives you:

  • Zero downtime deployments
  • Automatic recovery when something breaks
  • Gradual exposure to real production traffic
  • Safety without slowing delivery

Let’s take a deeper dive into the basics and understand them before continuing to the CDK code.

Lambda Versions, Aliases and Routing — The Essentials

Before we begin, let’s go over a few need-to-know basics.

Versions

A version is an immutable snapshot of your function’s code and configuration. Once published, it keeps its version number forever and cannot be modified; to change anything, you publish a new version.

A Lambda version captures several parts that are vital to the Lambda runtime:

  • source code
  • layers and dependencies
  • environment variables
  • runtime settings, like Lambda timeout

This immutability is why versions are the backbone of safe deployments.

Aliases

An alias is a named pointer, like prod, beta, or v1, that points to a single Lambda version, or to two versions when weighted routing is configured.

It is very important to understand that API Gateway, Step Functions, EventBridge rules or any other AWS service should call your alias, not a version directly. Aliases:

  • decouple deployments from references
  • allow traffic shifting and rollback
  • remove the need to update API Gateway integrations on every deployment
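To make "call the alias, not the version" concrete: an alias is addressed by appending its name to the function ARN as a qualifier. Here is a minimal sketch, assuming a placeholder account ID, region and function name; `qualified_arn` is a hypothetical helper for illustration, not part of any AWS SDK.

```python
def qualified_arn(function_arn: str, qualifier: str) -> str:
    """Return the ARN that targets a specific alias (or version) of a function."""
    if not qualifier:
        raise ValueError("qualifier must be a non-empty alias name or version number")
    return f"{function_arn}:{qualifier}"


# Placeholder values for illustration only.
FUNCTION_ARN = "arn:aws:lambda:eu-central-1:123456789012:function:CreatePropertyLambda"

prod_arn = qualified_arn(FUNCTION_ARN, "prod")  # what integrations should reference
pinned_arn = qualified_arn(FUNCTION_ARN, "2")   # pins callers to version 2
```

An integration that references `prod_arn` keeps working unchanged when the alias is moved to a newer version, while one that references `pinned_arn` has to be updated on every release.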

Weighted Routing + Why use Canary Deployments?

AWS lets you attach a routing configuration to an alias — for example, we deploy a new Lambda version and we can route the traffic to both old and new versions:

  • 90% of traffic → version 1
  • 10% of traffic → version 2

This simple mechanism is the foundation of canary deployments.
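As a rough illustration of the 90/10 split above, the sketch below builds the arguments for Lambda's UpdateAlias call, where the alias keeps pointing at the stable version and the extra weight goes to the new one. The helper name `build_canary_routing` and the function name are assumptions made for this example; only the dictionary keys come from the API.

```python
def build_canary_routing(function_name: str, alias_name: str,
                         stable_version: str, canary_version: str,
                         canary_weight: float) -> dict:
    """Build keyword arguments for the Lambda UpdateAlias API call.

    The alias keeps pointing at the stable version; the canary version
    receives `canary_weight` of the traffic (0.10 means 10%).
    """
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0.0 and 1.0")
    return {
        "FunctionName": function_name,
        "Name": alias_name,
        "FunctionVersion": stable_version,
        "RoutingConfig": {
            "AdditionalVersionWeights": {canary_version: canary_weight}
        },
    }


kwargs = build_canary_routing("CreatePropertyLambda", "prod", "1", "2", 0.10)
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("lambda").update_alias(**kwargs)
```

In the rest of this guide you never make this call by hand, because CodeDeploy adjusts the alias weights for you.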

We use canary deployments to route a fraction of the production traffic to the newly deployed resource. If for any reason the new version of the resource fails, only a small slice of users is impacted and the system will recognize the errors and re-route the experimental users back to the fully working code.

There are other benefits to using canary deployments:

  • Zero downtime
  • Safe experimentation with new logic or dependencies
  • Easy rollback by flipping the alias back

For more information on how canary deployments work, please check out my blog post about them by clicking the link here.

How Lambda Aliases Enable Canary Releases

The full canary release process is as follows:

  • Lambda alias prod is pointing to version 1
  • new Lambda code gets deployed → AWS automatically creates a new version, let’s call it version 2
  • prod alias gets updated to route the traffic:
    • 90% of the traffic → version 1
    • 10% of the traffic → version 2
  • CodeDeploy and CloudWatch work together to monitor the traffic reaching version 2
    • if anything breaks and errors spike, the CloudWatch alarm fires, and CodeDeploy stops the deployment and reroutes all traffic to the older version (in this case, version 1)
    • if everything looks good, the remaining traffic shifts from version 1 to version 2, and version 2 serves all prod alias traffic

With this approach, users experience no downtime and you can be assured that in case of failure, your system is still online!
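The promote-or-roll-back decision described above can be sketched as a small simulation. This is a simplified model for intuition only, not how CodeDeploy is implemented; `run_canary` and its parameters are hypothetical names.

```python
def run_canary(stable: str, canary: str, alarm_states: list[bool],
               canary_weight: float = 0.10) -> list[dict[str, float]]:
    """Simulate one canary rollout and return the alias weights over time.

    `alarm_states` holds one boolean per monitoring check during the canary
    window; True means the alarm was in the ALARM state at that check.
    """
    # Canary phase: most traffic stays on the stable version.
    history = [{stable: 1.0 - canary_weight, canary: canary_weight}]
    if any(alarm_states):
        # Roll back: all traffic returns to the stable version.
        history.append({stable: 1.0, canary: 0.0})
    else:
        # Window passed without an alarm: promote the canary.
        history.append({stable: 0.0, canary: 1.0})
    return history


healthy = run_canary("1", "2", alarm_states=[False, False, False])
broken = run_canary("1", "2", alarm_states=[False, True])
```

In the healthy run the last entry sends all traffic to version 2; in the broken run it sends all traffic back to version 1.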

Overview of the API Gateway and AWS Lambdas with an Alias

Setting Up Canary Deployments with AWS CDK

Step 1: Enable Lambda versioning

You want Lambda to publish a new version only when the code changes.

A common trick: embed a hash into the function description.

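# Imports used by the snippets in this guide
import hashlib
import os

from aws_cdk import Duration, aws_cloudwatch as cloudwatch, aws_codedeploy as codedeploy, aws_lambda as _lambda

# Method on the stack class, used to get the hash of the Lambda code,
# so we can confirm whether the code changed at all.
# No need to start a canary deployment if the code didn't change
def _get_code_hash(self, directory: str) -> str:
    """Generate a hash of all files in the Lambda code directory."""
    hash_md5 = hashlib.md5()
    for root, dirs, files in sorted(os.walk(directory)):
        for file in sorted(files):
            if file.endswith(('.py', '.json', '.txt', '.yaml', '.yml')):
                file_path = os.path.join(root, file)
                with open(file_path, 'rb') as f:
                    hash_md5.update(f.read())
    return hash_md5.hexdigest()[:8]

# The rest runs inside the stack's __init__; layer, property_table and
# property_bucket are assumed to be defined earlier in the stack.
# Lambda functions with code hash in description for versioning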
create_property_code_dir = "./CreatePropertyHandler"
create_code_hash = self._get_code_hash(create_property_code_dir)

create_property_fn = _lambda.Function(
    self, "CreatePropertyLambda",
    runtime=_lambda.Runtime.PYTHON_3_12,
    handler="handler.handler",
    code=_lambda.Code.from_asset(create_property_code_dir),
    layers=[layer],
    description=f"v-{create_code_hash}",  # Hash in description triggers new version only when code changes
    environment={
        "PROPERTY_TABLE_NAME": self.property_table.table_name,
        "PROPERTY_BUCKET": self.property_bucket.bucket_name,
        "ALLOWED_ORIGIN": "*"
    },
    timeout=Duration.seconds(30)
)

create_version = create_property_fn.current_version
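To see how the hash behaves, here is a standalone sketch that runs the same hashing logic against a temporary directory. The function is renamed `code_hash` and pulled out of the stack class purely for this demonstration, and the file names are made up.

```python
import hashlib
import os
import tempfile

TRACKED_SUFFIXES = ('.py', '.json', '.txt', '.yaml', '.yml')


def code_hash(directory: str) -> str:
    """Standalone copy of the hashing logic: digest of tracked file contents."""
    digest = hashlib.md5()
    for root, _dirs, files in sorted(os.walk(directory)):
        for name in sorted(files):
            if name.endswith(TRACKED_SUFFIXES):
                with open(os.path.join(root, name), 'rb') as f:
                    digest.update(f.read())
    return digest.hexdigest()[:8]


with tempfile.TemporaryDirectory() as code_dir:
    handler_path = os.path.join(code_dir, "handler.py")
    with open(handler_path, "w") as f:
        f.write("def handler(event, context):\n    return 'v1'\n")
    first = code_hash(code_dir)

    # A file with an untracked extension does not affect the hash.
    with open(os.path.join(code_dir, "README.md"), "w") as f:
        f.write("notes")
    after_readme = code_hash(code_dir)

    # Changing tracked source code produces a different hash.
    with open(handler_path, "w") as f:
        f.write("def handler(event, context):\n    return 'v2'\n")
    after_edit = code_hash(code_dir)
```

Here `first` and `after_readme` are equal, while `after_edit` differs, so only a change to a tracked file alters the function description. Keep in mind that the digest is built from file contents only; file names influence nothing but the order in which the contents are read.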

Step 2 — Create a Lambda Alias

create_alias = _lambda.Alias(
    self, "CreatePropertyProdAlias",
    alias_name="prod",
    version=create_version,
)

This alias is the entry point for invoking whichever version of the CreateProperty Lambda is currently deployed.

Step 3 — Creating a CloudWatch Alarm

This alarm will trigger a rollback if the function reports errors in two consecutive evaluation periods:

create_error_alarm = cloudwatch.Alarm(
    self, "CreatePropertyErrorAlarm",
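    metric=create_property_fn.metric_errors(
        # Lambda publishes its metrics at one-minute granularity, so a
        # shorter alarm period would mostly see missing data.
        period=Duration.minutes(1),
        statistic="Sum"
    ),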
    threshold=1,
    evaluation_periods=2,
    datapoints_to_alarm=2,
    alarm_name="create-property-canary-errors"
)
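The interplay of threshold, evaluation_periods and datapoints_to_alarm is easier to see in a small model. The sketch below is a simplified approximation that ignores how CloudWatch treats missing data; `alarm_fires` is a hypothetical helper, and the default values mirror the alarm above.

```python
def alarm_fires(error_sums: list[int], threshold: int = 1,
                evaluation_periods: int = 2, datapoints_to_alarm: int = 2) -> bool:
    """Return True if the most recent periods would put the alarm in ALARM.

    `error_sums` holds the Sum of the Errors metric per period, oldest first.
    A datapoint is breaching when it is >= threshold, which matches the
    default comparison operator of the CDK Alarm construct.
    """
    window = error_sums[-evaluation_periods:]
    breaching = sum(1 for value in window if value >= threshold)
    return breaching >= datapoints_to_alarm


single_blip = alarm_fires([0, 0, 1])   # one bad period out of the last two
sustained = alarm_fires([0, 2, 1])     # both of the last two periods are bad
recovered = alarm_fires([3, 0])        # errors stopped in the latest period
```

With these settings a single period with errors is not enough to roll back; the errors have to persist for two periods in a row.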

A very useful feature in CloudWatch is that you can monitor not just Lambda errors, but also:

  • latency
  • throttles
  • custom metrics of your own (you can read more about custom metrics by clicking on the link here)

Step 4 — Adding a Canary Deployment process with CodeDeploy

codedeploy.LambdaDeploymentGroup(
    self,
    "CreatePropertyCanaryDeployment",
    alias=create_alias,
    deployment_config=codedeploy.LambdaDeploymentConfig.CANARY_10_PERCENT_5_MINUTES,
    alarms=[create_error_alarm],
    auto_rollback=codedeploy.AutoRollbackConfig(
        failed_deployment=True,
        stopped_deployment=True,
    ),
)


The configuration I’ve used is:

  • route 10% of traffic to the new version for 5 minutes
    • if the alarm stays quiet, shift the remaining 90% to the new version all at once
    • if the alarm fires, roll back to the previous version

When you start your deployment via cdk deploy, open CodeDeploy in the AWS Console to follow the progress of the deployment in the following interface:

Example of the Canary Deployment of a Lambda

I’ve written about the deployment configuration in my other blog post which focuses more on Canary Deployments. If you are interested in more details, please check out my other post by clicking on the link here.

Best Practices and Lessons Learned

A couple of lessons and best practices I’ve learned:

  • use aliases for everything in production
    • never point to a specific Lambda version, only aliases
  • canary deployments don’t work for all triggers
    • SQS, Kinesis and DynamoDB streams are not supported — canaries can only apply where AWS can split traffic, like API Gateway and Application Load Balancer
  • versioning increases storage
    • old Lambda code versions can accumulate, so consider lifecycle cleanup of unused Lambda versions
  • never forget about cold starts
    • it’s very useful to have a latency alarm during deployment, however, keep in mind that Lambda’s cold starts + regular execution time can trigger the alarm too!
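For the point about versions accumulating, one possible cleanup policy is to keep every version an alias still references plus the few newest ones. The sketch below only selects candidates; `versions_to_delete` and the `keep_latest` default are assumptions for illustration, not an AWS feature.

```python
def versions_to_delete(published: list[str], aliased: set[str],
                       keep_latest: int = 3) -> list[str]:
    """Pick published version numbers that look safe to delete.

    Keeps every version referenced by an alias plus the `keep_latest`
    newest versions. `$LATEST` is never a deletion candidate.
    """
    numbered = sorted((v for v in published if v != "$LATEST"), key=int)
    newest = numbered[-keep_latest:] if keep_latest > 0 else []
    protected = set(aliased) | set(newest)
    return [v for v in numbered if v not in protected]


candidates = versions_to_delete(
    published=["$LATEST", "1", "2", "3", "4", "5", "10"],
    aliased={"2"},
    keep_latest=2,
)
```

With boto3, the inputs could come from `list_versions_by_function` and `list_aliases`, and each candidate could be removed with `delete_function(FunctionName=..., Qualifier=version)`. Remember to include versions referenced through an alias's routing configuration in `aliased`, otherwise a version that is mid-canary could be selected.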

Conclusion

Your Lambda deployments don’t have to be risky. By combining everything we’ve gone over in this blog post (versions, aliases and canary deployments), you get predictable, reversible and fully automated rollouts with zero downtime!

In a production-grade system, implementation of this pattern should be a no-brainer.
