Canary deployments provide a powerful risk mitigation strategy for software releases by gradually rolling out changes to a small subset of users before wider distribution. Like the historical practice of using canaries in coal mines to detect danger, this deployment pattern helps teams identify potential issues early while limiting the blast radius of problematic releases. By routing a controlled percentage of traffic to the new version, teams can monitor key metrics, gather user feedback, and confidently roll back changes if problems arise - all while maintaining system stability for the majority of users.
Out-of-the-box Solutions
Cloud providers offer many built-in deployment strategies, but they often fall short when managing dependencies between application layers. For example, when deploying frontend changes that depend on new backend endpoints, you need to ensure users receive consistent versions across every layer. Built-in cloud canary tools are powerful, but they typically can't pin a user to a single version across multiple layers, so a user on the new frontend may still hit the old backend. Gradual rollouts that keep every layer synchronized are especially valuable for teams with limited DevOps resources.
The approach outlined in this guide will likely need adaptation for your specific architecture and stack, but the principles can be applied across different projects and cloud providers. I'll provide an overview of the solution here, with deep dives into each component in future posts.
Basic overview:
Set up your deployment control-center
Canary deployments require running two versions of your application components simultaneously. Your CI/CD pipeline needs to know which environment should receive new changes and which should remain stable. For a smooth experience, you need a single source of truth defining which environment receives changes (the "active" environment) while the other (the "fallback" environment) remains untouched by your pipelines. We've implemented our single source of truth as an SSM parameter.
Prepare your DB
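Before getting into the database, here is a rough sketch of how a pipeline step might read and flip the control-center value. The parameter name `/deploy/active-environment` and the labels `env-a` and `env-b` are assumptions for illustration, not our real names. The `ssm` argument is expected to be a boto3 SSM client (for example `boto3.client("ssm")`), passed in so the logic can be exercised without AWS access.

```python
# Hypothetical parameter name and environment labels; adjust to your setup.
ACTIVE_ENV_PARAM = "/deploy/active-environment"
ENVIRONMENTS = ("env-a", "env-b")


def get_active_env(ssm):
    """Read which environment the pipelines should deploy to."""
    response = ssm.get_parameter(Name=ACTIVE_ENV_PARAM)
    value = response["Parameter"]["Value"]
    if value not in ENVIRONMENTS:
        raise ValueError(f"unexpected environment {value!r}")
    return value


def fallback_env(active):
    """The fallback is simply the environment that is not active."""
    first, second = ENVIRONMENTS
    return second if active == first else first


def swap_active_env(ssm):
    """Flip the active environment (assumed to happen when a new release starts)."""
    new_active = fallback_env(get_active_env(ssm))
    ssm.put_parameter(
        Name=ACTIVE_ENV_PARAM,
        Value=new_active,
        Type="String",
        Overwrite=True,  # the parameter already exists, so overwrite it
    )
    return new_active
```

Every pipeline would call `get_active_env` first and deploy only to that environment, which keeps the fallback untouched.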
While teams with substantial resources might spin up separate databases for each version, this approach can be expensive and introduces synchronization risks that canary deployments aim to avoid. Instead, we've adopted an expand and contract pattern, allowing multiple application versions to work simultaneously with a single database. We run the expand step before initiating our canary release and execute the contract step after completing traffic migration.
Configure your backend
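One more note on the database before the backend: the expand and contract flow can be illustrated with a small, self-contained example. This is a sketch using SQLite from the Python standard library, with a hypothetical `users` table where the old version reads `name` and the new version reads `full_name`. Your real migrations will depend on your database and migration tooling.

```python
import sqlite3


def expand(conn):
    """Additive step, run before the canary: old and new versions both keep working."""
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
    conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")
    conn.commit()


def contract(conn):
    """Destructive step, run only after all traffic is on the new version."""
    # Catch rows the old version wrote during the canary window.
    conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")
    # DROP COLUMN needs SQLite 3.35 or newer.
    conn.execute("ALTER TABLE users DROP COLUMN name")
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('Ada')")
    expand(conn)
    # During the canary, the old version still writes only `name`.
    conn.execute("INSERT INTO users (name) VALUES ('Grace')")
    conn.commit()
    return conn
```

Between `expand` and `contract`, both columns exist, so either application version can run against the same database.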
Running a successful canary deployment requires maintaining two independent versions of your backend. We accomplish this using Lambda versions and aliases. Each Lambda function has two aliases, one per environment. When code is merged, our pipelines deploy a new Lambda version and update the active environment's alias while the fallback alias remains on the previous version. Synchronization simply requires updating aliases to point to the newer version.
Split your API
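A quick aside on the backend step before the API: the version-and-alias update might look roughly like this. It is a minimal sketch, assuming the function code has already been uploaded and that each function has an alias named after its environment (`env-a` and `env-b` are hypothetical names). `lambda_client` is expected to be a boto3 Lambda client such as `boto3.client("lambda")`.

```python
def deploy_to_active(lambda_client, function_name, active_env):
    """Publish the current code as a new version and repoint only the active alias."""
    # Freeze whatever is in $LATEST as an immutable, numbered version.
    published = lambda_client.publish_version(FunctionName=function_name)
    version = published["Version"]

    # Move the active environment's alias; the fallback alias is never touched here.
    lambda_client.update_alias(
        FunctionName=function_name,
        Name=active_env,
        FunctionVersion=version,
    )
    return version
```

Because the fallback alias still points at the previous version, rolling back is a matter of routing traffic back to the fallback environment rather than redeploying code.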
Your API needs to route requests to the correct backend version. In AWS, we implement this through API Gateway by deploying separate stages for each environment. Each stage gets its own base path mapping and connects to its corresponding Lambda aliases. While we handle this through custom pipeline logic, using stage variables is equally effective.
Make your front end drive the version
The key to unified versioning is having users declare their stack version from the start. During frontend builds, we inject environment-specific API URLs. When users load your application, the frontend connects to its designated API stage, which invokes the appropriate Lambda functions, creating a consistent version chain throughout the stack.
Route traffic based on canary rules
With separate environments established, you need to implement traffic splitting according to your canary rules. This requires using cookies to maintain version consistency across user sessions - you don't want users getting different versions on page reloads. We use Lambda@Edge functions in our CloudFront distributions to route traffic between S3 buckets hosting different versions.
Deployment tools to make it easy
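Before the tooling, here is a sketch of the routing step as an origin-request Lambda@Edge handler. The cookie name, bucket domains, and canary percentage are all hypothetical. The sketch assumes CloudFront is configured to forward the cookie to the origin-request trigger, and it leaves out setting the cookie for first-time visitors, which would need a response trigger or frontend code to make the assignment stick.

```python
import random

COOKIE_NAME = "app-version"  # hypothetical cookie name
BUCKET_DOMAINS = {           # hypothetical S3 bucket domains, one per environment
    "env-a": "my-app-env-a.s3.amazonaws.com",
    "env-b": "my-app-env-b.s3.amazonaws.com",
}
STABLE_ENV = "env-a"
CANARY_ENV = "env-b"
CANARY_PERCENT = 10          # share of new visitors sent to the canary


def pick_env(cookie_header, canary_percent=CANARY_PERCENT, roll=None):
    """Return (env, is_new_assignment) for a raw Cookie header value."""
    for part in cookie_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name == COOKIE_NAME and value in BUCKET_DOMAINS:
            return value, False  # returning visitor: keep their version
    if roll is None:
        roll = random.random() * 100  # uniform in [0, 100)
    env = CANARY_ENV if roll < canary_percent else STABLE_ENV
    return env, True


def handler(event, context):
    """Origin-request trigger: point the request at the chosen bucket."""
    request = event["Records"][0]["cf"]["request"]
    cookie_headers = request["headers"].get("cookie", [])
    cookies = "; ".join(h["value"] for h in cookie_headers)
    env, _is_new = pick_env(cookies)

    domain = BUCKET_DOMAINS[env]
    request["origin"]["s3"]["domainName"] = domain
    # S3 expects the Host header to match the bucket being addressed.
    request["headers"]["host"] = [{"key": "Host", "value": domain}]
    return request
```

Keeping `pick_env` separate from the handler makes the canary rule easy to test locally by passing a fixed `roll`.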
Two tools make this manageable: a mechanism to gradually increase traffic to new versions on a schedule, and a synchronization tool for aligning versions before new releases. The sync tool requires special attention. Consider this scenario: you release v54 with frontend changes to env A, then v55 with backend-only changes to env B. Weeks later, you're ready to release v56 with frontend changes to env A. The challenge is that env A's backend never received v55's updates, and because v56 contains no backend changes, the pipeline won't redeploy the backend on its own. You need a way to ensure the environment incorporates all recent updates. This becomes especially important in microservice architectures where services update at different frequencies. Before new deployments, verify that your target environment is synchronized with the latest changes across all components. For us, this means building our API and frontend, then running a script to update all necessary Lambda aliases.
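The alias part of such a sync script could be sketched as follows. This is one possible approach, not our exact script: it assumes every function has numbered versions behind aliases named `env-a` and `env-b` (hypothetical names), and it brings the target environment up to whatever the other environment is already running. `lambda_client` is expected to be a boto3 Lambda client.

```python
def sync_aliases(lambda_client, function_names, target_env, source_env):
    """Move target_env aliases forward to match source_env where they lag behind.

    Returns {function_name: (old_version, new_version)} for every alias changed.
    """
    updated = {}
    for name in function_names:
        source = lambda_client.get_alias(FunctionName=name, Name=source_env)
        target = lambda_client.get_alias(FunctionName=name, Name=target_env)
        source_version = source["FunctionVersion"]
        target_version = target["FunctionVersion"]

        # Assumes aliases point at numbered versions, never at "$LATEST".
        if int(source_version) > int(target_version):
            lambda_client.update_alias(
                FunctionName=name,
                Name=target_env,
                FunctionVersion=source_version,
            )
            updated[name] = (target_version, source_version)
    return updated
```

In the v54/v55/v56 scenario, running `sync_aliases(client, functions, "env-a", "env-b")` before deploying v56 would move env A's backend aliases up to the versions published for v55.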
This overview should provide a foundation for implementing canary deployments in your own system.
Drop a comment with what changes are needed to make it work for you.
Hit follow for future deep dives into each step.