Kazuya
AWS re:Invent 2025 - Accelerate software delivery with Amazon ECS (CNS315)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

📖 AWS re:Invent 2025 - Accelerate software delivery with Amazon ECS (CNS315)

In this video, Kevin Gibbs and Mike Rizzo explain Amazon ECS advanced deployment strategies including blue-green, canary, and linear deployments. They demonstrate how these strategies enable faster rollbacks by maintaining both old and new task sets simultaneously, unlike rolling deployments. Using Unicorn Watch as a case study, they show implementations across Application Load Balancer, Service Connect, Network Load Balancer, and headless services. Key features covered include lifecycle hooks for custom testing, traffic shifting configurations, alarm-based rollbacks, and bake time settings. They also discuss migration paths from CodeDeploy to ECS deployment controllers and considerations for choosing between deployment strategies based on specific use cases.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction to Amazon ECS Deployments and Advanced Deployment Strategies

Hello. I love software deployments. I love software deployments because it's how I deliver value to you, my customers, and I see that you're here today. You probably also love deploying software to bring value to your customers. My name is Kevin Gibbs and I'm here with my colleague Mike Rizzo, and we're going to talk through Amazon ECS deployments and some of the advanced deployment strategies that we've delivered this year.

Thumbnail 40

Just a rundown of what we're going to go through today. First, we're going to talk through a primer of ECS services and deployments. Mike is going to go through a deep dive on advanced deployment techniques, their configuration, and what exactly we mean when we say advanced deployments. Finally, we're going to go through choosing and migrating between different deployment strategies, or from rolling to advanced deployments. We'll also cover some key takeaways and some ways you can continue your learning.

Thumbnail 70

So first off, when we talk about ECS services, we're really talking about two different configuration parameters or sets of parameters. The first one is your task definition. A task definition defines a unit of work, a single task within a service, and then service configuration and those parameters say how you want to deploy one or more of those tasks to perform some function. Those two things together, that combination, actually come down and result in a service revision, which is what we call it in Amazon ECS, and so you can actually look at how your service changes over time by looking through your revision history.

Thumbnail 130

The process of going from either nothing to your first service revision, or from one service revision to the next, is an ECS deployment, and that process is what we're going to go over today: what it does, how it works, and some of its configuration. So to get into some of the configuration for your deployments, the first thing that we look at is the deployment controller. Today we're going to be focusing on the ECS deployment controller, our native controller that we manage for you. You have two other options. One is CodeDeploy, which also supports blue-green, linear, and canary. The other is external, which really means it's up to you: you own the deployments and can do them however you want, but you are taking on that responsibility. Again, we're going to focus on ECS going forward.

Thumbnail 160

So the next thing that you need to decide is the strategy. Now that you've decided on the ECS controller, we have four strategies. We have our default strategy, rolling, where you're deploying your new revision alongside your old revision and migrating via the creation and deletion of tasks. In contrast to that, you also have three new advanced deployment strategies. These differ in how traffic is migrated, but all of them allow you to have a new set of tasks and an old set of tasks running together, so you can roll back and roll forward much more quickly and seamlessly between versions.
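The traffic progression each strategy implies can be sketched as a small helper. This is purely an illustration of the shifting behavior described above, not an ECS API; the function name and defaults are made up:

```python
def shift_steps(strategy, canary_pct=10.0, linear_pct=10.0):
    """Cumulative percentage of production traffic on the green task set
    after each shift, for the three advanced strategies."""
    if strategy == "BLUE_GREEN":
        return [100.0]                  # one step: everything moves at once
    if strategy == "CANARY":
        return [canary_pct, 100.0]      # initial slice, then the rest
    if strategy == "LINEAR":
        steps, pct = [], 0.0
        while pct + linear_pct < 100.0:
            pct = round(pct + linear_pct, 1)
            steps.append(pct)
        steps.append(100.0)             # final step tops up to 100%
        return steps
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, a linear deployment in 25% increments would shift in four steps, while blue-green is always a single step.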

Thumbnail 200

Some of the differences that you'll notice between standard rolling and advanced deployments is what your task capacity looks like. In a standard rolling deployment, you create new tasks and stop old tasks as you go, so you have a more or less constant capacity. Whereas in advanced deployments, through the lifetime of your deployment, you're actually going to have double the number of tasks: your blue tasks and your green tasks, if you want to use the color analogy. So you have twice the capacity for that duration, and then you go back down to normal once the deployment is over.

Thumbnail 240

The other aspect, the key difference, is how traffic is shifted or transitions between the two. In rolling deployments, you have, again, blue tasks and green tasks running right next to each other, old and new, and so the total traffic that goes to either one is basically dependent on how many tasks are up and healthy. In the advanced deployments, you have additional functionality of how that traffic is shifted. In blue-green, what that's really doing is taking 100% from blue to green in one step. We just do it once. Canary is kind of a two-step process, so you get to choose the initial percentage. In this case, I'm showing 10% is the initial transition, and so you transition 10% of your traffic. Then you can check metrics, make sure that everything's looking good, and then it'll transition 100%. Linear is an n-stage version, and so in this one you have increments, and so you can start with as small as 3%, which would result in about 33 stages to your deployment, but it does traffic shifting over time.

Thumbnail 320

Some of the reasons you would want to do that are if you're concerned about performance of the new version and want to gradually shift traffic and watch how it's performing. Some of the common configuration shared between all of the ECS strategies is lifecycle hooks. This is something new that we added this year, and it's the ability for you to engage in the deployment process. You can define Lambda functions that run at specific points, for example after your new tasks have scaled up, to run checks of your own. You can also implement your own traffic shifting paradigm via lifecycle hooks. And then we also have the idea of a bake time. If you're using alarm-based rollbacks, which we've had for a little while, and you want to let the new version bake while serving traffic for an hour or a day, you have that option with the bake time.
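As a sketch, a lifecycle hook is just a Lambda handler that reports a status back to ECS. The hookStatus response field follows my reading of the ECS lifecycle-hook documentation; verify the exact event and response shapes against the current API reference. The run_checks helper and the event field are placeholders:

```python
def run_checks(event):
    # Placeholder check: in a real hook this might call your smoke tests
    # against the green tasks. Here we just succeed unless the (hypothetical)
    # event flag says otherwise.
    return not event.get("abort", False)

def handler(event, context=None):
    # ECS invokes the hook with details of the in-flight deployment and
    # expects a terminal or in-progress status in the response.
    checks_passed = run_checks(event)
    return {"hookStatus": "SUCCEEDED" if checks_passed else "FAILED"}
```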

Thumbnail 370

Finally, there's some configuration specific to the different strategies. The first is with rolling: how those increments grow and shift over time, which you configure with your maximum capacity and your minimum healthy capacity. Using those two settings, you can have it scale down some tasks before it starts new tasks, or start new tasks before scaling down old ones. For blue-green, there's really nothing specific; you can set up lifecycle hooks and a bake time, but it's a single shift, so there's nothing more to configure. With canary, you can control that initial shift anywhere from 0.1% up to 99.9% to decide how much traffic is on your new version before you move everyone over. Linear is similar: each stage can be as small as 3% and as large as 99.9%. And with those last two, you also have a wait time, like a mini bake time, between each of your shifts.
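To summarize the knobs per strategy, they might be captured like this. These are plain dicts with hypothetical shorthand field names, not the ECS API schema (maximumPercent and minimumHealthyPercent are the only names here taken from the actual rolling-deployment settings):

```python
# Illustrative per-strategy settings, mirroring the ranges described above.
rolling = {"strategy": "ROLLING",
           "maximumPercent": 200,          # may run up to 2x desired count
           "minimumHealthyPercent": 100}   # never drop below desired count

blue_green = {"strategy": "BLUE_GREEN",
              "bakeTimeInMinutes": 60}     # old tasks linger for fast rollback

canary = {"strategy": "CANARY",
          "initialShiftPercent": 10.0,     # allowed range: 0.1 .. 99.9
          "canaryBakeTimeInMinutes": 15,
          "bakeTimeInMinutes": 60}

linear = {"strategy": "LINEAR",
          "stepPercent": 10.0,             # allowed range: 3.0 .. 99.9
          "waitBetweenShiftsMinutes": 5,
          "bakeTimeInMinutes": 60}
```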

Thumbnail 470

Thumbnail 480

Meet Unicorn Watch: A Customer Journey with Advanced Deployments

Now I'm going to hand it off to Mike as he's going to deep dive into the different advanced deployment strategies. Thanks Kevin for that great intro to advanced deployments. My name's Mike, and as I say, I'm really excited about working with customers to help them derive real benefit from new features like advanced deployments. Over the next half hour or so, I'm going to go a bit deeper into how advanced deployments work and also look at some specific examples of how you can use them in a variety of different scenarios. But first, a little detour. I'd just like to share another passion of mine, unicorns. So at AWS we love unicorns. I'll let you know a little secret, some of us venture out at night looking to spot one of these amazing creatures, and here's one that a colleague of mine snapped last week. And the reason I'm mentioning this is because I'd like to take you through a journey that one of my favorite customers recently went through with advanced deployments.

Thumbnail 500

Thumbnail 540

Meet Ada, CTO of Unicorn Watch. Unicorn Watch is the world's leading repository of unicorn sightings data, the go-to place for unicorn research. It's also home to the world's largest archive of unicorn sighting videos. When Ada heard about advanced deployments being launched, she came to us and asked whether they could benefit from advanced deployments to speed up their delivery velocity. So this is the architecture they had before advanced deployments came along. They had a UI fronted with an ALB, and through this UI users were able to access the catalog of videos, to access their accounts, and to make purchases. Making a purchase would trigger the creation of an order fulfillment flow, and the flow could be monitored through another ALB by polling an interface on the orders service. All the interaction between the front end and the back end was through Service Connect.

They also had, at the bottom, a Network Load Balancer fronting a streaming service that would pick up videos from a media asset bucket. The orders service would also create videos watermarked with a unique identifier for security purposes. So this all worked nicely,

Thumbnail 610

and they were using ECS rolling deployments throughout, with one exception. In the case of the UI, they wanted to use blue-green so they could manually check any new deployment in a safe green environment before flipping it over into the production environment. And for that, they used CodeDeploy.

Now you can imagine that a service like this is a premium service with a high price tag, and also those Unicorn photographers want to get their royalties. So it's really important that this platform maintains a high level of availability and reliability. At the same time, there's a growing demand for more new features, and also Ada wanted to expose the catalog service to third-party websites so that they could also resell videos from the catalog. So the challenge they faced was they wanted to maintain high availability and reliability, but at the same time, speed up their delivery velocity. So when advanced deployments came along, they were really keen to understand what that could do for them.

Thumbnail 690

The Lifecycle of an Advanced Deployment: From Blue to Green

So I'll come back to this story in a second, but first I just want to take you through the lifecycle of an advanced deployment in a bit more detail. In the initial state, we have a service running in production. We call it the Blue Environment. All traffic is going there, and when we start a new deployment, we have a new spec for the tasks that need to be run, including a new container image, for example. And we enter the deployment state machine in the first stage, which is known as Pre Scale Up. So this is before we scale up the green environment.

Now you'll see there's a little hook icon on the side there. That denotes that this is a hookable stage, which means you can attach a lifecycle hook to it. With lifecycle hooks, you can run your own custom logic in a Lambda function, and you can use that to perform tests or checks that determine whether you want that stage to succeed and proceed to the next stage, or whether you want to force a rollback. In this example, before you scale up the green environment, you might want to run some kind of admission control check, asking, for example: is this container image coming from a trusted repo?
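That admission-control idea could look roughly like this as a pre-scale-up hook. The event field, registry prefix, and account ID are assumptions for illustration; the real hook event carries the new service revision's details in whatever shape the ECS docs specify:

```python
# Hypothetical list of registries we allow deployments from.
TRUSTED_REGISTRIES = ("123456789012.dkr.ecr.us-east-1.amazonaws.com/",)

def pre_scale_up_hook(event, context=None):
    # Approve the deployment only if the new container image comes from a
    # trusted registry; otherwise fail the hook, which forces a rollback.
    image = event.get("containerImage", "")
    trusted = image.startswith(TRUSTED_REGISTRIES)
    return {"hookStatus": "SUCCEEDED" if trusted else "FAILED"}
```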

Thumbnail 780

Thumbnail 790

So if the hook returns successfully, you then proceed to the next stage, and this is where we do the scale up. So here we scale up the green environment. You can see there on the far side, we have the green environment scaling up. And once you achieve the full scale up, then we enter the Post Scale Up stage. At this point, all production traffic is still going to the blue environment. There's no traffic going to the green environment.

Thumbnail 810

Thumbnail 830

So once we've got the green environment in place, the next step is to start shifting test traffic to it. So we now enter the Test Traffic Shift stage, and here we configure routing so that test traffic, which has to be identified in some way, we'll come back to that later, that test traffic can start to flow to the green environment. Once that routing is set up, we enter the Post Test Traffic Shift stage. Now at this point, the green environment is able to start to receive test traffic.

This is a really convenient place to put in some hooks if you want to do some automated testing on the green environment, or if you want some kind of manual checks and manual approval on something running in the green environment. You can do that through the hook here, and we're going to see some examples of this with Unicorn Watch and see how they use it. So we're now at the position where the blue environment is still receiving all production traffic, and all test traffic is going to green.

Thumbnail 870

Once we exit the Post Test Traffic Shift successfully, so any lifecycle hooks have succeeded, we then start to shift the production traffic to the green environment. Now depending on the strategy you're using, this can happen in a number of different ways. The simplest is with blue-green. This happens all at once, so we shift all traffic in one step from blue to green. But you can also do canary, where you have it in two stages, or you can do it linear, which could be multi-step. We'll come back to that shortly. Once you've shifted the production traffic and you've got 100% going to green, we enter the Post Production Traffic Shift stage.

Thumbnail 910

Now at this point, there's no more production traffic going to blue. It's all going to green, but the blue environment is still running. We still have the old service revision running in the blue environment. This is useful because it means that if something was going wrong in the green environment, we can very quickly roll back and switch back to the blue one.

Thumbnail 930

Thumbnail 950

And we keep this blue environment running for a defined bake time. So as part of your configuration for your blue-green deployment, you can specify a bake time, and for the whole duration of the bake time, your blue environment will continue to exist. Once the bake time is over, the final step is clean up. At this point, we scale the blue down to zero, and we now have a situation where the green is our production environment and becomes the blue environment for the next deployment.
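Putting the stages above in order, the whole state machine can be summarized as follows. Stage names here follow the talk's wording rather than any official API enum:

```python
# Ordered stages of an advanced deployment, as described in the talk.
STAGES = [
    "PRE_SCALE_UP",
    "SCALE_UP",
    "POST_SCALE_UP",
    "TEST_TRAFFIC_SHIFT",
    "POST_TEST_TRAFFIC_SHIFT",
    "PRODUCTION_TRAFFIC_SHIFT",
    "POST_PRODUCTION_TRAFFIC_SHIFT",
    "BAKE_TIME",
    "CLEAN_UP",
]

def next_stage(current):
    """Advance through the deployment state machine; a failed hook at any
    hookable stage instead triggers a rollback to the blue environment."""
    i = STAGES.index(current)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None
```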

Canary and Linear Deployments: Traffic Shifting Strategies and Failure Detection

Each of these stages can take up to 24 hours, so you can do this over a pretty long period of time if you need to run long tests, for example. You'll also notice I was careful not to say too much about how the routing and the traffic shifting actually happen, and that was deliberate, because it depends on how the service is exposed. For example, if the service is behind a load balancer, it's achieved by manipulating weights on listener rules. But if the service is exposed via Service Connect, the routing is achieved by changing the routing rules in the Service Connect proxy.

Thumbnail 1010

So in the case of canary deployments, everything's exactly the same as we saw in the lifecycle up until now. The only difference is you get two production traffic shift stages. So essentially, any hooks attached to production traffic shift, that hook will be invoked twice. First time will be for the initial canary shift, so in the first step, you take a percentage, in this case we've got 10%, but you can configure that, and we specify a canary bake time. And for the duration of that canary bake time, we will have that percentage of production traffic going to the green environment.

Thumbnail 1080

Once the canary bake time is over, production traffic shift is entered again, so we invoke the hook again if there is one. And then that completes the rest of the traffic shift to production. In the case of linear, it's similar, but we have multiple steps. So you can use canary and linear to achieve different objectives. In the case of canary, if you had a situation where you wanted to do some testing of your new service revision with actual production traffic, you can do that with canary, limiting your blast radius so that if things went wrong, it wouldn't hit you too badly.

Thumbnail 1120

In the case of linear, what it allows you to do is have a more gradual rollout of the new service revision while you monitor to make sure that performance and functionality are maintained correctly. It's really important that when using advanced deployments, you think about what would constitute a failure or underperformance of the new revision that would require you to stop the deployment and roll back. ECS provides three broad mechanisms to support this.

First, the simplest is a circuit breaker. So with circuit breaker, you're simply checking that the new tasks actually achieve a healthy, stable state within a specified time period. Next, we've got CloudWatch alarms. So here you can choose some metrics that are appropriate for your use case and set alarms on those metrics, and if those alarms are triggered, then that can force a failure and a rollback of the deployment.

The metrics you can use here depend on your setup. For example, if you have an ALB-fronted service, you might look at metrics for counts of 400 or 500 errors. If you had benchmarked the performance of previous revisions and want to ensure that the new revision is consistent with what you had previously, you might look at things like CPU utilization or memory utilization. It's very important, if you're using canary or linear, to factor in that there's a mixture of green and blue running together. That is something you need to account for in your alarm thresholds.
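One way to reason about that mixed blue/green signal is a simple weighted average. This helper is illustrative arithmetic for sanity-checking alarm thresholds on aggregate metrics, not an AWS API:

```python
def blended_metric(blue_value, green_value, green_share):
    """Expected service-wide value of a per-request metric (e.g. error rate)
    when green_share (0..1) of traffic hits the new tasks during a canary or
    linear shift. The aggregate dilutes the green signal, so an alarm
    threshold on the combined metric must account for this."""
    return blue_value * (1.0 - green_share) + green_value * green_share
```

For instance, with a 10% canary, a green error rate five times the blue baseline of 1% only moves the aggregate to 1.4%, so an alarm set at, say, 2% on the aggregate would never fire.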

You can also think about situations where you have a task that's pulling messages off a queue, so there's no traffic actually being routed to the task, but instead, the task is pulling messages off a queue. In that case, you might want to be monitoring your queue length as a metric for alarming.

And then finally we've got custom tests and traffic monitoring, and this is where the lifecycle hooks I mentioned earlier come in. This is where you can implement your own tests in Lambda and insert them at the right places in the deployment lifecycle. And we're going to see some more examples of this with Unicorn Watch.

Thumbnail 1260

Unicorn Watch UI and Catalog: Implementing Blue-Green with Application Load Balancer

So let's go back to the Unicorn Watch architecture. Just a refresher, this is the architecture that they had prior to advanced deployments. And what I'm going to do now is just go through some aspects of this architecture in turn and see what changes they make and what they were able to achieve through that.

Thumbnail 1280

So first, we're going to talk about the UI and the catalog service. So they wanted to do a couple of things here. They wanted to switch to using canary deployments for the catalog, so they could try using the new function of catalog in a production environment with a limited blast radius. And they also wanted to expose the catalog on a public API so that third party websites could also access it.

Now, they couldn't do this sticking with CodeDeploy because first, CodeDeploy doesn't support Service Connect, so they couldn't use it with the catalog for canaries. But also, what they wanted to do to expose the catalog on a public API is they wanted to expose it on the same load balancer and on the same port. And to do that, they would need to use the load balancer's advanced request routing capabilities to use path-based routing.

Thumbnail 1340

With CodeDeploy, you can't do path-based routing, so by moving to ECS Blue-Green, they were able to achieve their objective, right? So essentially what you can see here is they were able to use path-based routing to access two different services through the same port on the ALB and also switch to canaries on the catalog service.

Thumbnail 1360

So let's look at this in a bit more detail, starting with the load balancer. What I'm going to do with this diagram, we'll come back to this diagram as we go through the talk. I'm going to go through a number of different ways of exposing services on ECS and show you how advanced deployments work with each of them.

Thumbnail 1390

So we're starting here with the load balancer. When you do advanced deployments with ALB, you need to provide, as a minimum, a production listener rule, and optionally a test listener rule. The way it works is that the weights on those listener rules are manipulated by the deployment controller to achieve the traffic shifting effects we talked about previously.

Because ALB works at the listener rule level, it is possible to take full advantage of advanced request routing on the Application Load Balancer, which means you can still do path-based routing, you can do header-based routing, query string-based routing, all the features that you've come to expect with ALB can still be supported. You can also make the same service available on multiple listeners as well, and that will also work with advanced deployments.

Thumbnail 1440

In order to configure this, in your service configuration, in the load balancer block, you will add this advanced configuration block. And here you specify a second target group, so you now need a second target group in order for the advanced deployments to work. And then you specify a production listener rule ARN, optionally a test listener rule ARN, and importantly, also an IAM role, and the IAM role is needed to give ECS permission to manipulate the weights on those listener rules.

Note that there's no reference to a listener ARN and that means you are free to choose any listener rules you like, whether they are on the same port or not. From an ECS perspective, it doesn't make any difference, right? So ECS is agnostic to which ports those listener rules are actually connected to.
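As a sketch, the load balancer block of the service configuration gains an advanced configuration section along these lines. The ARNs are placeholders, and you should double-check the exact field names against the current ECS service definition reference before using them:

```python
# Sketch of a loadBalancers entry in an ECS service definition using native
# blue/green with an ALB. All ARNs below are placeholders.
load_balancer = {
    "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/blue/abc123",
    "containerName": "ui",
    "containerPort": 8080,
    "advancedConfiguration": {
        # Second target group required for the green environment.
        "alternateTargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/green/def456",
        # Listener rules whose weights ECS will manipulate.
        "productionListenerRule": "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener-rule/app/my-alb/xyz/prod",
        "testListenerRule": "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener-rule/app/my-alb/xyz/test",
        # IAM role granting ECS permission to change those weights.
        "roleArn": "arn:aws:iam::123456789012:role/ecsInfrastructureRole",
    },
}
```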

Thumbnail 1510

Thumbnail 1530

Thumbnail 1540

So let's walk through how this works, taking the example of Unicorn Watch UI. So they want to do a blue-green deployment, and they want a manual approval on the new version before they allow it to go into production. In the starting state, they have the existing service running in the blue environment, and all traffic is going to that. The first step is we do the scale-up. So we create the green service revision, and we go through the pre-scale-up, scale-up, and post-scale-up stages.

Thumbnail 1550

Thumbnail 1570

We then start to shift traffic. To shift test traffic, we swap the weights around on the test listener rule, so we now have 100% of test traffic going to the green environment. Once the traffic shift is completed, we go into post-test traffic shift, and in this case, Unicorn Watch have attached a hook. What this hook does is monitor an approval parameter. Something somewhere needs to set that parameter to accept or decline; the lifecycle hook picks it up to determine whether or not the deployment should proceed.

Note that ECS itself does not provide a UI for manual approval or anything like that. That implementation is up to the implementer, and to how you orchestrate operations around deployments in your organization. In this example, we've just used an approval parameter stored in Parameter Store, and there's a separate UI that an approver can use to go in and set that parameter. The way the post-test traffic shift hook works here is that it checks the current state of that approval parameter. If it's still in a pending state, the hook returns with an in-progress status indicator, and ECS will then reinvoke the hook repeatedly until a success or failure is returned.
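A minimal sketch of that approval hook, with the parameter lookup injected so it stays runnable outside AWS; the parameter path and values are hypothetical, and in a real Lambda the getter would call Parameter Store:

```python
APPROVE, REJECT = "approve", "reject"  # hypothetical parameter values

def approval_hook(get_parameter):
    """Post-test-traffic-shift hook: poll an approval parameter and report
    back to ECS. Returning IN_PROGRESS makes ECS re-invoke the hook until a
    terminal status comes back."""
    state = get_parameter("/unicornwatch/deployments/approval")  # hypothetical path
    if state == APPROVE:
        return {"hookStatus": "SUCCEEDED"}
    if state == REJECT:
        return {"hookStatus": "FAILED"}
    return {"hookStatus": "IN_PROGRESS"}   # still pending; try again later
```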

Thumbnail 1640

Thumbnail 1660

Thumbnail 1680

Assuming then that we got the approval, we got a tick in the box, so that the hook is able to succeed, and we're now able to progress the deployment beyond the post-test traffic shift stage. So now we can move to production traffic shift. Here we swap the weights on the production rule, and we start our clock for bake time. During this time, we now have traffic flowing to the green environment, but with the blue still waiting just in case we need to roll back. And then finally, assuming we complete bake time without rolling back, the last stage, as I said, is cleanup, so that's where we scale down the old blue and the green becomes the new blue.

Thumbnail 1710

So that's how Unicorn Watch used the ALB in front of their UI service, now using ECS blue-green. We've talked about the traffic shifting, but not yet about the actual request routing. Remember I said that they also wanted to expose the catalog on the same ALB port, and they wanted to achieve this with path-based routing. So what they set up, as you can see here, is a path-based route for /catalog* routing to the catalog service, and another one for /frontend* to the UI service.

Thumbnail 1760

They also needed a way to indicate test traffic. If you have test traffic for either of those services, how do you mark it as test traffic? For that, they chose header-based routing: if there is an X-BG-Test header, that signals that the traffic is test traffic. They can do this because advanced request routing still works with advanced deployments. So here you can see the two production rules, the catalog production listener rule and the UI production listener rule. The Application Load Balancer is configured with path patterns matching catalog and frontend to route to each of those rules. And on the far side there, you can see the deployment controller has been configured to manipulate those rules directly: in the case of catalog, using a canary deployment, and in the case of the UI, using blue-green.

Thumbnail 1800

In order to route the test traffic, we have another two listener rules, and we've added HTTP header match conditions to those rules as well.

It's important to order your rules in the right order to get the desired effect.
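The ordering requirement can be illustrated with a tiny first-match simulation, mirroring how ALB evaluates listener rules by priority. Rule shapes here are simplified stand-ins, not the ALB API:

```python
def first_match(rules, path, headers):
    """ALB evaluates listener rules in priority order and the first match
    wins, so the test rules (which require the header) must sort before the
    production rules, or test traffic would never reach them."""
    for rule in rules:
        if not path.startswith(rule["path_prefix"]):
            continue
        if rule.get("required_header") and rule["required_header"] not in headers:
            continue
        return rule["target"]
    return None

RULES = [  # test rules first, production rules after
    {"path_prefix": "/catalog",  "required_header": "X-BG-Test", "target": "catalog-test"},
    {"path_prefix": "/frontend", "required_header": "X-BG-Test", "target": "ui-test"},
    {"path_prefix": "/catalog",  "target": "catalog-prod"},
    {"path_prefix": "/frontend", "target": "ui-prod"},
]
```

If the production rules were ordered first, they would match every request, header or not, and the test rules would be dead code.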

Thumbnail 1820

Thumbnail 1840

Advanced Deployments with Service Connect: Performance Testing and Canary Validation

What about Service Connect? We've looked at the ALB. The catalog is also exposed internally via Service Connect, and we want that canary to work with Service Connect. So how does that work? With Service Connect, there are no listener rules and no weights in listener rules to manipulate. Instead, we have the Service Connect proxy sitting alongside the application container in the application's task, and that needs to route traffic to the right place. The way it works is you create your green revision, and both the blue and green revisions are registered with Cloud Map, with the green one being labeled as your test instance. All clients in the Service Connect namespace can see both the blue and the green revisions, but only requests matching the test rules are routed to the green revision.

For those of you who may not be familiar with Service Connect, in a nutshell, the way it works is you have your proxy container running alongside your application container in your task, and that is configured to route basically all network traffic. All network traffic passes through that proxy container, and it can be configured to route outbound traffic to the version that you require. The way it's set up by default is that if you set the header X-Amazon-ECS-blue-green-test, then the Service Connect proxy will automatically route that to the green revision. You can replace that with your own header routing rules. You can use any headers you like. You can use header presence, header value matching, or header pattern matching. So you can use things like agent strings or API revision numbers as needed in your use case.

Thumbnail 1950

So how did Unicorn use this? The catalog team was very concerned about performance, and they wanted to make sure they do performance tests on any new revision of the service before putting it into production. Basically, they implemented this using two things. They put a traffic generator in the Service Connect namespace, and this traffic generator generates load. All the requests are automatically marked with this test header. They then implemented a hook in the post-test traffic shift stage. What they do in this stage is they trigger the load generator to start sending test traffic. They measure the performance of the service under load, and then at the end of that test, they determine whether it's passed or not. If it's passed, the hook returns successfully and the deployment continues. Otherwise, it fails and triggers a rollback.

A key thing to note here is that if you are doing testing with Service Connect, your client has to be part of the Service Connect namespace. You can't test from outside. You have to do it from within Service Connect. In this case, they achieve that by deploying this traffic generator in the Service Connect namespace. So that gave them a performance test in the green environment before moving into the production traffic shift.

Thumbnail 2050

But even when going into production traffic shift, they wanted to use the canary to do some additional testing before allowing the full shift of production traffic into the new environment. Here, if you remember, I mentioned earlier that with canary, the production traffic shift hook would get invoked twice. So what they did here was on the first invocation, they start a test. At this point, a canary percentage of production traffic is going to the green environment, and we start this test to start monitoring the performance of the canary. That continues for the duration of the canary bake time.

At the end of the canary bake time, the production traffic shift hook is called for the second time, and at this point, the results of the monitoring are collected and analyzed to determine whether the canary has performed successfully. If it has, the hook returns success and the production traffic shift completes to 100%. If not, we can force a rollback by failing the hook. So this shows that you can combine testing of the green environment, as we saw in the previous slide, with testing some of your traffic in actual production through the canary production traffic shift mechanism.
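A sketch of that two-invocation hook, with the canary test helpers and the state store injected so the example runs standalone. All names here are hypothetical; in practice the state would live somewhere durable between Lambda invocations, such as Parameter Store or DynamoDB:

```python
def canary_shift_hook(event, state, start_canary_test, collect_results):
    """Production-traffic-shift hook for a canary deployment: invoked once
    when the canary slice goes live, and again after the canary bake time."""
    if not state.get("canary_started"):
        # First invocation: the canary percentage of production traffic is
        # now live, so kick off monitoring and let the canary bake.
        state["canary_started"] = True
        start_canary_test()
        return {"hookStatus": "SUCCEEDED"}
    # Second invocation: canary bake time is over; evaluate the results and
    # either allow the full shift or force a rollback.
    passed = collect_results()
    return {"hookStatus": "SUCCEEDED" if passed else "FAILED"}
```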

Thumbnail 2160

Headless Services: Blue-Green Deployments for Queue-Based Notification Systems

Okay, so we've looked at the UI and the catalog so far. The next thing they looked at was this mechanism here, where clients poll through an Application Load Balancer for the status of orders that are underway. Now, this doesn't scale very well: you're having to poll for status updates, and the more orders you have going through, the less efficient this becomes. It was also quite limiting as a notification mechanism. They wanted to support a wide range of notification channels, such as email or SMS on your phone, and they couldn't do that with this architecture.

What they wanted was a more asynchronous approach, where orders post events on a queue and a notification service monitors that queue, pulls events off it, and sends notifications using whatever mechanism each user has configured. The problem was that they couldn't do this easily: switching from one version to another in one step requires a blue-green deployment, and they couldn't do that with CodeDeploy, because a service pulling messages from a queue has no load balancer in front of it.

Thumbnail 2250

So that's what they wanted to do. There's no load balancer in front of the notification service; this is what we call a headless service. No requests are actually sent to the notification service. Rather, the notification service pulls messages off the queue. With ECS blue-green, you can support this pattern too: headless services where there's no Service Connect and no load balancer, just a task or a service pulling messages off a queue. So this is the next pattern we're going to look at, the headless service.

Thumbnail 2290

Now, with a headless service, because there are no requests being sent to the service, there's no traffic to shift. So you might ask, how do advanced deployments make sense if there's no traffic to shift? The benefit is that you still have a blue and a green environment running in parallel for a while, so that if the green doesn't work well, you can still roll back very quickly to the blue environment. However, because there is no request traffic, you have to think a bit differently about how you manage the activation of these services.

So essentially, you need to do a bit of extra work to control which version of the service is pulling messages off the queue, and you need some way of turning each version's polling on or off. Let's see how it works for Unicorn Watch.

Thumbnail 2350

Thumbnail 2360

So in the initial state, they have the existing blue revision pulling messages off the queue and processing them. They now deploy a new version, and they deploy it in a deactivated state. The tasks are created, but some parameter, in Parameter Store for example, controls whether the service should pull messages off the queue. So it starts in a deactivated state that says: don't pull messages off the queue.
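A minimal sketch of that activation gate, assuming an SSM-style key/value flag. The dict and the key names are hypothetical stand-ins for illustration, not the actual Unicorn Watch implementation.

```python
# Sketch of the headless worker's activation gate. Each revision checks
# an activation flag (e.g. in Parameter Store) before pulling messages
# off the queue; key names and the in-memory store are illustrative.

def is_active(store: dict, revision: str) -> bool:
    """Return True when this revision is allowed to poll the queue."""
    return store.get(f"/notifications/{revision}/active") == "true"


def poll_once(store: dict, revision: str, queue: list) -> list:
    """Drain the queue only when this revision is activated."""
    if not is_active(store, revision):
        return []  # deactivated: leave the messages for the other side
    drained, queue[:] = queue[:], []
    return drained
```

With the flag set for blue only, a freshly deployed green revision runs its tasks but touches nothing until it is activated.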

Thumbnail 2390

Once the green revision is scaled up, we can skip the usual test traffic shift and go straight to the production traffic shift. As we enter this stage, we use a hook to disable the blue revision and enable the green revision. At this point, it's the green revision that picks up messages from the queue and processes them.
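The hook's flip step can be sketched as a small helper. Again, the dict stands in for Parameter Store and the key names are hypothetical.

```python
# Sketch of the production-traffic-shift hook body for the headless case:
# flip the activation flags so the green revision starts polling and the
# blue revision stops. The dict stands in for Parameter Store.

def swap_active_revision(store: dict, blue: str, green: str) -> None:
    store[f"/notifications/{blue}/active"] = "false"
    store[f"/notifications/{green}/active"] = "true"
```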

Thumbnail 2420

Thumbnail 2430

Thumbnail 2440

Thumbnail 2450

At this point, we start to monitor the behavior and performance of the green revision, using CloudWatch to watch the queue, and make sure it is functioning and processing messages correctly. If monitoring is successful, we can get rid of the blue revision after the bake time. If not, we disable the green revision and reactivate the blue to go back to where we were previously. So that's how they improved their ability to notify customers of status updates on their orders.
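One way to wire that monitoring into automatic rollback is ECS's deployment alarms. Below is a sketch of what the service's deployment configuration might look like; the strategy and bake-time field names follow the ECS blue-green API as we understand it, and the alarm name (say, one watching SQS queue age or depth) is hypothetical.

```python
# Sketch of an alarm-driven rollback configuration for the notification
# service. The alarms block is the ECS CloudWatch-alarm rollback feature;
# the alarm and field names here are illustrative and should be checked
# against the ECS API reference.

deployment_configuration = {
    "strategy": "BLUE_GREEN",    # keep blue scaled for fast rollback
    "bakeTimeInMinutes": 15,     # how long blue stays around after the shift
    "alarms": {
        "alarmNames": ["notifications-queue-backlog"],  # hypothetical alarm
        "enable": True,
        "rollback": True,        # alarm firing triggers an automatic rollback
    },
}
```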

Thumbnail 2490

Thumbnail 2500

Video Streaming Service: Blue-Green Deployments with Network Load Balancer

The last bit was the video streaming service itself, and here they wanted the ability to deploy new versions of their proprietary streaming protocol with a lot of new security features, and they needed to do it in such a way that they flipped completely from one version to another, because they couldn't guarantee compatibility between the new version and the old version. So they wanted blue-green on the video streaming service. And to do this, they used advanced deployments with a Network Load Balancer. With Network Load Balancer, it's very similar in many ways to what we saw with Application Load Balancer, but there are some differences you need to be aware of.

First, with a Network Load Balancer you can currently only do blue-green; you cannot do canary or linear. Second, you don't have the benefits of layer 7 request routing. Because you're working at layer 4, you can't do path-based routing, which means you need separate ports for production and test traffic. Also, the test listener is not optional with a Network Load Balancer: you must have a test listener. The other thing to be aware of is that some stages have an extra 10-minute delay, which relates to internal timing within the Network Load Balancer to make sure routing works correctly. But otherwise, it's quite similar to the Application Load Balancer.
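A sketch of what the service's load balancer block might look like behind an NLB, with separate production and test listener ports. All ARNs are placeholders, and the `advancedConfiguration` field names are assumptions that should be verified against the ECS API reference.

```python
# Sketch of a blue-green load balancer block behind a Network Load
# Balancer. At layer 4 there is no path-based routing, so production and
# test traffic use separate listener ports (443 vs. 8443 here), and the
# test listener is mandatory. ARNs and field names are illustrative.

load_balancers = [{
    "targetGroupArn": "arn:aws:elasticloadbalancing:region:acct:targetgroup/blue/1",
    "containerName": "video-streaming",
    "containerPort": 443,
    "advancedConfiguration": {
        "alternateTargetGroupArn": "arn:aws:elasticloadbalancing:region:acct:targetgroup/green/1",
        "productionListenerRule": "arn:aws:elasticloadbalancing:region:acct:listener/net/stream/443",
        "testListenerRule": "arn:aws:elasticloadbalancing:region:acct:listener/net/stream/8443",  # required for NLB
        "roleArn": "arn:aws:iam::123456789012:role/ecs-blue-green",
    },
}]
```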

Thumbnail 2570

So, just to recap, we saw here how Unicorn Watch was able to use advanced deployments in a number of different areas. What we've done, essentially, is cover four different service exposure methods: Application Load Balancer, Service Connect, headless services, and Network Load Balancer. And the beauty of advanced deployments is that you can use them with all of these service exposure patterns. With that, I'll hand back to Kevin, who will take you through some of the practicalities around migration and how you choose your deployment strategy.

Thumbnail 2620

Choosing and Migrating Between Deployment Strategies: Practical Considerations

Alright, Mike, thank you. Pretty exciting, right? So I went through a little blue-green deployment of my own, and I've got a surprise for you, Mike: I worked with the Unicorn Watch folks to get a new update on their streaming. So let's go ahead and see how that works. Hmm, it worked on my machine, so I'm not sure what's going on here, but luckily we're using blue-green deployment, so we can quickly roll back. Hey, that looks better. All right.

Thumbnail 2640

So let's go through choosing, and also migrating between, strategies. First, changing them is really easy: you can just change the strategy in the deployment configuration, as Mike talked about. The key is adding the advanced configuration if you're using a load balancer. Once you've added that advanced configuration, you can go from blue-green back to rolling, to canary, to linear; it's very fluid. The one thing we ask is that once you've moved to advanced deployments, you leave that advanced configuration in place for as long as you use the service. ECS could be serving traffic from either one of the target groups, so even for a rolling deployment we need both configured in order to determine which one is serving traffic and perform the rollout correctly. So again: add the advanced configuration, and then you're free to move back and forth.
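A sketch of how a strategy switch might look when building an `update_service` request. The cluster and service names are hypothetical, and the canary/linear sub-field names are assumptions to verify against the ECS API reference.

```python
# Sketch of switching deployment strategies via update_service once the
# advanced configuration is in place. Only the strategy value changes;
# the load balancer advancedConfiguration stays put so ECS always knows
# both target groups. Names and sub-fields are illustrative.

def strategy_update(strategy: str, canary: dict = None, linear: dict = None) -> dict:
    """Build the update_service kwargs for a strategy change."""
    cfg = {"strategy": strategy, "bakeTimeInMinutes": 10}
    if canary:
        cfg["canary"] = canary  # e.g. {"canaryPercent": 10}, name assumed
    if linear:
        cfg["linear"] = linear  # e.g. {"stepPercent": 20}, name assumed
    return {
        "cluster": "unicorn-watch",  # hypothetical cluster
        "service": "catalog",        # hypothetical service
        "deploymentConfiguration": cfg,
    }

# Moving from blue-green to canary is just a different strategy value:
# boto3.client("ecs").update_service(**strategy_update("CANARY",
#     canary={"canaryPercent": 10}))
```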

Thumbnail 2690

If you want to migrate from CodeDeploy to the ECS deployment strategies, you have two options. You can do a simple update service, updating in place, and move both the deployment controller and the deployment configuration to what you want them to be. Again, you'll need to add the advanced configuration blocks on the load balancers at the same time, because with CodeDeploy the load balancer configuration actually lives in CodeDeploy. The other option is to create a replacement service and do the migration in a more controlled way, on your own terms. If you want to test out the hooks and the process a little, you can just create a second service and then migrate.
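A minimal sketch of the in-place option, assuming a single `update_service` request can switch the controller type and set the new deployment configuration. The names are placeholders, and the exact call shape should be checked against the boto3/ECS API reference; remember that the load balancer advanced configuration must be added in the same update, since with CodeDeploy it lived in CodeDeploy.

```python
# Sketch of the in-place migration from the CODE_DEPLOY controller to the
# ECS deployment controller. Cluster/service names are hypothetical, and
# the loadBalancers advancedConfiguration block (omitted here) must be
# included in the same request.

migration_request = {
    "cluster": "unicorn-watch",               # hypothetical cluster name
    "service": "video-streaming",             # hypothetical service name
    "deploymentController": {"type": "ECS"},  # was CODE_DEPLOY
    "deploymentConfiguration": {"strategy": "BLUE_GREEN"},
}
# In real use: boto3.client("ecs").update_service(**migration_request)
```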

Thumbnail 2740

Some considerations: we talked at the beginning about how your tasks scale up and down, and about the speed of operation. The biggest trade-off between advanced deployments and default rolling is the speed of rollback. With advanced deployments, you always have the blue version fully scaled and ready to go, and if your service needs to scale up in the middle of a deployment, both versions scale up to that capacity. The speed of rollback is much higher in that scenario.

Choosing between the advanced deployment types is really about what you're trying to achieve. If you want only one service revision ever serving traffic at once, blue-green is perfect: it makes sure that all traffic is served by either the blue or the green. For example, if you have a very chatty client, maybe a web app making calls back and forth, and you want it to always get a consistent experience, you want to use blue-green.

Canary, as we talked through earlier, is useful when you can't test broadly in production, maybe because you're in a regulated industry or for some other reason. It's a great way to introduce a change to a small section of your customer base before introducing it to everybody. And finally, linear is a great option if you're interested in seeing how a revision behaves under gradually increasing load over time, or similar concerns.

Thumbnail 2850

Again, one of the amazing things you can do with advanced deployments is test traffic: you have a means of testing the actual running code in your production environment before introducing it to any of your customers, potentially preventing an outage that way. As we discussed, you can migrate fluidly between all of the advanced deployments, or between advanced deployments and rolling. You can also migrate from CodeDeploy to advanced deployments, and one reason to do that is to take advantage of advanced feature sets in Amazon ECS that you couldn't use before.

Thumbnail 2890

Mike talked through a couple of those, such as multiple target groups if you have multiple ALBs you're trying to connect to, as well as Service Connect and other features like that. Finally, to continue your journey, we have a landing page here with a bunch of relevant links, as well as the actual deck we went through today. There is also a related session this afternoon at four in the MGM covering deployment pipelines with Amazon ECS, so if you want to hop over there, that'd be great.

At this time, I just want to say thank you very much for coming, and I will be around here with Mike if you have any questions afterwards. Thank you.


This article is entirely auto-generated using Amazon Bedrock.
