🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Deep dive: The evolution of AWS load balancing and new capabilities (NET334)
In this video, AWS experts explore the evolution of load balancing, starting with Matt Lehwess explaining the Nitro system architecture that powers ELB performance and Hyperplane technology enabling massive scalability. Jamie demonstrates how ALB, NLB, and Gateway Load Balancer fit into three-tier web architectures, covering features like weighted target groups, Automatic Target Weighting, PrivateLink for cross-VPC connectivity, and EKS integration via the AWS Load Balancer Controller. Milind Kulkarni presents four recently launched features: NLB's QUIC pass-through support for reduced connection latency with mobile clients, weighted target groups on NLB, ALB's Target Optimizer for AI workloads requiring single-task concurrency, and URL/host header rewrite capabilities using regex. The session includes practical architecture patterns for e-commerce, gaming, financial services, and security implementations using Gateway Load Balancer with partner firewalls.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Evolution of AWS Load Balancing and Session Overview
Welcome everybody to our session. We're diving deep into the evolution of AWS load balancing and new capabilities. I'm going to go over the agenda with you and introduce my colleagues here. First up will be Matt Lehwess, our Senior Principal Solutions Architect, and he's going to take you deep into Nitro and all of the things that power our ELBs. He's going to explain why they're so fast, how they perform, and how they scale. Then I'm going to take you through where they fit. We'll be concentrating on ALB, NLB, and Gateway Load Balancer, and I'll start with a simple three-tiered web app architecture, and I'll show you where they fit into those places for security, enhanced speed, and of course Layer 7 processing. And then lastly, Milind Kulkarni, the Principal Product Manager for NLB, Gateway Load Balancer, and parts of ALB, will be taking you through some of the new features that have been released recently. So I'm going to hand it over to Matt.
All right, thanks, Jamie. Exciting session. Thanks everyone for making the trip out here to Mandalay Bay. If anyone takes any photos, feel free to find us on LinkedIn and share them. We always love that. I'll be talking about the architecture behind modern load balancing in general, of which AWS is one part. AWS load balancing is basically made up of these three products: the Application Load Balancer, Network Load Balancer, and Gateway Load Balancer, and we'll dive into each of those.
From Classic Load Balancer to On-Premises Architecture: The Foundation
To talk about how those architectures came about, we need to take a step back and talk about what we call our Classic Load Balancer, which is the first form of load balancing on AWS, so we called it our Elastic Load Balancer. When I first started working on AWS back in about 2013, Elastic Load Balancing was kind of the aha moment for me. It was about elasticity, about how you deploy many instances in a fleet, thousands of instances. You put a load balancer in front of it. Before that, I was working on physical hardware in data centers, building big metal boxes that were load balancers that had virtual IPs, and we'll talk a little bit about that when we dive into on-premises load balancing.
Let's talk about the functions you need to build a load balancer. First, let's talk about on-premises. Basically you've got three components here: you've got the central load balancer and then targets. Traffic comes into a central point and then it gets shared across the multiple destinations. In a typical on-premises environment, the on-premises load balancer has things like ASIC-based packet processing, so it's highly performant. It's generally a lot of functions put into one device, one piece of metal that sits inside a rack. There's typically one IP destination called a virtual IP, which is the single destination for traffic. You have a DNS request, it resolves to that VIP or virtual IP. The load balancer then spreads load across those three targets, and a load balancer will do things like security policy and routing policy.
One of the more famous load balancer vendors has a pretty robust way of doing Tcl scripts, where basically you can define how traffic is routed through this central appliance. The reason why you need this appliance in front of targets is that the targets are typically just commodity servers running a typical operating system like Linux or something similar, generally running some kind of application like HAProxy or Apache. They're typically CPU and memory bound, so there's only a certain amount of CPU and memory you can fit inside that target. You need to spread load across those many targets, and we'll talk about how that looks in AWS.
Typically, redundancy is done through things like TCP session synchronization across multiple physical devices. What we've got here is an active-standby pair, where we're sharing the TCP session data from an on-premises physical single box to a secondary box, so that if that physical first box fails, the second can immediately take up the load and send traffic to those same targets, which hopefully are still online. Again, all physical stuff in physical data centers.
There are some downsides to that. Basically what you're seeing here is a rack, it's a rack I drew, so there's a bit of artistic freedom there, but basically you're deploying physical servers in racks of compute next to your load balancers, and then you're scaling those racks. I used to do this. I used to crawl around data centers and deploy new racks of commodity load balancers and servers that would run our applications, and we would look at the load on these servers and say, okay, we need to deploy another couple of racks in another couple of months. Eventually you get to a point, and I did get to this point in my previous careers, of building whole data centers worth of these kinds of components. Now the good news is here at AWS we've already built the data centers.
AWS Infrastructure and Classic Load Balancer: Moving Beyond Physical Hardware
We have 38 regions worldwide. I don't know exactly how many data centers make up those regions, but basically we have these components where you've got data centers that are combined into an Availability Zone, and Availability Zones are combined into regions. Then we have this magical, amazing thing called a VPC. The VPC, or Virtual Private Cloud, is a logical construct that allows you to deploy in any one of those physical pieces of compute. And then we bring in EC2, which is a VM or a virtual instance that runs on a physical server in a physical data center that Amazon manages.
So what a typical architecture looks like is something like this. You've got EC2 instances sitting inside subnets. You might have something like a network gateway or a network firewall. These are all pretty standard functions that run inside AWS inside the VPC. Now, let's go back to our web app. We've got multiple EC2 instances because software is CPU and memory bound within a single VM. We also have multiple VMs across multiple subnets, across multiple Availability Zones; the subnet is how we define whether you're in one Availability Zone or another. Each of these instances gets an IP address in a different subnet across AZs. And you've got a web application. Great.
Now, what folks were doing back in 2013, prior to Elastic Load Balancing, was using DNS. DNS is actually quite a great load balancing mechanism. What you basically do is have a single DNS host name which resolves to multiple A records, and those A records each represent one of the IP addresses of your web application. Now, this isn't true load balancing in the sense that you're not spreading the load based on a per packet or 5-tuple basis. Milind is going to talk about 5-tuple and why it's important, but you're spreading the load based on a DNS query. The response is an A record, the client chooses that A record, and that's used as a destination. There are some downsides to just purely using DNS here. That said, we use DNS internally for a lot of stuff, and I'll talk about that.
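As a quick illustration of that pattern, here is a small Python sketch of what a client sees when a hostname publishes multiple A records; the hostname is hypothetical.

```python
import random
import socket

# Resolve a hostname that publishes multiple A records (hypothetical name).
# Each record is one of the web application's instance IPs described above.
infos = socket.getaddrinfo("app.example.com", 80, proto=socket.IPPROTO_TCP)
addresses = sorted({info[4][0] for info in infos})
print("A records returned:", addresses)

# A naive client simply picks one record; there is no per-packet or
# per-flow spreading, so a cached answer can hotspot a single target.
target = random.choice(addresses)
print("Connecting to:", target)
```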
You can also deploy instance-based load balancers in AWS, and a lot of customers do this, particularly on products like our AWS Outpost products that don't have our Network Load Balancer and Gateway Load Balancer, which I'll talk about. Here we've got instance-based load balancers, and again we've got our A records, or you could have a single public Elastic IP at the internet gateway. You can monitor your first load balancer and, if it fails, shift that Elastic IP assignment over to the second. This is a common architecture for active-standby instance-based load balancers in AWS. So what you're basically doing is saying, I've got a single load balancer on an instance. It's quite a big instance, a 16xlarge or 32xlarge, so it can handle the amount of traffic you need. You're shifting an Elastic IP from one instance to another across Availability Zones. This is an interesting architecture because the Elastic IP is actually not AZ bound.
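A minimal boto3 sketch of that failover step, with hypothetical resource IDs: when your monitoring declares the active instance unhealthy, you reassociate the Elastic IP with the standby.

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical IDs: the Elastic IP allocation and the standby load balancer
# instance we fail over to when the active instance stops passing checks.
EIP_ALLOCATION_ID = "eipalloc-0123456789abcdef0"
STANDBY_INSTANCE_ID = "i-0123456789abcdef0"

def fail_over_to_standby():
    """Reassociate the Elastic IP with the standby instance.

    Because an Elastic IP is not bound to an Availability Zone, this works
    even when the standby sits in a different AZ, as described above.
    """
    ec2.associate_address(
        AllocationId=EIP_ALLOCATION_ID,
        InstanceId=STANDBY_INSTANCE_ID,
        AllowReassociation=True,  # take the EIP from the failed instance
    )
```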
Now, we start talking about Classic Load Balancer. This is where we basically said, okay, these architectures are cool, but how about we build this as a service and offer it to customers so that customers don't need to do the heavy lifting there and build instances with load balancers, et cetera. Now, the truth is, I talked about DNS. We're actually using DNS on the front end. We have two endpoints, one in each Availability Zone, that constitute the Classic Load Balancer or Elastic Load Balancer service. The DNS name or the alias record for that Classic Load Balancer is resolving to each of these two A records. So we're using DNS for that front end again and having instances inside our account, which I'll talk to in a second, doing the load balancing.
Again you'd have an internet gateway; the instances didn't have public IPs themselves, and you'd have this Elastic IP mapping. Pretty straightforward. What's actually happening under the hood is these Elastic IPs are a little bit special. They're a cross-VPC ENI attachment, which you can now do as a customer inside your own account. But we have an Amazon Service VPC, and we drop those ENIs from instances inside that service VPC into your VPC. So we actually operate standard EC2 instances for our Classic Load Balancer service inside our VPC, and your traffic is then hitting those instances and being load balanced across the targets that you see in the bottom of the diagram here, across the web instances.
One of the things we can do here is we can scale up those instances, we can scale them down, and you don't even notice. That's where we used to say you'd have to do things like pre-warming a CLB if you thought there was going to be a large peak in traffic, if you had a big event going on. That goes away with NLB and Gateway Load Balancer; I'll talk about that. But you would actually see it if we scaled out, because with the Classic Load Balancer, what we would do is add multiple A records to the alias record or the CNAME for the Classic Load Balancer.
Here we've actually added two more in each Availability Zone because the traffic dictated it: two more A records for each AZ, so four more total. You would actually see that happen. There were some typical architectures where you would build what's called an ELB sandwich, where you'd have a set of appliances with an ELB in front, and someone like a security firewall vendor would monitor this DNS name every second to see if the CLB scaled up or down, so they could send traffic to all of the nodes. There's some interesting stuff that would happen there with the Classic Load Balancer.
AWS Nitro System: The Technology Powering Modern Load Balancing
Let's move on to modern day load balancing in AWS now. We have ALB, NLB, and Gateway Load Balancer, and we'll start with ALB. ALB has a very similar architecture to CLB, but there are a couple of key differences. The main one being AWS Nitro. Our AWS Nitro system is where we basically took what we called a Xen-based hypervisor that did a lot of the network operations in software. We allocated CPU cores on a physical server to that function, and we moved that down onto the network adapter. The network adapter itself, which is part of the Nitro system, now does those operations on a CPU that sits on the network card itself. Systems like ALB are using Nitro to get more performance than what they would have previously. Think about the software-bound processes versus hardware-bound processes.
This is a quick diagram of what the Nitro system looks like. We've got the Nitro core itself. There's a very small KVM as the hypervisor firmware now, and we have a Nitro security chip which basically controls the security of the Nitro system and makes sure that the hypervisor firmware is actually what it should be. Think about if it was on Outposts, the same technology sitting on premises. Could someone put firmware in there? No, there's actually the Nitro security chip that has that check and balance to prevent that.
Our Nitro-enabled instances started in 2013, and our bandwidth per instance has gone up quite considerably. We're up to 50 gigabits per second in 2021. As we step through our different versions of our Nitro chip on our network adapters, you can see here our C8gn can do 400 gigabits per second per network adapter. That's enabled us to scale for things like AI/ML workloads up to 12.8 terabits per second per EC2 instance. We basically stack these Nitro cards next to each other and can get that kind of performance. ALB and NLB aren't specifically using those large instance types because we don't need it, but you can see what Nitro really brings.
Nitro has encryption by default as well, so always-on encryption for version three and above. It's basically end-to-end in-transit protection for your traffic, and you don't need to change anything on your application to enable that. We recently, about a week ago, released VPC encryption controls, which is an account-wide VPC enforcement mechanism to say I only want to use these new versions of Nitro so I have in-transit encryption for all of these instances. What will actually happen here is ALB, NLB, and things like Firewall Manager will auto-upgrade or auto-migrate to these newer families of Nitro to support VPC encryption. When you tick that you want encryption within your VPC, we'll go and upgrade your ALB and so forth to those new versions of Nitro that support in-transit encryption.
Hyperplane Architecture: Scaling Network Load Balancer and Gateway Load Balancer
Let's move on to NLB here. NLB is a very interesting one because we fixed one of the key problems that I was talking about. Hyperplane is basically the solution to the DNS thing that we were seeing earlier. What we do now is, regardless of the size of the fleet inside the service VPC, we give you one IP address, one ENI per availability zone. That means we can scale up and scale down that fleet as much as we want, and it has no effect on what you see within your VPC. Hyperplane is basically an umbrella term for many services that operate within that space, but it's a massively scalable infrastructure that is just presented to you as one ENI per availability zone. We can scale up and scale down and you see no change there.
Gateway Load Balancer is also built on Hyperplane. It's a little bit of a different load balancer. I think about it as a Layer 3, Layer 4 load balancer that's built for appliances: you've got traffic coming in through an internet gateway, hitting a Gateway Load Balancer endpoint, and then hitting a Gateway Load Balancer in another VPC.
This architecture is useful if you want to offer appliances in a central security VPC. You can deploy those endpoints in many VPCs and offer that same security fleet as a shared or multi-tenant service. Some of our firewall partners actually offer this as a service as well. You can drop a gateway load balancer endpoint in your VPC and use Palo Alto or Checkpoint, for example.
One of the key callouts with this architecture is that we still have Hyperplane for the gateway load balancer endpoint, but also for the gateway load balancer itself. You are actually going through two Hyperplane fleets to achieve this gateway load balancer architecture, and Jamie is going to dive into what those architectures actually look like in practice.
Now, lastly on Hyperplane, it is an internal load balancing service. It is not something that is public that you need to worry about. When you look at NLB, gateway load balancer, and so on, know that it is using Hyperplane as the scaling mechanism under the hood. It is built using standard regular EC2 instances, but it is built for immense scale and multi-tenancy.
We have a bunch of services using Hyperplane, and there are probably more by the time you read this than when I wrote this last night, because there are new services being released all the time. We just released one in preview, our proxy service, which actually uses Hyperplane as well. Hyperplane is an inherent function that is built into a lot of the new services that we deploy. I am going to hand it over to Jamie to talk about load balancing in action.
Application Load Balancer in Action: Layer 7 Processing and Three-Tier Web Applications
I will be concentrating on ALB, NLB, and Gateway Load Balancer, which we lovingly call GWLB. I want to set the stage right away by saying that I will be doing things through an architecture. The first thing we are going to start with is L7 processing, and the architecture that I am choosing is a three-tiered web app, which will give you all of the tools and information about how the application load balancer runs.
Let us take a look at some of the features that it has. This is an ALB at a glance. I have our targets, and these are the targets that the ALB can route to: instances, Lambda, containers, and IP addresses, which is pretty cool. Here are some of the features, and I say some of the features because if I listed all of them, two things would happen. One, this slide would be a mess, and two, Milind would have nothing to talk about. So we are going to leave some of the newer released features for Milind to speak on.
A couple of things I want to point out here is we have WAF integration. We have authentication offload. We have, of course, SSL, TLS, and MTLS with pass-through as well as verified mode. We have something called slow start, which is pretty cool. If you anticipate a bunch of people coming to your site at a given time, you can set a slow start timer to slowly ramp things up so your backend does not fall over. There are all sorts of cool features.
Let us take a look at the architecture we are going to be talking about. This is a standard three-tier web app. I have a transit gateway in the middle connecting my backend to my frontend. I have my ALB in the front. I am using security groups at this point. I have a fleet of Nginx servers, let us say EC2 instances. It goes to my backend. My databases are read databases, or rather, they are clusters, so we are going to basically be interacting with the read part. Of course, I have some Kubernetes clusters and some Lambda and other services.
Before we get into working with the architecture, the one thing I want to talk about is an unsung hero for ELB, and that is the target groups. Each of our ELBs requires you to configure a target group because you have to put your targets in something to send traffic to. Let us take a look at that. The first thing you are going to get when you want to create an ALB, ELB, or NLB is what type of target group you have.
Now, how many of you here are using ELBs? Good, most of you. So you get a choice. You get instance-based, IP-based, Lambda function, and then of course you get the application load balancer. At the bottom, you can see, and this is exactly right off of the console, it tells you which target group fits which load balancer. We also have a couple of other options like protocols. You will see some newer protocols in here, which again Milind will talk about, but all the standard options are there. It also lets you know which ones you can assign to your load balancing experience and then of course the versions that you can use for protocols, so you can set all of that stuff up.
We also have health checking. Now I am going to get on my soapbox a little bit for health checking. How many of you use TCP 80 to health check? Not one hand. Perfect. I did ask this question a couple of re:Invents ago and I had half the room put their hand up, so thank you. I am going to preach to the choir here a little bit and say make sure that you are actually using the URL for the status page instead of just TCP 80, because otherwise you can get into a gray failure with your application: your port could be open and still accepting traffic even though the application behind it is broken. Here we have a bunch of advanced topics and advanced features that you can set for your load balancing as well. We also have options for other things like different attributes such as target draining. If you want to do minimum healthy target failover, we have an option so that you can say, "If I lose 20%, half, or whatever percentage you set of my targets in my target group, fail away, or fail open, just to make sure that I can handle whatever I can if there's an issue and then go ahead and take care of it."
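To make that concrete, here is a hedged boto3 sketch that points a target group's health check at an application status page instead of a bare port check; the ARN and the /status path are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Hypothetical target group ARN. Instead of a bare check on port 80, point
# the health check at an application status page, so a "port open but app
# broken" gray failure is actually detected.
elbv2.modify_target_group(
    TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/web/abc123",
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/status",        # the application's real health page
    Matcher={"HttpCode": "200"},      # only a 200 response counts as healthy
    HealthyThresholdCount=3,
    UnhealthyThresholdCount=2,
)
```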
We also don't want to forget the fact that target groups also let us do auto scaling. So let's dive into a couple of advanced options that you can do. One of them is weighted target groups. I can have one target group going on just accepting my traffic, and let's say I want to build a canary for something new. I can just set my weight so 95% of my traffic goes to my main app and then I can canary my other with 5%. So you can do things for deployments, changes, and blue-green deployments. It's got multiple applications as I'm sure you can figure out.
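Here is a minimal boto3 sketch of that 95/5 canary split, expressed as a weighted forward action on an ALB listener rule; all ARNs are hypothetical.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Hypothetical ARNs. Send 95% of traffic to the main target group and
# canary the new version with 5%, exactly the split described above.
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/web/abc/def",
    Priority=10,
    Conditions=[{"Field": "path-pattern", "Values": ["/*"]}],
    Actions=[{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": "arn:aws:...:targetgroup/main/111", "Weight": 95},
                {"TargetGroupArn": "arn:aws:...:targetgroup/canary/222", "Weight": 5},
            ]
        },
    }],
)
```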
But how about algorithms? We've got a couple of options here. Round robin, which is your standard load balancing, just goes right after the other. We also have least outstanding requests, and that's basically, "Who has the most requests? Let's not send it to them. Let's send it to the targets that have the least requests right now." And we have weighted target groups, and then we also have our slow start again that I mentioned. But I have the weighted target groups kind of selected, and this is a cool feature. It's something that we can do that we call ATW or Automatic Target Weighting.
Let me give you an example of how this works. So we have a bunch of targets, and they're working just fine. We've got traffic coming in, and ATW, or Automatic Target Weighting, looks at three different things: 5XX errors, TCP connection failures, and TLS connection errors. Throughout that, we run an algorithm that compares each target against its peers to see if there are any anomalies. If an anomaly is detected, we don't just pull the target out, right, because this could be a temporary situation. But what we won't do is send it any new requests; it'll just handle the requests it has. Then either you or the system sorts things out, and when it does, the constantly running comparison recognizes that and says, "OK, it's fine. We can send traffic to that target again."
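As a concrete illustration of watching this behavior, here is a hedged boto3 sketch that pulls the anomalous-host count for a target group. The metric name AnomalousHostCount and the dimension values are assumptions based on ALB's anomaly-mitigation metrics; verify against the current CloudWatch documentation.

```python
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

# Pull the anomaly counts behind Automatic Target Weighting for one target
# group over the last hour. Dimension values here are hypothetical.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/ApplicationELB",
    MetricName="AnomalousHostCount",  # assumed name, see lead-in note
    Dimensions=[
        {"Name": "LoadBalancer", "Value": "app/web/0123456789abcdef"},
        {"Name": "TargetGroup", "Value": "targetgroup/main/0123456789abcdef"},
    ],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=1),
    EndTime=datetime.datetime.utcnow(),
    Period=300,
    Statistics=["Maximum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])
```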
Now we don't hide all these metrics from you. If you go into CloudWatch, you can go ahead and look at these different pieces here to see exactly the counts and how automatic target weighting is being sorted out. Let's do a couple of callouts for features that make life a little bit easier. You may be using some of these yourself. For ALB we have something pretty cool: WAF integration. Now WAF doesn't live inside of the ALB. The ALB actually connects with it on the back end, and WAF allows us to do a couple of things, of course. You can use WAF for getting Shield Advanced, right? So if you have a DDoS attack or something's going on right away, and you have Shield Advanced with WAF on ALB, you can kind of pick up the bat phone, if you will, and say, "Hey, AWS, help me out," and they'll go in and help you edit your rules, sometimes even on the fly.
We also like to go in and remove any known bad actors to keep them from getting to your infrastructure. Another piece we have is the authentication offloading. So instead of having to put all this stuff into your application, you can go ahead and integrate with Cognito. Cognito gives you a couple of options. You can do user pools using IAM, right? We're all familiar with it because we use IAM to log in. If you use any advanced features such as VPC Lattice or anything like that, you're using IAM for your policies. No difference there. You can go ahead and use it in your user pools or SAML connections or OIDC providers like Auth0, all of those folks can be handled on your Application Load Balancer in the front end.
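As a sketch of that authentication offload, here is a listener rule that authenticates against a Cognito user pool before forwarding; every ARN, ID, and domain below is a hypothetical placeholder.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Offload login to Cognito before traffic ever reaches the application.
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/web/abc/def",
    Priority=5,
    Conditions=[{"Field": "path-pattern", "Values": ["/account/*"]}],
    Actions=[
        {   # step 1: authenticate against a Cognito user pool
            "Type": "authenticate-cognito",
            "Order": 1,
            "AuthenticateCognitoConfig": {
                "UserPoolArn": "arn:aws:cognito-idp:...:userpool/us-east-1_EXAMPLE",
                "UserPoolClientId": "example-client-id",
                "UserPoolDomain": "example-domain",
            },
        },
        {   # step 2: forward authenticated requests to the app fleet
            "Type": "forward",
            "Order": 2,
            "TargetGroupArn": "arn:aws:...:targetgroup/main/111",
        },
    ],
)
```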
So where does ALB shine? ALB shines in a few places, not everywhere. I'm not going to tell you to use an ALB for all of your workloads; this part of the presentation is about where each load balancer fits best. E-commerce and retail: if you have a checkout experience, a shopping cart, and a catalog, and you want to route that traffic, you're in luck, because we saw that ALB can do host-based and path-based routing. Publishing: if you're releasing a new story, we saw how ALB can elastically expand and contract for the scale you need. Social media platforms: you've got your movies and your stories and all that fun stuff.
Network Load Balancer: Connection-Based Load Balancing for High Performance
We're HIPAA compliant and PCI compliant, so it's a good fit for healthcare as well as finance and of course government for any critical applications. A lot of governments use ALB. So now we've looked at that, let's take a look at our Network Load Balancer. Now, I like to look at the Network Load Balancer as our connection load balancer.
It helps us connect different parts of our workloads together and also serves as a connection mechanism to other things like VPCs and hybrid environments. Instances, ALB, containers, and IP are all supported targets. One interesting thing about IP targets is that they can be on another service as long as you can route to that IP, you can use it in your target group. We mentioned we had read databases, so we're going to use our Network Load Balancer for our read databases.
Now the databases get their own VPC. I used to be a database person, but I'm a networking person now, and we're both paranoid. Those are the two things that, if they go down (the network or the database), everyone feels it and you're under tons of pressure. You don't want to be at your kid's birthday party getting phone calls. So the database folks want things easier and quicker, and they don't want to worry about anyone messing around with their stuff. So they're putting their databases in their own VPC.
Why is this good? If we're looking at NLB, it has TCP-based connections. Remember, they have long-lived connections as well. These connections will not go away unless you stop sending traffic and your idle timeout expires. The flow hash algorithm distributes our connections to our databases evenly. So instead of hotspotting one read database versus the other, now I can be sure to spread the load out from the rest of my workload. NLB has high throughput. Being built on Hyperplane, it can handle a tremendous amount of traffic as well as a tremendous amount of spikes. If your traffic is extremely spiky, NLB is really good for this. And of course, there's the ultra-low latency.
So how can I use my NLB to increase my speed, lower my latency, and also increase my security? Well, one of the things that I love about NLBs is PrivateLink. You'll notice that I've also removed my connection from the transit gateway to this VPC. I really don't need it if this is my paradigm for how I'm going to my read databases. Now we can imagine the rest of this infrastructure with other connections populating the databases, but we're just talking about the reads in the front end. I'm using PrivateLink here. The reason why this works is that PrivateLink takes the local IP address of the subnet that an endpoint is sitting on, so it looks like your machines or your workload is just talking to something else on their subnet.
You can go up to 100 gigabits. Actually, you can go a little farther, but when you do, we start talking to you about doing more advanced things like sharding your NLBs. PrivateLink supports TCP and UDP with unidirectional stateful flows, so I know that no one is going to be able to reach into anything else other than what I want. It goes in one direction and it responds to that one direction. And of course, it's all private. So this connection between these two VPCs is happening over the AWS backbone, and there's no internet access whatsoever. PrivateLink inherits all the goodness of NLB.
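A hedged boto3 sketch of wiring that up, with hypothetical IDs: the database side publishes the NLB as a PrivateLink endpoint service, and the consumer side drops an interface endpoint into the application VPC.

```python
import boto3

ec2 = boto3.client("ec2")

# Database side: expose the NLB fronting the read replicas as a
# PrivateLink endpoint service (the NLB ARN is hypothetical).
service = ec2.create_vpc_endpoint_service_configuration(
    NetworkLoadBalancerArns=[
        "arn:aws:elasticloadbalancing:...:loadbalancer/net/reads/abc123"
    ],
    AcceptanceRequired=True,  # the database team approves each consumer
)
service_name = service["ServiceConfiguration"]["ServiceName"]

# Consumer side: create an interface endpoint in the application VPC.
# It gets local subnet IPs, so workloads just talk to "something on
# their subnet", as described above.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName=service_name,
    SubnetIds=["subnet-0123456789abcdef0"],
)
```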
We get questions a lot about cross-account access. Let's say our database folks are even more paranoid and they want to have their own account, their own limits, their own payers, and all that stuff. What we've just set up will still work, because PrivateLink with NLB will go across accounts. It'll even go across regions. Now we've got a new region with some container workloads in there, and we want to give them access to our read databases. Going cross-region adds a bit of latency, but it's still quite performant. All this is going over the AWS backbone, all of our own fiber that we laid, so we don't have to worry about going over the internet or dealing with what we call internet weather.
One thing that we see here is that I have a bunch of containers as well as an EKS system. Where does ELB help there? Well, we always recommend the load balancer controller. The load balancer controller for ELB allows you to configure your ALB or NLB with the configuration of EKS, so you don't even have to become a master of these two load balancers. All you have to do is just know which commands you need to put in and which configuration you need to put into EKS to get this to work. So what we're going to do is just show our three nodes with our pods and our ports for our pods, and then we're going to use an ALB ingress controller.
We can use an NLB as well, depending on whether we want to go more performant or not, but we're going to do a bit more path-based routing and handle things at Layer 7. That doesn't mean we're restricted to only using ALB, though. As your application scales out or if you add more containers, you can go ahead and add multiple paths, and it'll go to the ports as you need with IP preservation and all of that. But let's say you need something a bit more performant with low latency and all the goodness that I mentioned of NLB. You can go ahead and do something called Direct Pod.
What Direct Pod gives you is the ability to just take those IP addresses and add them to your target group, and again, you configure all of this within your Kubernetes configuration. So if you need more direct pods, you can go ahead and do that. Remember though, you do not want to get into IP exhaustion and do everything direct pod because then you can end up with a bit of a mess.
That's why we have two different options. There's also a lot of work being done on the AWS Load Balancer Controller for many new features we're developing, so I highly encourage you to take a look at this.
I mentioned earlier that NLB does hybrid connectivity. What does this look like? I said that we have IP addresses as an option for our NLB, and sure enough, here we are. I'm using these IP addresses now. Granted, the target group does not actually live in the data center. I'm just doing this to show more of a logical representation of how this is done, but it does live in the VPC that the NLB is in. We've grabbed a couple of IPs from on-premises because our Kubernetes clusters need to reference something on-premises. We have a Direct Connect connection to our transit gateway. Metal works just fine, so I'm using that to connect up to those pieces.
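A minimal sketch of registering those on-premises IPs with boto3, assuming hypothetical addresses reachable over Direct Connect; note that IP targets outside the load balancer's VPC are registered with AvailabilityZone set to "all".

```python
import boto3

elbv2 = boto3.client("elbv2")

# Register two on-premises IPs (hypothetical, reachable over Direct
# Connect) into an IP-type target group that lives in the NLB's VPC.
elbv2.register_targets(
    TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/onprem/333",
    Targets=[
        {"Id": "10.20.30.11", "Port": 443, "AvailabilityZone": "all"},
        {"Id": "10.20.30.12", "Port": 443, "AvailabilityZone": "all"},
    ],
)
```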
Where does NLB shine? First, I am a gamer and I love the fact that our gaming customers use NLB because it helps keep the ping times quite low, and the connections are long-lasting as long as you send traffic through them. Financial services is also great when you need low-latency performance. Think of ticker tapes or transactions that need to be done very quickly. The same applies to ad exchanges. IoT is another great use case, with devices in the field sending in long-lived TCP connections, like your Samsung refrigerator or perhaps a GE toaster. And of course, media and streaming: when we want to sit down and watch a movie at home, we don't want that movie to buffer or lag. We want it to perform and play just like we're in a movie theater.
Gateway Load Balancer: Security-Focused Architecture with Bump-in-the-Wire Deployment
The last load balancer we're going to talk about is the Gateway Load Balancer, and as Matt mentioned, it's more for our security systems. If we look at it at a glance, we have instances and IP that you can use as targets. We use it a lot with bump-in-the-wire deployments, and these are not all the features you can do, just some of the more important ones I want to bring up. You can do custom health checks, and you can choose flow stickiness across different tuples: 5-tuple, 3-tuple, or 2-tuple, so source port, destination port, source IP, destination IP, and all of that. Matt also mentioned that it uses Geneve encapsulation. When you're sending packets in and out to the targets of a Gateway Load Balancer, that packet is encapsulated with Geneve. When the security devices strip it off, the packet still retains the original source and destination, as if it never knew it was intercepted and sent off to be inspected.
Let's add this into our architecture. First, I have to slide us back over to our front door. We've kind of outgrown the security groups. We have a lot more traffic, and we want to take advantage of some of these partner firewalls that you can find in our marketplace that work with our Gateway Load Balancer. We're going to slot that in right at our public subnets right at our front door. In order for that bump-in-the-wire to work, we're going to have to add some routes we call ingress routing to our IGW. We're basically going to say if you're going to talk to any of our targets, make sure you go through the Gateway Load Balancer first. That fleet, remember, as Matt mentioned, has the ability to expand and contract with auto scaling, so it will work well with our workload.
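Here is a hedged boto3 sketch of that ingress routing setup, with hypothetical IDs: an edge route table associated with the internet gateway sends inbound traffic for the public subnet through the Gateway Load Balancer endpoint first.

```python
import boto3

ec2 = boto3.client("ec2")

# All IDs are hypothetical. Create an ingress route table, attach it to
# the internet gateway (an "edge association"), and steer inbound traffic
# for the public subnet to the Gateway Load Balancer endpoint first.
rtb = ec2.create_route_table(VpcId="vpc-0123456789abcdef0")
rtb_id = rtb["RouteTable"]["RouteTableId"]

ec2.associate_route_table(
    RouteTableId=rtb_id,
    GatewayId="igw-0123456789abcdef0",  # edge association with the IGW
)

ec2.create_route(
    RouteTableId=rtb_id,
    DestinationCidrBlock="10.0.1.0/24",      # the public subnet's CIDR
    VpcEndpointId="vpce-0123456789abcdef0",  # the GWLB endpoint
)
```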
But once again, our database folks are ever paranoid, and they want to actually have some inspection of any traffic going from our back-end VPC into their databases. If you make a little bit of room over here, we can accommodate this again by using those endpoints. Now the Gateway Load Balancer endpoints are just like PrivateLink in the way that you can put them all over your workloads, and they will be the main way that you gain entry into your Gateway Load Balancer and into the traffic. One caution when you do something like this and centralize the inspection VPC: keep it ring-fenced. Mine, I want to point out again, is not connected to anything else, so my infosec group is very happy. We have a lot to work with, and we can expand and make sure the firewall rules understand the traffic patterns coming and going.
Where does Gateway Load Balancer shine? Well, it's a security device, so it should shine everywhere, right? All of those places that I mentioned, a lot of customers in all those fields take advantage of the Gateway Load Balancer. So now we've kind of built out our architecture. Let's take a look at the bigger picture of what we built. If we slide in here, first, we have our ALB that we used for the front door. We're using Automatic Target Weights, right, to make sure we're getting the most out of all of our targets. Let's say those targets are very big and expensive machines that cost a lot hourly. We want to make sure that there's nothing wrong and nothing's going idle, so we're using ATW there. And then next, we used ALB for our ingress controller for our containers, and we're using NLB for our direct pod deployments for any of the things in our containers that require low latency and fast connections. We also used NLB as a connection piece. As I promised, I didn't use anything else to connect up our VPCs. I used the NLB.
Lastly, we used our Gateway Load Balancer, which is our load balancer of choice for security. Our database VPC as well as our front door are both protected by this device. We don't need to repurchase those firewalls repeatedly. We can leverage them in the same way and ring fence the VPC where those firewalls live so that it has no other connectivity other than the Gateway Load Balancer endpoints coming in.
QUIC Pass-Through on Network Load Balancer: Reducing Latency for Mobile Workloads
Now, as promised, I'm going to hand over to Milind, who will tell you about all the cool things coming for ALB and NLB. Thank you, Jamie. Before I start, I want to do a quick checkpoint. We started with Matt talking about the cool technology that goes underneath in building all the systems, the underlying Hyperplane architecture, the Nitro systems, and all the cool parts. Then Jamie talked about the bigger architecture, where these pieces fit, and how the load balancers we offer actually serve your needs. Now that we've looked at the bigger picture, I'm going to talk about specific things on ALB and NLB.
We have launched many capabilities in the last few weeks. I'm going to talk about two key capabilities we launched on the Network Load Balancer, or NLB. After that, I'll also talk about two key features we launched on the Application Load Balancer. Let's get started. On NLB, we launched something called QUIC pass-through support, which is a feature we recently launched. But before we start there, let's understand the basics first.
Some of you raised your hands when Jamie asked the question about how many of you use ELBs. NLB uses an algorithm called a 5-tuple hash to select a target. But what is a hash? A hash is essentially a mathematical function that takes a number of inputs and produces a single output. For example, this is a mathematical function with five different fields from an incoming packet or incoming request. The protocol is UDP, which is the first field. The second is the source IP address. There is a source port. The destination IP is that of the NLB. And lastly, the destination port. These are the five tuples or five entities.
You feed these five entities into this mathematical function, and then you get one single output. Let's say the output is two. So the NLB says, I'm going to pick up target number two. Whatever incoming traffic comes in for this combination of five tuples, I'm going to send it to target number two. That's how NLB routes traffic today. This is one key construct that we'll use later on. The key points here are that as the number of targets changes, the answer might change because the output might come up differently. NLB also caches this entry for 120 seconds for UDP and for 350 seconds for TCP.
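Here is a toy Python illustration of that idea; it is not NLB's actual hash function, just a demonstration of how five fields reduce to one target index.

```python
import hashlib

def pick_target(protocol: str, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int, num_targets: int) -> int:
    """Toy illustration of 5-tuple hashing, not NLB's real algorithm.

    The five fields feed one hash function; the single output, reduced
    modulo the target count, selects a target. The same tuple always maps
    to the same target, but changing the number of targets can change the
    answer, as noted above.
    """
    key = f"{protocol}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_targets

# Same flow, same target every time:
print(pick_target("UDP", "203.0.113.7", 53411, "198.51.100.10", 443, 3))
```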
Now, the second construct or concept we want to learn about is the feature we launched called QUIC pass-through. But what is QUIC? How many of you have heard of QUIC? We'll do a little overview of QUIC. QUIC is a newer protocol. It's essentially a UDP-based protocol, RFC 9000. It was written fundamentally for mobile nodes. All the protocols we have known so far in the TCP and UDP stack were written for static nodes. But now, as we all know, the world has changed. The benefits of the QUIC protocol are that it reduces connection latency, and I'll show you how as we talk about it. It has built-in security with TLS 1.3 encryption, and it has support for migration and multiplexing. What that means is if a mobile node moves, and we use our cell phones everywhere we go, moving from Wi-Fi to cellular or vice versa, our IP address underneath is going to change, but the connections are still going to be persistent. Your connection will not drop.
Let me show you the benefits of QUIC. There is a TCP stack. Let's understand what a TCP stack looks like. On the network layer, we have an Internet Protocol, and then on top of that, we usually have either TCP or UDP. But on a TCP stack, the TCP stack provides us with data reliability as well as congestion control elements. On top of that, we build security using TLS 1.2 or 1.3. And then on top of that, actual application messages go on using HTTP 1 or 2. Now, compare that with QUIC.
The underlying network protocol is still Internet IP. On top of that, it runs UDP on port 443, which is QUIC. On top of UDP, you have a QUIC plus TLS 1.3 combination, and it's an interesting protocol in the sense that even though it runs on UDP, it has the elements of guaranteed connection, congestion control, and encrypted payload. So it is a hybrid and offers the best of both worlds. On top of that, it has HTTP/3 for the application workloads.
Now let's compare the handshakes that happen in the current TCP with TLS. When in a current TCP environment, there's a three-way handshake, the TCP three-way handshake that goes SYN, SYN-ACK, ACK. That's a three-way handshake. And then on top of that, clients typically will do a TLS handshake, and after that there will be actual data transfer. So there are a number of handshake messages that are involved, and after that the data connection actually happens. Compare that with QUIC: it's just one round trip, and that's it, and data starts flowing.
So that is the advantage: this protocol has been optimized with lessons learned from decades of using TCP with TLS, and you can see why connection establishment is so quick and why data can move with low latency. Now, I said we launched a feature called NLB QUIC pass-through, and I highlighted the word pass-through. What is it? It is essentially based on an IETF draft that's mentioned here, and pass-through means NLB does not terminate the QUIC session. It will pass through the traffic that's coming from the client straight to your targets, and I will walk through the signaling so you'll know exactly what is happening.
What are the benefits of NLB with the QUIC pass-through feature? There are four benefits. Number one, it reduces the connection latency. Number two, it maintains target stickiness even when the IP address of the client changes or the port changes. Number three, it ensures backward compatibility. We have added a combined TCP_QUIC listener type. So by default you can start a QUIC connection, but if your targets are not capable of supporting QUIC, it will fall back to TCP. So you have a fallback option. And last but not least, it gives complete control to the application developers all the way from the client to the end targets that are running in your containers. So you can manage your own certificates, you can manage your own encryption, you can upgrade your application versions, and your infrastructure doesn't have to change.
So now having learned all the basics, let's dive deep. QUIC has two types of packet formats. There is something called a long header, and a long header is typically used when the connection is being established. In the connection establishment, there are a bunch of fields there, and we'll highlight some of those later, but we don't have to remember all of those. And once the connection is established, the client and the target will switch to a short header after the connection has been established. But the commonality between the two is what is called the destination connection ID, or DCID. It's a field that is 160 bits, which is 20 bytes. And why is this field important? This field is important because it maintains session stickiness. It is outside the encrypted payload of the QUIC packet, so load balancers can see it. It maintains consistency, essentially it remains constant even when the source IP, destination IP, or source port, destination port changes. It remains consistent, and this field is used by load balancers to maintain connection stickiness. So that's the use and that was the intent of writing this protocol.
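As an illustration of why load balancers can read this field, here is a toy parser following the RFC 9000 wire layout; the 17-byte short-header DCID length comes from the scheme described in this talk, not from the wire itself.

```python
def extract_dcid(packet: bytes, short_header_dcid_len: int = 17) -> bytes:
    """Toy parser pulling the Destination Connection ID out of a QUIC packet.

    Follows the RFC 9000 wire layout: long headers (used during the
    handshake) carry byte 0 flags, bytes 1-4 the version, byte 5 the DCID
    length, then the DCID itself. Short headers carry the DCID right after
    byte 0, with a length the endpoints agreed on out of band (17 bytes in
    the scheme this talk describes). The DCID sits outside the encrypted
    payload, which is exactly why a load balancer can read it.
    """
    if packet[0] & 0x80:  # header form bit set -> long header
        dcid_len = packet[5]
        return packet[6:6 + dcid_len]
    return packet[1:1 + short_header_dcid_len]
```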
So now having learned why this connection ID is important, let's look at how the connection establishment actually works in QUIC. Here you will see the signaling between three parties: on the left-hand side you have a client, on the right-hand side you have a target, and in between sits the Network Load Balancer. The client will typically initiate a connection and it sends a QUIC initial packet, and it sends out a random DCID, the field that we highlighted before. It's a random destination connection ID.
We calculate a 5-tuple hash, and we looked at what hash value gets calculated in one of our earlier slides. It then selects a target and sends the packet to that target. The target responds with a handshake packet and responds with an SCID, or source connection ID. It is a 17-byte connection ID, and this might be too much detail, but this is a 300-level presentation, so I know some of you may appreciate the details.
So there's a 17-byte source connection ID. Within those 17 bytes, you also have something called a server ID that is 8 bytes. The server essentially tells the client, my ID is this 8-byte value; use this going forward. So even when the connection changes, the client will keep using the same server ID. That means the connection will always go back to the same server. That's the point of that connection ID. Then NLB receives the packet and simply forwards it back to the client.
The client looks at that source connection ID and says, "I understand now, the server gave me this ID, and I'm going to use it from this point on." So from that point onwards, the client will use that same 17-byte value as the destination connection ID, regardless of IP address changes. Now the client will respond with a QUIC handshake packet of its own. NLB creates a connection ID to server ID mapping and sends the packet to that target. Because of the server ID lookup within the connection ID, the packet gets sent to the same server. At this point, the target handshake is complete, and the target responds back to the client. NLB simply looks at the packet and sends it back to the client, and at this point the client handshake is complete.
At that point the client starts sending data, and the data transfer continues from there. But we said this protocol is fundamentally written for connection migration, for a client that's roaming. So how does that actually work? The client roams and changes either its IP address or its port or both, but keeps the same 17-byte ID. NLB then looks up the connection ID, knows from the embedded server ID which target to go to, and simply forwards the packet to the target. The target receives it, and the connection stickiness is still maintained with the same target. The packet then goes back to the client. So this is how the connection is maintained even when the client has roamed to a new IP address or a new source port.
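Putting the two lookups together, here is a conceptual sketch of the routing decision; the placement of the 8-byte server ID inside the connection ID is assumed here for illustration, and this is not NLB's implementation.

```python
# Conceptual sketch of the routing described above, not NLB's internals.
SERVER_ID_LEN = 8
server_id_to_target: dict[bytes, str] = {}  # learned from handshake packets

def route_packet(dcid: bytes, hash_based_target: str) -> str:
    """Route by server ID when known, else fall back to the 5-tuple hash.

    Initial packets carry a random client-chosen DCID, so the hash decides.
    Once the target's server ID is embedded in the connection ID, lookups
    keep the flow pinned to that target even if the client's IP or port
    changes mid-connection.
    """
    server_id = dcid[:SERVER_ID_LEN]  # assumed position, see lead-in note
    return server_id_to_target.get(server_id, hash_based_target)
```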
Having established this, let's understand how to enable this feature. Your question might be, "This is great. How do I enable NLB on QUIC?" It's fairly simple—a three-step process. Step number one is to create a QUIC or a TCP_QUIC listener. Jamie showed you in the previous section how to create listeners on NLB. You select either a QUIC or a TCP_QUIC listener. We recommend using a TCP_QUIC listener so you can start with QUIC by default, and if your endpoints are not capable, it will fall back automatically to TCP. That's step number one.
Step number two is to create a target group and enable QUIC on that as well. Step number three is needed only if you're not using the AWS Load Balancer Controller, which Jamie showed and which orchestrates the load balancer for you. If you're using the controller, this step is not needed; otherwise, this is where you assign server IDs. That was the simple process of enabling the QUIC feature.
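For reference, a hedged boto3 sketch of steps one and two. create_target_group and create_listener are standard elbv2 calls, but the TCP_QUIC protocol value is taken from this talk's description of the new listener type, so verify it against the current API reference; ARNs and IDs are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Step 2: a QUIC-enabled target group (TCP_QUIC value per this talk).
tg = elbv2.create_target_group(
    Name="quic-targets",
    Protocol="TCP_QUIC",
    Port=443,
    VpcId="vpc-0123456789abcdef0",
    TargetType="ip",
)

# Step 1: a TCP_QUIC listener, so clients get QUIC by default with an
# automatic fallback to TCP for targets that cannot speak QUIC.
elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/net/api/abc",
    Protocol="TCP_QUIC",
    Port=443,
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": tg["TargetGroups"][0]["TargetGroupArn"],
    }],
)
```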
Now, some of the key considerations as you enable QUIC—there are four things I would like you to be aware of. We can consider them as things to note. Number one, this implementation is based on the IETF draft. The IETF draft has remained stable for a number of years, so we feel confident that it will become an RFC in its current state, but things might change.
Life changes and things might happen, so just keep an eye out in case there are changes. Obviously, if it changes, we will update our implementation. The draft is unlikely to change, but I wanted to give you that caveat. The IETF specification also states that there is no UDP fragmentation allowed in QUIC. This is by definition of the protocol, so there are no UDP fragments allowed. Because it is not allowed, we will also drop it when NLB sees it.
Number three: this protocol relies on a server ID that is sent by a server. A server essentially sends its own ID and says, please use me going forward. The server ID ensures that the connection always goes back to the same server. However, because of that, if the server goes down at some point, you have to make sure your software is capable of handling that error condition. Last but not least, because QUIC allows connection migration for Internet-facing traffic, a client's source IP can change mid-connection, so IP-based access control at the load balancer is not reliable; if you need access control, you will have to implement it on your endpoints. That is again a consequence of how the protocol is defined, and I just want you to be aware of it.
New Features Across Load Balancers: Weighted Target Groups, Target Optimizer, and URL Rewrite
All of those things are documented in the blog that we wrote a few weeks ago, so feel free to scan the QR code. What you just saw here is exactly documented there. That concludes our QUIC feature. I will do a quick overview of the other three features. The second feature we launched is weighted target groups. This feature is now available on NLB. Jamie did a great job on this. You saw that on the ALB, and NLB also supports weighted target groups now, which allows you to distribute traffic based on the configuration of weights on different targets.
What are the use cases? Where would you use this? First is blue-green and canary-type deployments when you want to minimize downtime during updates and patching. Second is A/B testing for different versions of software, or when you want to try different user experiences. Third is migrating applications, when you are moving from one version of an application to another. Weighted target groups work in a very similar manner to what you are familiar with on ALB. Imagine this is a Network Load Balancer that has two target groups: a blue target group and a green target group. You can assign weights, and based on the weights you assign, the traffic will go to those target groups. Again, the details of how to use this feature are documented in the launch blog. Feel free to scan it.
We have looked at two features that we launched on NLB. Now let us switch gears a little bit and talk about the two new features we have launched on ALB in recent weeks. Number one is an interesting and very exciting feature called Target Optimizer. Target Optimizer allows you to enforce a maximum number of connections. You can specify, for example, that you want to have only one connection going to one target, and that improves your success rate as well as gives you low concurrency. Why is this useful? It also increases target efficiency. This is meant for your AI workload, maybe training or inference workloads, where your target wants to dedicate itself only to one single task.
For example, with today's ALB algorithms, if you choose round robin or least outstanding requests, ALB may send incoming traffic to a target that is still busy with multiple tasks from previous requests. In that case, your concurrency remains high and your error rates may go up. But with Target Optimizer, you can say you want only one concurrent task running on these targets, and ALB will ensure that it sends traffic to a target that is free to take on a task. How do we achieve that? There is an agent that runs on each target now. The agent monitors how many tasks are running on the target and communicates that back to the ALB, and ALB sends traffic accordingly. What does it do? It gives you a higher success rate and improves your target efficiency.
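Here is a conceptual sketch of that behavior, purely illustrative rather than the actual Target Optimizer mechanism: a dispatcher that only hands a request to a target with spare concurrency, based on agent-reported task counts.

```python
# Conceptual sketch of concurrency-aware routing as described above;
# an illustration of the behavior, not the Target Optimizer implementation.

MAX_CONCURRENT_TASKS = 1  # e.g., one inference request per GPU target

# In the real feature, a per-target agent reports task counts back to the
# ALB; here we stand in for those reports with a simple dictionary.
in_flight = {"target-a": 0, "target-b": 1, "target-c": 0}

def dispatch(request_id: str) -> str | None:
    """Send a request only to a target with spare concurrency."""
    for target, busy in in_flight.items():
        if busy < MAX_CONCURRENT_TASKS:
            in_flight[target] += 1
            return target
    return None  # no free target: queue or reject instead of overloading
```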
So it's excellent for your AI workloads. You can use it in one of the architectures that Jamie was mentioning. This is a very exciting feature. If you want more details, they are in this blog. Whenever we launch a major feature like this, we always write a very detailed blog with configuration examples, topology, and more. So feel free to take a picture.
The last feature we're going to talk about now is URL rewrite and host header rewrite. So the Application Load Balancer now has an exciting capability. You can not only rewrite a host header, but you can also rewrite part of the URL. The technology that's used is regex, or regular expressions.
Imagine there's an original GET request coming in with a path of /user/hello.htm and a host header of example.alb.com. You can specify a rule that matches the path with capture groups for user and hello.htm. You transform that with $1, which refers to the user part, and $2, which refers to the hello.htm part, and in between you insert EN. So when ALB transforms the request, it rewrites the path to /user/EN/hello.htm before sending it to the target. This allows the targets to do interesting things, and you can manage your fleets appropriately. The original request doesn't have to change, and the client doesn't even have to know. So this is how it works at the request level on the URL.
Now let's look at how it works on a host header. Say there is a host header of example.alb.com. In the match condition, you look for anything that starts with example.*, and you transform it with m.$1. So when the request goes in, the host header becomes m.alb.com. This could be your fleet that's servicing only mobile nodes. So again, this is a very powerful capability to change your request path and your host header on the fly.
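Here is a small local illustration of both transforms using Python's re module; note that ALB's rule syntax writes capture groups as $1 and $2, while Python uses \1 and \2. The hostnames and paths mirror the talk's example and are otherwise hypothetical.

```python
import re

# URL rewrite: capture "user" and "hello.htm", insert "EN" between them.
path = "/user/hello.htm"
print(re.sub(r"/(user)/(hello\.htm)", r"/\1/EN/\2", path))
# -> /user/EN/hello.htm

# Host header rewrite: anything starting with "example." is steered to a
# mobile-only fleet by rewriting the host to m.<rest of the domain>.
host = "example.alb.com"
print(re.sub(r"^example\.(.*)$", r"m.\1", host))
# -> m.alb.com
```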
Similar to all the features, this also has a launch blog, so feel free to take a picture of this. So I've given you an overview of the four features that we launched, and I'll invite Jamie to close it for us.
Closing Remarks and Community Engagement
Thank you, Milind. I almost lost my mic here, so I'll hold it in this hand. So Matt took the covers off and pulled back the curtain to show us how ALB, NLB, and Gateway Load Balancer are actually built. I walked you through where they apply in some architectures, and then Milind went through the new features that we have today.
I know I've seen a bunch of you using your phones to take pictures of some of the QR codes, so I would like you to pick up your phones as well and type in routingloop.net. Every Wednesday, Matt, myself, and a couple of other hosts do shows just like this on Twitch, YouTube, and Twitter. If you go to routingloop.net, you will find all of our previous episodes covering topics such as this, as well as what's coming in the future. We'll be covering some of the chalk talks that some of you might have missed, and we'll invite some of those folks on the show. So please tune in for that.
Also, please don't forget to fill out the session survey. This helps us calibrate to make sure that we're doing the right thing for you and you're learning what you want to learn. So thank you very much and enjoy the rest of your day at re:Invent.
This article is entirely auto-generated using Amazon Bedrock.