🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - A modern approach to application migration with Amazon VPC Lattice (NET309)
In this video, Jamie, Yécine Allouache, and Ryan J McDonough demonstrate how to modernize application architecture using Amazon VPC Lattice. They walk through a practical migration scenario, starting with a complex architecture involving VPN connections, bidirectional PrivateLink, IPv6 workloads, and hybrid connectivity. The presenters explain VPC Lattice fundamentals including service networks, resource configurations, and policy hierarchy, showing how it handles overlapping IPs and IPv4-to-IPv6 translation. They demonstrate replacing VPN with Lattice for partner connectivity, simplifying acquisition integration, modernizing from monolithic EC2 to EKS containers, and implementing hybrid connectivity using service network endpoints alongside Transit Gateway. Ryan shares Goldman Sachs' real-world experience migrating from shared VPC architecture to Lattice-based networking in their Cloud Fast Track platform, addressing IP starvation, resource contention, and network isolation challenges while maintaining simplified developer experience through resource configurations and service networks.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Modernizing Applications and Infrastructure with Amazon VPC Lattice
Hello everybody and welcome to a modern approach to application migration with Amazon VPC Lattice. I'm Jamie, and I'm joined today by my good friends Yécine Allouache and Ryan J McDonough. They'll be talking to us about using Lattice and how you can use it to not only upgrade and modernize your application but also to upgrade and modernize your infrastructure as well.
Let's start out with our agenda. First, we're going to talk about what we're starting with. The easiest way to show you how to modernize with Lattice is to show you a practical application with an actual architecture. When Yécine, Ryan, and I were building this architecture, we couldn't stop building it well-architected, because it's been beaten into our heads so many times. We had to add some things so we could say, "Here are some things that you can improve."
After that, we're going to talk about what needs to change and why it needs to change. Then we're going to work in how Lattice helps—what it is, how it helps, and some fundamentals. I understand that this is a 300-level session, but we'll be getting into some fundamentals for folks who have not used Lattice. How many of you use Lattice? Good, so about half of you. For the others, we can catch up on the fundamentals real quick and describe why we're choosing Lattice to do some of the things we're doing.
We're also going to be talking about how we're going to be modernizing with Lattice. Then the last bit is Ryan's story from Goldman Sachs and how he actually went through this and his practical application of modernizing with Lattice. Let's talk about our current landscape. When we're talking about anything in business, especially when it comes down to IT, everything comes down to a business requirement. Those business requirements are always started with some need of the business.
The business could be growing, or it could be a change. You might be bringing on a new line for manufacturing or a new product that you want to release. The other thing is that we hope all these businesses grow, and as they grow, so do your requirements. All this stuff has to happen within a certain time frame, so we need to pick the best tools that help us get what our business needs as quickly as possible.
What that generally means is that we tend to adopt more and more things. As we grow, the number of requirements increases. None of the requirements ever grow one after the other sequentially; they're always coming at you all at once. Lastly, everyone here who has ever built an architecture or built an application knows that things build on top of themselves. It's not like you scrape things away and get to build new all the time. When we're putting together this presentation, knowing these things, we asked ourselves how do we address these with Lattice.
Current Architecture Landscape and Identified Pain Points
Let's look at our current landscape and what we're going to be talking about. Here I have an architecture that we made where we have a provider VPC that's connecting via VPN because they need bidirectional communication over the Internet to our front door, where things come in. We have our backend, which is in VPC 2. That's where we want to modernize from a monolithic application to containers. We have bidirectional PrivateLink going to one of our acquisitions. How many of you use bidirectional PrivateLink? Have you heard of that? We've got a couple. It's real fun to maintain, because there's a lot of moving parts, and I mean fun sarcastically.
We also have our IPv6 offering. A lot of folks who generally deal with healthcare in the US or government find this is a requirement. Then we have our hybrid solution. We've got a Direct Connect going from our acquisition to a database we want to protect, and then we have our firewall. Going through this, we're going to list out all of the things we want to do. Our partners love the bidirectional, but they hate that every time we do an add, move, or change, we have to give them the keys, change IP addresses, and all that. So we want to change that.
The next thing we want to do is have a better connection strategy for our acquisition. The bidirectional PrivateLink works for going to specific services, but as the companies merge more and more, they need to add more services, and that's not easy to do with bidirectional PrivateLink because you have to add more listeners. You have a max of 50 on the Network Load Balancer. It can be quite cumbersome for any adds, moves, or changes. Then next, we want to grow our IPv6 offering.
We have to add more things to that offering, but as you saw in our previous architecture, we're using private NAT and a bunch of things to go from our IPv4 environment to our IPv6 environment. Again, if we have to grow that, that connectivity becomes a bottleneck and it becomes a problem. Then next, we want to make sure that our mainframe on-premises only talks to our VPC 2. We can of course do that infrastructure-wise with security groups and whatnot, but we also want to know if there's a better way. If you're doing things infrastructurally, all those hops and all those pieces need to change. It would be nice if we can change it in one spot.
We also want to use our containers. This is the big move, where we want to upgrade those monolithic EC2 instances in VPC 2 to containers. And then of course with our acquisition, they need access to our on-premises database.
There's got to be a better way than having a separate Direct Connect. Yécine is going to take us through VPC Lattice and how it helps us solve these problems.
VPC Lattice Fundamentals: Building Blocks and Comparison with Transit Gateway
Thank you, Jamie. So before we start answering why VPC Lattice will help, let's do a small refresher on what Lattice is. Here on the screen we see VPC Lattice, which we call an application networking service: it connects, monitors, and secures communication between services and resources. You see on the screen the variety of compute types it supports, from traditional EC2 to containers and even Lambda, and we also support compute outside of AWS in a hybrid scenario. It also supports databases, and those communications can happen over multiple protocols. And last but very important, it also allows you to enforce your security requirements while keeping monitoring and observability.
So now let's take a look at the different building blocks that Lattice offers. The first thing, the core of the service, is the services. So here you've got your application with any type of compute that we support, and you will put that application into one or multiple target groups. Once you've got that, you have this logical group which is what we call a service, and with that service it basically allows you to expose your compute as an endpoint and then you'll be able to apply different routing rules, load balancing options, and authentication policies. So that's the first building block.
The second one is the resources. So you might create one or many services which I just described, and you've got the second type which is the application resource. And here basically it's where you put all your TCP-enabled destinations. It could be an Amazon RDS, it could be a DNS name, or even an IP address. And to configure that application resource, you will attach what we call a resource configuration.
Next is the accounts. So with VPC Lattice you might have your services and resources in one or many accounts, and here you see that we've got a bunch of services and resources in account C. And you might have those services be consumed by either the same account or different accounts, and all of this cross-account communication is supported, and they will talk through what we call the VPC Lattice service network.
So as I said, there's the concept of providers and consumers, and I'd like to first clarify what we mean by that. So here on the screen you see there are four services and what we call a provider is a service or a resource that you provide to the service network by either associating the service to the service network or creating a resource gateway. So those are what we call providers. On the other hand, for the consumers, they'll be the VPCs and the service endpoints that are associated to the service network, and they will be able to consume the services that we have exposed.
Another point I'd like to touch on and clarify because Jamie and I get that question a lot is how does VPC Lattice compare to Transit Gateway. And while they are very different services, they can both live very happily together, but because we get that question a lot, I think it's good to clarify the differences between those services. So I'm sure most of you know what Transit Gateway is by now, but when we talk about Transit Gateway, we really talk about what we call a core networking service, and it's the service that allows you to connect all your VPCs and hybrid and create that central hub of networking, but we're mostly staying at layer 3 and 4. On the other hand, with VPC Lattice, we're more at layer 7. That's why we call it an application networking service, even though we can support any TCP-based destination. But we have a more managed service that simplifies that communication between those services and resources, and you have that extra security, networking, and observability built in. That's why we call it the application networking service. So that's the main difference between the two.
Another difference is also with the pricing comparison. So here I wanted to show you if you fully replace a Transit Gateway with VPC Lattice, what the cost would be. So we start with the Transit Gateway model and for every application you will have one load balancer. So if you have four load balancers for a certain amount of traffic, it comes down to roughly $1,300 per month. If you want to replace that architecture with VPC Lattice, then you will need to create one service network and four services, and that can come down to $750 per month.
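To make the shape of that comparison concrete, here is a small cost model in Python. The unit prices below are placeholder assumptions for illustration only, not current AWS rates, and the totals will not match the figures quoted in the talk; the point is only the structure of the two bills (per-app load balancers plus Transit Gateway attachments versus one service network plus per-app Lattice services).

```python
# Hypothetical cost model for the two designs discussed above.
# All unit prices are PLACEHOLDER assumptions; check the current AWS
# pricing pages for Transit Gateway, NLB, and VPC Lattice before use.

HOURS_PER_MONTH = 730

def tgw_with_nlbs_monthly(num_apps, attachment_hourly=0.05,
                          nlb_hourly=0.0225, data_gb=1000, tgw_per_gb=0.02):
    # One TGW attachment per VPC and one NLB per application.
    attachments = num_apps * attachment_hourly * HOURS_PER_MONTH
    nlbs = num_apps * nlb_hourly * HOURS_PER_MONTH
    data = data_gb * tgw_per_gb
    return attachments + nlbs + data

def lattice_monthly(num_services, service_hourly=0.025,
                    data_gb=1000, lattice_per_gb=0.025):
    # One service network (assumed free of hourly charge here) plus one
    # Lattice service per application; Lattice replaces the per-app NLBs.
    services = num_services * service_hourly * HOURS_PER_MONTH
    data = data_gb * lattice_per_gb
    return services + data

tgw = tgw_with_nlbs_monthly(4)
lat = lattice_monthly(4)
print(f"TGW + 4 NLBs: ${tgw:,.0f}/mo vs Lattice: ${lat:,.0f}/mo")
```

Under these assumed rates the consolidated Lattice design comes out cheaper, which mirrors the direction of the comparison in the talk; plug in your own traffic volumes and the published rates to get real numbers.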
Replacing VPN Connections: Partner Integration with Service Networks and Policy Hierarchy
So that's to give you a comparison point between the two services. I'll let Jamie talk about our migration strategy. Thanks, Yécine. Now let's get back to that architecture and actually start doing stuff, now that we've got some of the building blocks Yécine has told us about. We'll bring the architecture back up just to remind you of what we're going to talk about.
And now let's get into it. So the first one was partners were complaining about the VPN, right? What can VPC Lattice do to help us with the VPN? Now we know that VPC Lattice is multi-account, right? So it's a fit there. We also know that VPC Lattice is bidirectional, so there's a fit there too, and it also has inherent security.
So where do we start? We've got to create our VPC Lattice service network first, right? Let's take a look at that. The one key piece for sharing VPC Lattice across accounts is Resource Access Manager, which allows us to share our service network. The first thing we do is create it: we create our service network and associate our VPCs to it, and that's going to live in account A. Accounts B and C will then share their services, we can connect them together, and VPC Lattice will watch that traffic as it goes back and forth, based on policy, which we'll get into later in the presentation.
And then lastly, we have to create the services. On our side, we create the services and associate them with the service network we've already shared out. On their side, they create their services, join them to the service network, and share them to us. So if we go back to our architecture, I'm going to concentrate on the front door: I create my VPC Lattice service network and move that in. The important thing to note here is that I didn't remove the VPN right away. I don't need to. I can run both, and we can get customers and partners on board without ripping and replacing what they're used to. There's no shock to the system.
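The cross-account steps just described can be sketched as the request payloads you would hand to boto3's `vpc-lattice` and `ram` clients. Every name, ARN, identifier, and account ID below is a hypothetical placeholder, and this is a sketch of the call sequence, not a ready-to-run deployment script.

```python
# Sketch of the cross-account setup described above, expressed as the
# request payloads for the "vpc-lattice" and "ram" boto3 clients.
# All names, ARNs, and account IDs are hypothetical placeholders.

# 1. Account A creates the service network (CreateServiceNetwork).
create_service_network = {"name": "shared-service-network", "authType": "AWS_IAM"}

# 2. Account A associates its VPCs (CreateServiceNetworkVpcAssociation).
associate_vpc = {
    "serviceNetworkIdentifier": "sn-0123456789abcdef0",
    "vpcIdentifier": "vpc-0123456789abcdef0",
}

# 3. Account A shares the service network via AWS RAM (CreateResourceShare).
share_network = {
    "name": "lattice-sn-share",
    "resourceArns": [
        "arn:aws:vpc-lattice:us-east-1:111111111111:servicenetwork/sn-0123456789abcdef0"
    ],
    "principals": ["222222222222", "333333333333"],  # accounts B and C
}

# 4. Accounts B and C create their services and join them to the shared
#    network (CreateService + CreateServiceNetworkServiceAssociation).
create_service = {"name": "partner-front-door", "authType": "AWS_IAM"}
join_service = {
    "serviceNetworkIdentifier": "sn-0123456789abcdef0",
    "serviceIdentifier": "svc-0123456789abcdef0",
}

print("payloads prepared for", len(share_network["principals"]), "consumer accounts")
```

With `authType` set to `AWS_IAM`, the service network and each service can then carry the auth policies discussed later in the session.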
You'll also notice that the partner right now has a firewall, and that firewall is there to protect the traffic going back and forth. But because we have policies, those firewalls aren't needed. So we've got a couple of places where we're going to put policies, and we've generated some generic policies here, so don't copy and paste them; these are just examples. What we're doing here is saying: we're going to accept traffic from our front door, and our front door has to have a token that says "front door" before we allow that traffic through. We're only going to allow a GET, and only on three different paths, because we want to restrict it down. This is a partner; we want to be secure.
The next place we would put a policy is on the service network. Normally we say you want the service network policy to be as coarse-grained as possible; this is not the place you want to fiddle with more than you have to, because the blast radius is wider. But in this case, because we're sharing our service network out to a partner, we're going to require a couple of tokens before they can talk on the service network. If we add more services, which we will, you just add more tokens. You can make it as simple or as complex as you need; it really comes down to your business use case.
And then lastly, of course, for our front door we're going to do the exact reverse of what we did for the partner. We're going to say: here are the specific URLs you can hit, you can only do a GET, and if you're talking to me, you have to have the partner-one token. So we can lock all of that down. Once that's all set up and ready to go, things can start talking. We'll go ahead and add this to our architecture; now we're starting to slowly clean things up, and we'll move on to the next piece.
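A fine-grained service policy along the lines just described might look like the sketch below. This is illustrative only: the "token" check is modeled here as an IAM principal tag, the paths appended to the service ARN and the condition keys are assumptions to verify against the VPC Lattice auth policy documentation, and all ARNs are placeholders.

```python
import json

# Illustrative service-level auth policy: only GET, only three paths,
# only callers carrying the hypothetical "partner1" token (modeled as
# an IAM principal tag). Condition keys and path-in-Resource matching
# are assumptions to check against the Lattice auth policy reference.
SERVICE_ARN = "arn:aws:vpc-lattice:us-east-1:111111111111:service/svc-0123456789abcdef0"

service_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "vpc-lattice-svcs:Invoke",
        "Resource": [
            f"{SERVICE_ARN}/orders",
            f"{SERVICE_ARN}/status",
            f"{SERVICE_ARN}/catalog",
        ],
        "Condition": {"StringEquals": {
            "vpc-lattice-svcs:RequestMethod": "GET",
            "aws:PrincipalTag/token": "partner1",
        }},
    }],
}

print(json.dumps(service_policy, indent=2))
```

The service network policy would then carry the coarser version of the same idea: require the token, without enumerating methods and paths.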
The next piece we want to talk about is policy hierarchy. I just mentioned coarse-grained and fine-grained; let's put this all in perspective in one spot. The service network is where you want coarse-grained policies; that's where you change the major things. One example you might see in our public documentation is requiring that you be part of my organization to talk. For us, we're saying you have to have specific tokens to talk, because we're sharing that out. The services are where you get more fine-grained, as you saw: only allowing a GET and only going to these specific services, and that's it. And then of course we've got the resource configuration.
Solving Overlapping IP Challenges: Simplifying Acquisition Connectivity
I'm going to go back one for a second. Resource configurations are a little bit different: when you add a resource to a service network, everything has access to it. So you're going to rely on your services and your service network to tighten down who can and cannot talk. OK, now moving on. Let's talk about acquisition connectivity, and I hand it back to you, Yécine. Thank you, Jamie. So let's go back to our scenario with our acquisitions. Our leadership asked us to simplify our strategy when we acquire a company.
How many in this room have had to consume services from, or expose services to, a VPC that had the same IP range? How many of you have encountered this overlapping IP problem? There is nothing inherently wrong with overlapping IPs. The issue is that because we want to consume resources and expose our own, we create what we call bidirectional PrivateLink. Again, there is nothing wrong with that architecture; it is just that at scale the operational overhead can become really difficult to manage.
Let us see how we can use VPC Lattice to help with this. You will see that VPC Lattice handles overlapping IPs perfectly fine, and I am going to walk you through how the service works so you understand how it deals with that. When you attach your VPC to the service network, the VPC gets a VPC Lattice link-local ENI, and that ENI gets an IP from the 169.254 address range, which is the link-local address range. Once this is done, the route table also gets an entry with that IP address pointing to Lattice. The final step is that you create a service, expose it, and use a DNS name. Here on the screen it is the default generated name; you can also use a custom DNS name. When you resolve the DNS name of the service, it points to the 169.254 IP address, not to an IP in the other VPC, and that is how the overlapping IP is handled. VPC Lattice makes that problem a non-issue.
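The link-local mechanics above can be demonstrated in a few lines of Python. The specific 169.254.x.x address below is illustrative; the point is that an address in the link-local range can never collide with either VPC's CIDR, even when the two VPCs overlap exactly.

```python
import ipaddress

# Two VPCs with the same (overlapping) CIDR, as in the acquisition scenario.
vpc_a = ipaddress.ip_network("10.0.0.0/16")
vpc_b = ipaddress.ip_network("10.0.0.0/16")

# Illustrative address of the kind that DNS resolution of a Lattice
# service name returns: it comes from the 169.254.0.0/16 link-local
# range, not from either VPC's address space.
lattice_answer = ipaddress.ip_address("169.254.171.10")

assert lattice_answer.is_link_local
assert lattice_answer not in vpc_a
assert lattice_answer not in vpc_b
print("Lattice answers with a link-local IP; overlapping VPC CIDRs cannot collide with it")
```

Because both consumers resolve the service name to a link-local address behind their own Lattice ENI, neither side ever needs to route to the other VPC's 10.0.0.0/16 directly.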
So we are going to apply this and update our architecture. Here we have VPC 2 and the acquisition, and we already have the service network that Jamie created. We are going to do this the same way we did in the first step: we create our acquisition service and our backend service. Once this is done, we associate them with the Lattice service network, and here again we leave all the other components in place. We can do our testing and make sure that everything works perfectly fine. Once we are happy with the results, we can remove the old components and simplify the architecture.
The same way we used the policy to replace the firewalling before, we are going to apply the same concept here. From the backend, it is the same policy type. We are going to allow the acquisition to talk to us with the GET request on a specific path, and we will use the acquisition token. To make this work, we also need to update our service network policy to allow the token to talk because remember you need both the service network and the service policy to allow the communication to work. We have done that on the backends. We can do the same on the acquisition service. Same thing, same story, allowing the backend to talk to us using a token on the various paths.
Now the question is: I have a policy that allows the backend to talk to my acquisition. Can the backend still talk to my front door? The answer is yes. All you need to do is edit your policy and add a new section that handles that communication. That is a very powerful feature, because now your teams, when they build a service, can edit their policy and the security requirements can grow organically. All they have to do is edit the existing policy to add the new requirements, and it will work.
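That "edit the policy, add a section" step can be sketched as appending a statement to an existing policy document. As before, this is illustrative: the ARNs are placeholders and the token-as-principal-tag condition is an assumption, not the literal policy from the session.

```python
import json

# Existing backend policy: allow the acquisition to call us with GET.
# ARNs and the token tag below are hypothetical placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowAcquisition",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "vpc-lattice-svcs:Invoke",
        "Resource": "arn:aws:vpc-lattice:us-east-1:111111111111:service/svc-backend1111111/*",
        "Condition": {"StringEquals": {
            "vpc-lattice-svcs:RequestMethod": "GET",
            "aws:PrincipalTag/token": "acquisition",
        }},
    }],
}

# New requirement: the front door must also reach the backend.
# We simply append another statement; the existing one is untouched.
policy["Statement"].append({
    "Sid": "AllowFrontDoor",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "vpc-lattice-svcs:Invoke",
    "Resource": "arn:aws:vpc-lattice:us-east-1:111111111111:service/svc-backend1111111/*",
    "Condition": {"StringEquals": {"aws:PrincipalTag/token": "front-door"}},
})

print(json.dumps(policy, indent=2))
```

This is what lets the security requirements "grow organically": each new consumer is one more statement, with the blast radius confined to that service's policy.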
IPv6 Workload Modernization and Hybrid Connectivity Strategy
So now we have our backend talking to our acquisition through the VPC Lattice Service Network. We also have our front door service talking to our backends. Everything works perfectly fine. So let us add this piece to our architecture. Here we have dealt with the partner service. We have also done the front door and our backend and acquisition, and there is another piece that now we will tackle, which is the IPv6 workloads. That is the new requirement we need. As Jamie said, when you deal with healthcare or government customers, they have this requirement and you need to talk to them over IPv6. The way we do it now is we use that private network gateway to do that translation between IPv6 and IPv4. We want to simplify this.
The same way Lattice handles overlapping IPs, as I said, it also handles IPv4-to-IPv6 communication, so that becomes a non-issue as well. Here we are going to do the same thing we did before. We create our IPv6 service, we add it to our service network, and once we are happy with everything we
can remove that private network gateway because it's no longer needed. Then we have a working IPv6 workload. I'm not going to show you the policy here, but it's the same concept we've shown you until now: we adjust the service network policy and the service policy to make everything work. So now that it's done, we've got our v6 service talking to our backends. Everything is fine. Let's add this last piece to our architecture. We've now modernized the partner service, the front door, our backends, the acquisition, and the v6 workloads.
So let's see what comes next. I promised we would talk about hybrid as well as modernizing our application, the big piece in the middle: the monolith to EKS. Let's tackle the hybrid. You'll notice that as Yécine and I go through this whole thing, we're picking a lot of low-hanging fruit. We're getting Lattice in the front door, getting it on our network, and starting to use it and see the capabilities. I strongly recommend that as the best approach to start adopting Lattice: look for these low-hanging-fruit items and do those before you do the big move, because, as you'll see, the big move is a lot easier afterwards.
Mainframe Integration and Application Modernization: From Monolithic EC2 to EKS
So let's talk about our hybrid. We want to make sure that our mainframe only talks to our backend services, and we want to give it a path using VPC Lattice. We like the idea that Lattice looks over our traffic and that the service network rotates the tokens every 15 minutes. We know for sure that using IAM is secure, because that's how we even log into AWS, and we want to take advantage of that without having to change all the little pieces. As you can see here, I've got my Direct Connect, I've got my Transit Gateway, and we already have our existing Lattice network. The thing to note is that at no time does the Transit Gateway or the Direct Connect disappear. Lattice did not suddenly gain the ability to go straight to your on-premises network, so we still need them. This goes back to the point Yécine was making: Lattice and Transit Gateway work well together, and this is one of those examples.
So what are our options to connect our database in? You remember Yécine was talking about resources. We're going to use one of the pieces for controlling resources, called service network endpoints. We put in a service network endpoint, which, just like PrivateLink, grabs an IP address local to that subnet (it actually grabs a range of IP addresses). Then all those EC2 instances have to do is talk to that service network endpoint to gain access to those services.
So we're going to apply this concept of using service network endpoints with our mainframe and adopt that now. Remember, that gets the mainframe onto the network. We're going to be well-architected and put a service network endpoint in both of our subnets. As you can see, the traffic is initiated from our mainframe; it goes through Transit Gateway, which already has its connections and endpoints in our VPC, and it flows through the service network endpoint. Now, I would have drawn it flowing through both service network endpoints, but then this slide would be a mess, so we're showing it through one; in fact it goes through both.
Now that we've adjusted our policies and our communication is all set up, with our backend policy saying we're going to allow traffic from our mainframe's IP address, we're going to add that to our architecture. And again, at no time does our Transit Gateway go away. It stays. Direct Connect stays. But now we have a path, and we know that our mainframe is going to and through our backend service. Granted, there will still be a couple of pieces, because this is going over Direct Connect and Transit Gateway, where we need to put in some security groups and the like. But there's a lot less we have to change now. If we want to add more of our services in Lattice and have the mainframe talk to them, we just go to those individual services, edit their policies, say the mainframe can now talk, and it'll be able to talk immediately.
So the second-to-last piece we have is the modernization of our application. We've already adopted Lattice, and I've already told you that we're going to do all the low-hanging fruit to make this part as easy as possible. So let's see what that looks like. I have my VPC, and inside my VPC I'm going to create another service. Now, this doesn't have to be in the same VPC; as Yécine told you, this works well across VPCs. I've seen cases, for instance when you have to upgrade an EKS cluster (like how we force you to upgrade every six months), where some folks have IPs baked in, and it's very difficult for them to put a new cluster side by side.
We threw this little wrinkle in here just so we could talk about it: even if you don't have an extra database, create another VPC, a whole new VPC, and create a new cluster there if you want to do your cluster upgrade. Then say: I want to send 40% of my traffic to the old cluster and 60% to my new cluster. Both of them are being used at the same time; at no time are we ripping the band-aid off and giving our customers a negative experience. When we're happy, we can consolidate and remove the older one.
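The 40/60 shift described above maps to a weighted forward action on the Lattice service's listener. The helper below builds that payload in the shape the `vpc-lattice` listener APIs expect; the target-group identifiers are hypothetical placeholders.

```python
# Weighted routing between the old cluster's target group and the new
# EKS cluster's target group, as a Lattice listener forward action.
# Target-group identifiers are hypothetical placeholders.
def weighted_forward(old_tg, new_tg, new_weight):
    """Send new_weight percent of traffic to new_tg, the rest to old_tg."""
    assert 0 <= new_weight <= 100
    return {"forward": {"targetGroups": [
        {"targetGroupIdentifier": old_tg, "weight": 100 - new_weight},
        {"targetGroupIdentifier": new_tg, "weight": new_weight},
    ]}}

# Start by sending 60% to the new EKS cluster, 40% to the old one.
action = weighted_forward("tg-monolith11111111", "tg-eks2222222222222", 60)
print(action)
```

Shifting traffic over time is then just calling the listener update with a larger `new_weight`, until the old target group sits at zero and can be removed.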
Going from monolithic EC2 to microservices and clusters involves a lot of application work, and there's also a lot of burden on us networking folks to help that along. However, because I'm using VPC Lattice and taking the low-hanging-fruit approach, this was really a non-issue for me. So I'll go ahead and add this to our architecture. Looking at this part, we can see that we've modernized our partner and acquisition connectivity, we've got our IPv6 workload, we've fully modernized the backend, and we have a path for doing continuous upgrades to our backend services.
We can do the same thing if we wanted to for our front end. However, we have to be a little careful because the frontend service also has Internet access coming in, which could be a little more disruptive. For each of these pieces, we've shown you a way to adopt VPC Lattice side by side with your existing configuration so that you can cut over as you need, not immediately. It's not like you have a new service and everyone's on it now, and you're doing this right before your kid's birthday, and then something goes wrong and you get yelled at. I'm speaking from experience.
Resource Gateway Implementation: Enabling Secure Database Access Across Environments
So what about our database? Remember on that other side we have something about our database connecting to our acquisition. Yécine is going to take us through that solution. What about our database? Our acquisition needs to access the on-premises database, and right now they have a separate Direct Connect connection. They want to keep those things separate. They don't want the acquisition to get access to the rest of the network, so they've set up their own dedicated Direct Connects.
We want to change that, make things a little easier, and eventually remove that second Direct Connect connection. We're going to use the Resource Gateway, which I mentioned before, and apply it here. Before going further, let's take a look at what the Resource Gateway is. The yellow part is really your ingress point for the traffic, and the backends are defined in your resource configuration.
The resource configuration can be a public DNS name, an RDS database, or even an IP address. Once you create the Resource Gateway, you attach a resource configuration where you define the backend that the resource points to. Remember, when you add an RDS instance, as the database scales up and down, the resource configuration gets automatically updated; you don't need to change it. But please keep in mind that if you use IP addresses, you will need to do that work yourself. It won't necessarily be automatically updated.
Once you have that resource configuration, you'll be able to connect to that Resource Gateway either by using a resource endpoint, which is basically a PrivateLink access to the Resource Gateway, or through the VPC Lattice service network. Your clients in that subnet can connect to the service network either by associating the VPC to VPC Lattice or using the service network endpoints. Let's look at how we can break down the different steps to update that architecture. We've got our database, so I'm going to create my Resource Gateway and create my resource configuration, and I will define the IP address of my database.
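The steps just described map to two calls against the `vpc-lattice` API: creating the Resource Gateway and creating the resource configuration that defines the backend. The payloads below are a sketch; every identifier is a placeholder, the database IP uses the documentation range, and the field names should be verified against the API reference.

```python
# Sketch of the resource gateway setup for the on-premises database.
# All identifiers and addresses are hypothetical placeholders; verify
# field names against the vpc-lattice API reference before use.

# 1. CreateResourceGateway: the ingress point in the egress VPC.
create_resource_gateway = {
    "name": "onprem-db-gateway",
    "vpcIdentifier": "vpc-0123456789abcdef0",
    "subnetIds": ["subnet-0123456789abcdef0"],
}

# 2. CreateResourceConfiguration: the backend definition. Here it is a
#    single on-premises IP and port; an RDS ARN or DNS name also works.
create_resource_configuration = {
    "name": "onprem-db",
    "type": "SINGLE",
    "resourceGatewayIdentifier": "rgw-0123456789abcdef0",
    "resourceConfigurationDefinition": {
        "ipResource": {"ipAddress": "192.0.2.10"}  # documentation-range IP
    },
    "portRanges": ["5432"],
}

print("resource gateway and configuration payloads prepared")
```

Because the configuration names only this one backend, attaching the gateway to the service network gives the acquisition a path to the database and nothing else on premises, which is exactly the isolation property described next.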
Once this is done, I can attach my Resource Gateway to my VPC Lattice service network, and from that point on, I do not need that secondary Direct Connect connection. We can make sure that the acquisition VPC does not have access to any other on-premises resources, because the Resource Gateway only exposes the backend defined in its resource configuration. So here it can only connect to the database; it cannot access anything else, even though it's using the Direct Connect connection. Here you see how the traffic flows from the acquisition backends all the way down to the on-premises database.
So let's add this piece to the architecture. If we summarize what we've done so far, we showed you the starting architecture, which was quite functional, but we had different problems: overlapping IP addresses, IPv6 to IPv4, and modernizing our backends.
We showed you how you can modernize those step by step to arrive at that final architecture. We have everything in place, and you still see the Transit Gateway; as I said before, Lattice does not necessarily need to replace the Transit Gateway fully. You might still need it for other types of communication, but now everything talks through the service network, and we use our policies to enforce our security requirements. Now I'll hand it over to Ryan, who's going to explain how he's using VPC Lattice.
Goldman Sachs Cloud Fast Track: VPC Sharing Model and Its Limitations
Hi everyone, I'm Ryan McDonough. I'm with Goldman Sachs, and I lead the technology for our managed continuous deployment platform that we call Cloud Fast Track. A number of my colleagues have presented Cloud Fast Track at different AWS events, but those talks primarily focused on how Cloud Fast Track evaluates policy as code to enforce our security baselines. One thing we haven't really covered is how we handle networking on this platform. Today we'll cover how we do networking in Cloud Fast Track and how we're now leveraging VPC Lattice to enhance our network capabilities.
But first, let's talk about what Cloud Fast Track is. It is our primary continuous deployment platform for getting applications into AWS. We launched it in 2021 to improve developer productivity and reduce the number of manual reviews on applications. To achieve this, we took a shift-left approach to policy evaluation and front-loaded it all into pre-deployment. Our applications are all authored in AWS CDK. At pre-deployment we synthesize the CDK application to CloudFormation, and from there our guardrails evaluate policy against the generated CloudFormation templates.
At this stage we're looking at things like whether an S3 bucket uses a KMS key, whether the key is owned by Goldman Sachs, and whether the bucket is publicly accessible or not. But there are some resources that we can't really cover pre-deployment, so we do have to layer in things like SCPs and RCPs and even permission boundaries to fill in some of the gaps. For networking we use a collection of VPCs that are shared to user accounts. Let's take a look at what that workflow looks like.
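To make the guardrail checks concrete, here is the kind of synthesized template fragment they would evaluate: a bucket that passes the checks named above, using a customer-managed KMS key and blocking public access. The bucket name, key ARN, and account ID are illustrative, not taken from the talk:

```yaml
Resources:
  DataBucket:
    Type: AWS::S3::Bucket
    Properties:
      # Guardrail check 1: encryption must use a KMS key, and the key
      # ARN must resolve to a Goldman Sachs-owned key.
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: aws:kms
              KMSMasterKeyID: arn:aws:kms:us-east-1:111122223333:key/example-key-id
      # Guardrail check 2: the bucket must not be publicly accessible.
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
```

A guardrail that evaluates the template pre-deployment can fail the pipeline if either property block is missing or misconfigured, before anything reaches the account.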
When a user requests an environment, this is all self-service by the way, anybody can come in and request a new environment. When we do this we provision two accounts. There's one account that is your service account. This is the target of the CDK application. All resources in that CDK application are going to be deployed to the service account. The other piece is your pipeline account, and this pipeline account is going to be exclusive to that service account. This is going to run an AWS CodePipeline that is going to deploy your application to the service account.
Both these accounts are associated with an OU in our organization. Once that's done, everything is set up and we provision things like roles for break-glass access and detective controls. We also do some housekeeping, like disabling regions we won't support and removing the default VPC. Once the accounts are ready, users can get to work and start pushing code to Git. When this happens, the pipeline kicks off and we run the guardrails. The process runs a standard CDK synthesis and deploy, and in between we evaluate our guardrails. If the guardrails pass, we allow the deployment to proceed; if they don't, we fail the deployment.
Our guardrails not only check the configuration of CloudFormation resources; we also use them to gate entire resource types. For example, if AWS introduces something brand new, it has to go through a review process before we eventually onboard it. One resource type we've blocked until now is the creation of VPCs. So you might wonder: if we block VPCs, how do we do networking? For each application team or business unit, depending on the granularity chosen, we create a VPC and share it to user accounts using VPC sharing.
If you're unfamiliar with VPC sharing, it's a mechanism where you define a VPC in one account and then share it through AWS RAM to a number of participant accounts. You typically do this against an OU, so the VPC gets shared to all participants in that OU. If you want to know more about VPC sharing, Jamie and Alex have a really good in-depth discussion of it on The Routing Loop. We chose this model for a few reasons. One of the things we want to do with Cloud Fast Track is make the cloud more approachable. Some of our users have great expertise in AWS and the cloud in general, but for other teams this is a new experience, and we don't want to inundate them with CIDRs and other networking concepts they may be unfamiliar with.
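The RAM share described above can be sketched as a CloudFormation fragment in the VPC-owning account. Subnet ARN, account ID, and OU ARN are placeholders; sharing subnets to an OU is what makes them appear in every participant account in that OU:

```yaml
Resources:
  # Shares the team VPC's subnets with every account under the target OU.
  SharedSubnetsShare:
    Type: AWS::RAM::ResourceShare
    Properties:
      Name: team-vpc-subnets
      AllowExternalPrincipals: false   # organization-internal sharing only
      ResourceArns:
        - arn:aws:ec2:us-east-1:111122223333:subnet/subnet-0123456789abcdef0
      Principals:
        # Placeholder OU ARN; all accounts placed in this OU become participants.
        - arn:aws:organizations::111122223333:ou/o-exampleorgid/ou-exampleroot-exampleou
```

Participants can launch resources into the shared subnets but cannot modify the VPC itself, which is exactly the ownership split described below.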
A lot of these users come from an application-focused background, and having them define their own VPCs and networks is a daunting task. This approach really helps simplify the onboarding experience. From our perspective, it lets us control the network, the perimeter, and everything associated with them, so we felt it fit our needs well.
The way the shared VPC works is that we define a VPC for teams with two network zones. There is a routable zone, a set of subnets that use IPs from our own network; with the right firewall rules, you can reach endpoints deployed into those subnets. The internal subnets are not routable from our on-premises environment, but this is where we provision things like your AWS private endpoints. We also define all the endpoint policies and make sure they align with our risk controls, ensuring we can only access GS-managed resources and that access is governed by GS principals.
Like most organizations, we have services such as SDLC endpoints, identity, HR, and EMDB. Many of these services predate Cloud Fast Track, so they are exposed via PrivateLink. We create these endpoints and configure DNS for them, so that when users onboard, everything is already in place. When accounts are provisioned through the workflow we discussed, they are assigned to the same organizational unit as the shared VPC, and through RAM those subnets become visible in each user's account.
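One of those pre-created PrivateLink endpoints, with an endpoint policy restricting use to principals in the organization, could look roughly like this. The service name, IDs, and org ID are placeholders; the `aws:PrincipalOrgID` condition is one common way to express an "organization-internal only" control of the kind described above:

```yaml
Resources:
  # Interface endpoint to an internal PrivateLink service, pre-created
  # in the shared VPC's internal subnets before teams onboard.
  SharedServiceEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      VpcId: vpc-0123456789abcdef0                 # the shared VPC (placeholder)
      ServiceName: com.amazonaws.vpce.us-east-1.vpce-svc-0123456789abcdef0
      SubnetIds:
        - subnet-0123456789abcdef0                 # internal subnet (placeholder)
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal: "*"
            Action: "*"
            Resource: "*"
            Condition:
              StringEquals:
                aws:PrincipalOrgID: o-exampleorgid  # placeholder org ID
```

As the talk notes later, carving out exceptions for a single participant means growing this policy document, which is where the complexity creeps in.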
What's interesting is that when you log into the console (remember, we delete the default VPC), the shared VPC appears as if it were defined in your local account. The nice thing is that all your connectivity is preconfigured out of the box; all you need to know is your VPC. Working with a shared VPC is much like working with a VPC you defined yourself, though there are limitations. You are not the owner of the VPC, so you cannot modify the network, create a private hosted zone, or add, remove, or create your own endpoints.
To some users that might seem like a step backwards, but in many cases it's fine. It allows us to maintain control of the network, and it works rather well. The only thing you have to figure out is the VPC ID. Thankfully, CDK has a nice mechanism to look up and resolve VPC IDs. Once you have it, you work with the VPC just like any other and can provision resources into it without any problem.
We found that this model worked really well out of the gate. It allowed us to really simplify the developer onboarding experience and have everything preconfigured for developers before they even provision their accounts. Now you might be wondering why we are talking about VPC sharing in a talk about VPC Lattice. Well, VPC sharing is not without its challenges. There are going to be some workloads that simply are not going to work with a shared VPC. As a financial institution, there are some workloads that are going to need much stricter network isolation. In this case, a shared VPC is not going to cut it.
One of the bigger challenges you'll find with a model like this is resource management and IP starvation. Your routable IP space is a finite resource, so you have to manage it. These issues are a slow burn: they don't appear out of the gate, but over a longer period, because you provision these VPCs at a point in time with the information you had when you set them up. As time goes on, as your business evolves and you build new things that weren't there when you created the VPC, you run into IP starvation, noisy-neighbor issues, and real challenges handling resource quotas.
As things come online, suppose that alongside Team 1 and Team 2 we add a Team 3 that has a custom vendor integration, or whose account represents an acquisition, and we need to punch holes in our endpoint policies to reach something external. The question you have to ask is whether you want to open up those endpoint policies for all participants in the shared VPC or just one. The more of these use cases you accumulate, the more complicated the endpoint policy becomes, and it may not be something you want to pursue. Finally, getting the right granularity is really tough: unless you are the application team, you won't know how to right-size the VPC or how many accounts to associate with it.
Next-Generation Networking with VPC Lattice: Achieving Stronger Isolation and Simplified Developer Experience
You may think we could simply switch a team to a different VPC, but it's really not that simple. Moving VPCs is a one-way door and constitutes a migration, so it's not as straightforward as you might assume. However, we believe we have a better option. Before we explore it, let's discuss what we want from a next-generation networking architecture.
We've already talked about stronger network isolation requirements and the need to avoid resource contention so teams aren't competing for the same resources in the same VPC. There are other capabilities we want to achieve that we cannot capture today in the shared VPC model. We need visibility into which endpoints are being exposed to which consumers across different accounts, and we still want to retain a simplified developer experience for networking so that enterprise services already integrated into our shared VPC can be used just as easily as they were in that model.
Let's look at what this looks like with Lattice. We've provisioned two accounts, and now there is no shared VPC. We update our guardrails to allow VPC creation and introduce an AWS CDK construct that builds VPCs in the shape we prefer teams to use. However, we do not permit the creation of NAT gateways or internet gateways, so these VPCs have no unintended internet access. Additionally, we leverage VPC Block Public Access, which ensures that even if someone circumvents our guardrails, they cannot reach the public internet unless we've explicitly enabled it. This is enforced at the organization level.
Now we have VPCs with overlapping CIDRs, but how do we access the services? The shared services that previously used PrivateLink have been updated to expose them as resource configurations. In a platform-managed account, we create what we call a shared services service network, which contains all these services. Our construct can then associate these VPCs with the shared services network, giving us the ability to reach these services from different VPCs with overlapping CIDRs.
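A minimal sketch of that shared-services arrangement as a CloudFormation fragment follows. In reality the service network lives in a platform-managed account and the team VPC association is created after a RAM share; this sketch collapses both into one template, and all names and IDs are placeholders:

```yaml
Resources:
  # Platform-managed service network that carries the shared services.
  SharedServicesNetwork:
    Type: AWS::VpcLattice::ServiceNetwork
    Properties:
      Name: shared-services
      AuthType: AWS_IAM          # require signed requests / auth policies

  # Associating a team VPC with the service network; overlapping CIDRs
  # across team VPCs are fine because Lattice handles the translation.
  TeamVpcAssociation:
    Type: AWS::VpcLattice::ServiceNetworkVpcAssociation
    Properties:
      ServiceNetworkIdentifier: !Ref SharedServicesNetwork
      VpcIdentifier: vpc-0123456789abcdef0   # placeholder team VPC ID
```

The CDK construct mentioned above could emit the association automatically, so teams never touch the service network directly.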
One important feature of resource configurations is that we don't need to create hosted zones in each VPC to map custom domain names to each service; that is not the responsibility of individual team accounts. Instead, the resource configuration declares its custom domain name, and without any action from team accounts, they can resolve those services at the same FQDNs. We don't need any of the DNS work we were doing with PrivateLink, and the name associated with a service is declared by the service owner, not by the application teams.
Now I want to address PrivateLink because there's confusion about whether you must convert everything to Lattice if you're using PrivateLink. The answer is no. PrivateLink and Lattice work happily side by side. In this case, Team 2 has a vendor relationship using PrivateLink. Many vendors still use PrivateLink and won't switch to Lattice just because you want them to. This works fine, and Lattice and PrivateLink coexist well. However, I want to point out something important. In the shared VPC model, you're exposing vendor connectivity to all participants in that VPC. In this case, we want to ensure that only Team 2 has access to that endpoint, not anyone else.
Now suppose Team 1 and Team 2 want to create bidirectional connectivity. We could use a bidirectional PrivateLink, but instead, we can create a new account that defines a service network exclusive to the application teams, independent from the shared services network. Using a service network endpoint, they can add an additional service network. You might wonder why they can't just use the shared services service network. The reason is we have different permissions for how we share the shared services service network versus the team-specific one. With the shared services service network, we control the permissions on how RAM shares that network. Only platform service teams, such as your SDLC team and identity team, can create and associate services with that service network. General consumers cannot, so this isn't a place where everyone in the organization should dump services.
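The service network endpoint mentioned here is modeled as a VPC endpoint whose type is a service network rather than an interface to a single service. A hedged sketch, with placeholder IDs and an assumption that the team-specific service network has already been shared to this account via RAM:

```yaml
Resources:
  # Service network endpoint: gives this VPC access to a second service
  # network (the team-specific one) alongside its existing association
  # with the shared-services network.
  TeamServiceNetworkEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: ServiceNetwork
      VpcId: vpc-0123456789abcdef0             # Team 2's VPC (placeholder)
      ServiceNetworkArn: arn:aws:vpc-lattice:us-east-1:111122223333:servicenetwork/sn-0123456789abcdef0
      SubnetIds:
        - subnet-0123456789abcdef0             # placeholder subnet ID
```

Because the endpoint is created in the consuming VPC, each team decides which additional service networks it participates in, keeping the shared-services network free of team-to-team traffic.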
Everybody in the organization can simply use VPC Lattice to access services. If you need your own service network, you can spin one up, define which accounts need access, and now you have a private service network for your work group instead of an entire VPC. One challenge in onboarding was getting all of our service teams to leverage VPC Lattice; we didn't want to hand everybody a new work item and a pile of migration tasks. Thankfully, it's pretty easy.
If you look at a high-level PrivateLink architecture, you have your compute resource, which is your service, fronted by a Network Load Balancer and a PrivateLink service. So how do we make this work with a resource configuration? One thing we don't want to do is disrupt existing PrivateLink customers; those endpoints still exist and we don't have to touch them. What we can do is create a resource configuration from an ARN, from an IP address, or from a domain name.
In this case, we simply take the domain name from the Network Load Balancer, which is the AWS-assigned FQDN, and we use that to create a resource configuration. Now we can take that resource configuration and associate it with the shared services network. Instead of having all of these accounts request to be allow-listed for every single PrivateLink endpoint, we can get everyone in the same organizational unit path or in the entire organization to have access to the shared services network. This really helps these use cases where you have a service that you want to share in a one-to-many fashion with a very large number of accounts.
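That NLB-fronting step can be sketched as a DNS-based resource configuration. The gateway ID, NLB FQDN, and port are placeholders, and the property names follow the AWS::VpcLattice CloudFormation namespace; the key point is that the existing PrivateLink service is untouched, since we only point at the load balancer's AWS-assigned domain name:

```yaml
Resources:
  # Wraps the existing NLB-backed service as a Lattice resource
  # configuration, without touching the PrivateLink service itself.
  NlbResourceConfiguration:
    Type: AWS::VpcLattice::ResourceConfiguration
    Properties:
      Name: shared-service-nlb
      ResourceGatewayId: rgw-0123456789abcdef0   # gateway in the service VPC
      ResourceConfigurationType: SINGLE
      ProtocolType: TCP
      PortRanges:
        - "443"                                  # assumed service port
      AllowAssociationToSharableServiceNetwork: true
      ResourceConfigurationDefinition:
        DnsResource:
          DomainName: my-nlb-0123456789abcdef0.elb.us-east-1.amazonaws.com
          IpAddressType: IPV4
```

Associating this one resource configuration with the shared-services network then replaces per-account PrivateLink allow-listing for the one-to-many case described above.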
With all of this, we now have a much stronger network isolation story. We don't have issues with resource contention or IP starvation. All those resources are exclusive to the owning account, and we have a really nice story around how we handle cross-account access. We can use Lattice, we can use resource configurations, and we can still use PrivateLink when needed. These one-to-many use cases are very easy to solve now. We don't have to allow-list a bunch of accounts everywhere else.
But I think the really important thing is that we now also have a very good story around simplifying the developer experience at onboarding. Just as with the shared VPC model, someone can bring up a VPC and have their connectivity needs met out of the box, at least the majority of them, if not the bespoke ones. It's a simple way to get this done while maintaining isolation. I don't want this to be a shared VPC versus Lattice debate; they are different things. We really think shared VPC still has applicability for specific use cases, but at scale it can present real challenges.
We have been pretty happy with how this has worked out so far, and we are looking forward to seeing more with Lattice down the road. I think that wraps it up. We will take questions at the door because they are shutting us down to turn the room over for tomorrow. If you have any questions, we can take it over there, and thank you again everybody. Please remember to fill out the survey. It is important to let us know how we are doing, and we will see you at the door. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.