Kazuya

AWS re:Invent 2025 - Networks at scale and how to automate operations (NET323)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Networks at scale and how to automate operations (NET323)

In this video, Pablo Sánchez Carmona and Claudia Izquierdo demonstrate how to automate AWS network operations at scale using infrastructure as code. They address three major challenges: manual processes, inconsistent deployments, and late validation. Through live Terraform coding, they show how to enforce IP address management using VPC IPAM with Service Control Policies, build a global network across three regions using AWS Cloud WAN with policy as code, and automate VPC attachment routing using AWS Step Functions and EventBridge. The session includes implementing service insertion for traffic inspection, creating attachment policies for automatic segment association, and using AWS Organizations guardrails to prevent non-compliant resource creation. They emphasize architecting by use case rather than habit, starting with business outcomes, and applying the same automation principles used in application development to network infrastructure.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: Scaling Challenges in AWS Network Operations

Perfect. Let's get started. Network operations at scale have three major challenges: manual processes that don't scale, inconsistent deployments that create security risks, and validation that happens too late. In today's session, we're going to try to solve these challenges by applying the same principles that most of your developer teams (or you, if you're developers) already apply: automation, testing, and policy-driven deployment. Welcome, everybody. This is NET323: Networks at Scale and How to Automate Operations. My name is Pablo Sánchez Carmona. I'm a Senior Networking Solutions Architect at AWS, and with me I have Claudia.

Thumbnail 60

Hi everyone, I'm Claudia Izquierdo. I'm a Senior Solutions Architect here at AWS. Perfect. So what's the agenda today? Who knows what a Code Talk is? Raise your hand. Not so many people. Okay, perfect. So today's agenda is going to be heavy on coding, so we will try to spend less time speaking and more time interacting with you while we build something. But first, we want to expand on these scaling challenges in AWS networking and on the best practices for automated network operations that we have seen with customers. We will briefly explain the use case that we're going to use as an example today, then of course it's time to code, and we'll close with the takeaways we want you to leave the session with.

Thumbnail 110

About these challenges, let me start by painting a picture that most of you can probably relate to. A company starts small, with a small number of VPCs connected to each other and a couple of accounts, and it's easy to keep full control. You know exactly how things connect and what the security of the application is, and of course you make sure that everything is documented and working perfectly. But then growth happens. New VPCs come in, new accounts, mergers can happen, and suddenly the growth is not gradual; it can be exponential. Those tight controls that initially worked for the small environment can become a bottleneck for specific teams. Usually, cloud networking teams are small compared to the amount of resources they need to manage. Application teams are asked to be agile, but they have to wait for approvals to get routes created or specific security controls put in place. Because these teams want to be agile, and hopefully they are well-intentioned most of the time, they open a port or make a security group a little more open than we would like in order to get connectivity working. Now we have configuration drift and configurations that nobody fully understands. And when we need to troubleshoot or know exactly what's happening in our environment, we're not really sure what's going on.

So when we grow, we get three main problems: consistency and governance issues, manual configuration bottlenecks, and especially change management. We always say that the cloud allows you to be agile, but we usually think of the network as something fixed, and it shouldn't be like that. The cloud network should also allow for change without adding extra complexity. To face these three challenges, the first step is to adopt a multi-account environment, and this multi-account environment comes with guardrails. Working with guardrails helps you focus on permission controls for the security of your architecture instead of relying only on network-level blocking, and with this you can improve how you manage the security of your architecture and your network in AWS. Second, to make this easier for everyone, use the managed network services that we have at AWS, and more specifically the service for building global networks. The tasks network engineers deal with, like creating Amazon VPCs, creating AWS Site-to-Site VPN connections, creating route tables, and adding or deleting routes, can all be optimized, which helps the engineers working on these global networks. And third, go serverless: use the different serverless services that we have at AWS.

Thumbnail 380

Thumbnail 410

That way you can automate many of the tasks that network engineers have to do, including the many requests from users who ask those engineers for help. So how does this look at AWS? First, start using AWS Organizations. AWS Organizations is the service where you set up multi-account environments and add guardrails, which are called Service Control Policies. Then work with AWS Cloud WAN. AWS Cloud WAN is a service for building global networks, and not only building them, but building them with policy as code. In this session, you are going to learn how to build these policies. The idea is to learn a little more about AWS Cloud WAN and AWS Step Functions. Step Functions helps you automate the tasks that network engineers need help with by orchestrating the different steps each task requires.

Thumbnail 470

Use Case Overview: Building a Controlled Multi-Region Network Architecture

Perfect. So what is the use case we are going to use today to show you these best practices? Basically, we have an environment where we want IP address management to be controlled by VPC IPAM. We want to make sure that teams do not create VPCs with CIDRs that are not in the IPAM pool that we provide to them. We also need traffic segmentation. The logic we are using today is one routing domain per environment. We are going to keep the set of environments simple for today, but we want that traffic segmentation in our network.

Thumbnail 480

Thumbnail 500

Thumbnail 510

We want traffic inspection. We want to make sure that everything is inspected between routing domains, as well as anything that goes out to the Internet. And of course, it needs to be low in operational overhead, because we only have two engineers, the two people you see talking to you right now, managing all of that network at scale.

Thumbnail 520

So let me show what we have already built, and then I will go to what we are going to build today. This is the organization we have. We are keeping it simple, but definitely think of more than one organizational unit. We have three accounts: the management account for the organization, one networking account in the networking organizational unit, and the spoke account in the dev organizational unit. In the networking account, we have defined Amazon VPC IPAM with the root pool 10.0.0.0/8, and in each of the regions we define a pool for the dev environment that we share with that specific spoke account using AWS Resource Access Manager (AWS RAM).
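As a rough illustration of the pool layout just described (a 10.0.0.0/8 root pool with one dev pool per region), the sketch below uses Python's standard `ipaddress` module. It is not the speakers' Terraform. The /16 size of the regional pools and the region list are assumptions made for the example; the session only states the root pool here and names the three regions and the /24 VPC netmask later on.

```python
import ipaddress

# Root pool from the session; everything else below is a hypothetical plan.
ROOT_POOL = ipaddress.ip_network("10.0.0.0/8")
REGIONS = ["us-east-1", "us-west-2", "eu-south-2"]


def plan_regional_pools(root, regions, prefixlen=16):
    """Assign one non-overlapping block of the root pool to each region."""
    blocks = root.subnets(new_prefix=prefixlen)
    return {region: next(blocks) for region in regions}


def is_compliant(vpc_cidr, pool):
    """A VPC CIDR is compliant only if it sits entirely inside the pool."""
    return ipaddress.ip_network(vpc_cidr).subnet_of(pool)


dev_pools = plan_regional_pools(ROOT_POOL, REGIONS)
for region, pool in dev_pools.items():
    print(region, pool)

# A /24 carved from the regional pool passes; a 172.16.x.x CIDR does not.
print(is_compliant("10.0.5.0/24", dev_pools["us-east-1"]))
print(is_compliant("172.16.0.0/24", dev_pools["us-east-1"]))
```

In the real setup IPAM performs this allocation and tracking itself; the sketch only shows why a CIDR outside the shared pool counts as noncompliant.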

Thumbnail 530

Thumbnail 540

Thumbnail 560

So the spoke account can only use the IPAM pools we are sharing. On the network level, in AWS Cloud WAN, we have an initial version of the network, and we are going to build more during the session today. We have a hybrid segment and a dev segment. I think we also have a prod segment, so we have some segmentation already created. We also have service insertion, for those who do not know it. We will explain it in a moment, but it is our way in Cloud WAN to automate the creation of inspection by simply telling the service, "Hey, I want to inspect traffic between these two segments, or between a segment and anything going out to the Internet." We have already created the inspection segment, which is called a network function group, and our three inspection VPCs with AWS Network Firewall, so we are just going to create the inspection routing itself.

Thumbnail 580

Thumbnail 590

And of course, AWS Cloud WAN is created in the networking account and is shared with the rest of the accounts using Resource Access Manager. But Pablo, don't you think that everyone is here to learn a little bit of coding? Yeah, so we are going to be coding. I think we took ten minutes, which is exactly the time we had planned. So let us go through the different requirements. Let us start with the first requirement, which is VPC IPAM as the only IP allocation source. So what are we going to do?

Thumbnail 610

Thumbnail 620

Thumbnail 630

Implementing VPC IPAM with Service Control Policies for IP Address Governance

First, we're going to show that by default I can create noncompliant VPCs. By default, yes, someone can create a VPC with a CIDR outside of VPC IPAM. Once we do that, we're going to create a Service Control Policy that only allows creation of VPCs from the IPAM pools shared with the accounts. We're going to try the same creation again and get it denied. Afterwards, we're going to start creating the compliant VPCs. So let me change to the demo and let's go for it.

Thumbnail 650

Thanks, Pablo. I'll be walking around here, so if you have questions, please let me know. Remember this is an interactive session, so the idea is to hear from you too. Pablo, can you let us know what you are coding, what type of code it is, and why you are using Kiro? Yes, so first of all, I'm using Kiro. If you don't know it, it's our IDE that also allows us to use AI agents. We're not going to use the help of AI agents today. It's going to be pure Terraform. We're going to be coding everything today in Terraform. What I'm using is the VPC Terraform module, created and maintained by the networking team at AWS, to create that noncompliant VPC first, and then I'm going to move on to putting the SCP in AWS Organizations.

Thumbnail 710

Thumbnail 730

Thumbnail 740

Thumbnail 750

Thank you, Pablo. So the idea here is that we are just getting started. This is IPAM. As Pablo mentioned, we are trying to stop someone from deploying CIDRs that are not allowed. And why are they not allowed? Because Pablo and I, as the networking architects and engineers, decided that this is the pool that everyone has to use, divided into the different segments according to the best practices that he and I agreed on. So the idea is to work with IPAM and use the pool that Pablo and I decided on, and the Service Control Policy from AWS Organizations is going to help us enforce it, so no one can deploy a network that is not allowed. So back to you, Pablo.

Thumbnail 780

Thumbnail 790

Thumbnail 800

Thumbnail 810

Thumbnail 840

Yeah, so basically right now I'm creating that VPC. I think I'm creating it in North Virginia. Now I guess you can see it better, right? So now we have the mock VPC that is not compliant at all, because we are supposed to be using 10.0.0.0/8 and it has a CIDR from the 172 range. Now I'm going to add an SCP to block that specific action from happening. Let me start by destroying this, then I'll move to the AWS Organizations part of the code, and then I'll remove this. Perfect. If I move to my organizations folder, I already have the SCP defined, and the only thing I'm going to do is create the resources to deploy it in my organization's account. Basically, what I'm doing is denying the creation of a VPC and also the association of a secondary CIDR block. I don't want anyone cheating after they have created the VPC with the IPAM pool. So I'm denying these two actions if the request has no IPv4 IPAM pool, that is, when a null check on the IPv4 IPAM pool ID is true. And because we control which IPAM pool we provide to the dev account, we also check that it's the specific IPAM pool they are allowed to build from. So I'll create the two resources right now.
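The policy on screen is not reproduced in the transcript, so the following is only a sketch of what an SCP with those two deny rules might look like, built as a Python dictionary and printed as JSON. It follows the pattern in the Amazon VPC IPAM documentation, using the `ec2:Ipv4IpamPoolId` condition key on `ec2:CreateVpc` and `ec2:AssociateVpcCidrBlock`. The pool ID and the statement IDs are made up for illustration.

```python
import json

# Hypothetical pool ID; in the demo it would be a pool shared through AWS RAM.
DEV_POOL_ID = "ipam-pool-0123456789abcdef0"


def build_ipam_scp(allowed_pool_ids):
    """Deny VPC creation and extra CIDR association outside the allowed IPAM pools."""
    actions = ["ec2:CreateVpc", "ec2:AssociateVpcCidrBlock"]
    resource = "arn:aws:ec2:*:*:vpc/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyVpcWithoutIpam",
                "Effect": "Deny",
                "Action": actions,
                "Resource": resource,
                # Matches when the request carries no IPv4 IPAM pool ID at all.
                "Condition": {"Null": {"ec2:Ipv4IpamPoolId": "true"}},
            },
            {
                "Sid": "DenyVpcFromOtherPools",
                "Effect": "Deny",
                "Action": actions,
                "Resource": resource,
                # Matches when a pool ID is given but is not one we shared.
                "Condition": {
                    "StringNotEquals": {"ec2:Ipv4IpamPoolId": allowed_pool_ids}
                },
            },
        ],
    }


scp = build_ipam_scp([DEV_POOL_ID])
print(json.dumps(scp, indent=2))
```

With one pool per region, `allowed_pool_ids` would list all three regional pool IDs. How the policy interacts with IPv6 CIDR associations on a dual-stack VPC is not covered by this sketch and should be checked before use.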

Thumbnail 850

Thumbnail 870

Thumbnail 880

Thank you, Pablo. So who from the audience is working with IPAM? OK, and who is working with AWS Cloud WAN? Great. I love that. So the point here is to let only Pablo and me, as the network specialists, decide which CIDRs can be used by the other parts of our team, because, you know, a /24 is not right for every Amazon VPC and every subnet. If anyone has a question, please raise your hand and I will come to you. So Pablo, back to you. I'm still finishing, so OK. I don't think I can type and speak at the same time. Don't worry.

Thumbnail 910

Thumbnail 920

So my question is, how can you share a specific segment with a specific account? For example, if I have a situation where I dynamically create accounts, I want to make sure that specific CIDRs or segments go to specific accounts. The question that Miguel just asked is how you can make an IP address resource go with a specific segment, for example production or development. Pablo, have you done that before?

Thumbnail 980

Thumbnail 990

Yeah, give me a moment. I'm just troubleshooting. Don't worry. Thank you very much. I definitely needed an AI for this. So what exactly was the question? How can you make an IP address resource go to a specific segment, like this is for dev, this is for prod, and things like that. So you apply that logic in the way you share the IP address pools. You create your IP address pools, and then in your infrastructure as code you share each pool through RAM with a specific organizational unit, whether it's dev, test, or production. For the case where accounts are created dynamically, the majority of customers use AWS Control Tower, which you can customize with AWS CloudFormation or Terraform, and build an account factory where you can say: this account is dev, so this is its IP address pool, and of course this is its AWS Cloud WAN core network or transit gateway. You define that blueprint once, and then when accounts are created dynamically, those shared services are provided from the beginning.

Thumbnail 1020

Thumbnail 1030

Thumbnail 1040

Thumbnail 1050

Thumbnail 1060

Would I need a specific account to run the account factory? I think yes. I'm not completely sure of the details, but you will need one account that manages the account factory and then pushes into the rest of the accounts. Yes, as a best practice there should be one main account that pushes everything to the other accounts in AWS Organizations and AWS Control Tower. Perfect. So I already have the service control policy. You can see here I have the same JSON I showed you, now created and targeting our dev account. I think I removed that earlier. Let me see if I can put it back, yeah.

Thumbnail 1070

Thumbnail 1080

Thumbnail 1120

Thumbnail 1130

So now I'm going to try to create it again, and this time I shouldn't be able to create that specific VPC. So Pablo, we have another question. Why are we using Terraform and not AWS CloudFormation? That's a personal decision. For me it's just more natural to write Terraform than CloudFormation, but all the resources that we are creating today are also available in CloudFormation. Also, if you use the AWS CDK, we have L2 constructs, and L3 as well, for these AWS networking services. So I think I need to clear the terminal, but now if I do the apply I should get denied by the service control policy. Next, I'm going to start typing our proper VPC configuration with the VPC IPAM pool configuration. What's the question?

Thumbnail 1140

Thumbnail 1150

Thumbnail 1180

Thumbnail 1200

Is it possible to write the policy without mentioning the IP address pool, so basically you just mention the range? As long as your VPC gets a CIDR from the expected range, so it's like hardcoding the IP. Okay. So the question here is whether we can skip IP Address Manager and hardcode the IP addresses in the code itself. It's possible, but the thing is that with IPAM you are not hardcoding; you are relying on IPAM and taking advantage of what it offers. It's not only there to give you IP addresses; it also helps you keep control of them. For example, if you are close to running out of IP addresses, it will let you know, and as Pablo just said, it helps you design how you want to distribute your IP addresses. So I think it helps more to have IPAM, because you can do more than just stop people from using CIDRs outside the pool you want.

Thumbnail 1210

Thumbnail 1220

Thumbnail 1240

Thumbnail 1260

Thumbnail 1270

Thumbnail 1280

Thumbnail 1300

You also get control and monitoring of the IP addresses. But for situations where you don't use IPAM or you use a different approach, let me hand it back to you, Pablo. Yes, so now I have the compliant version of the VPC written. You can see that the way we're doing this is by passing the VPC IPAM pool. The netmask length in this case is 24. We're also creating a dual-stack VPC, so with IPv6 as well. I'm taking a shortcut by doing a classic copy-paste and changing the region. You can see that we're using North Virginia, Oregon, and the Spain region to show you a multi-region solution. Here, I have some local variables where I'm obtaining the IPAM pools shared through AWS RAM and putting them in a local value that maps each region to its IPAM pool ID. I won't create the VPCs now. I'll do everything at the end so you can see how everything connects to Cloud WAN and how all the automation runs. Now, let me come back to the slides for a moment to show exactly how our first requirement has been met. Our VPCs are now compliant, and they get their CIDR from the IPAM pool without it being hardcoded.
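The region-to-pool lookup described here lives in Terraform locals in the demo. A Python equivalent might look like the sketch below. The pool records are hypothetical and only mimic the `IpamPoolId`, `Locale`, and `AddressFamily` fields that the EC2 `DescribeIpamPools` API returns; `Ipv4IpamPoolId` and `Ipv4NetmaskLength` are the corresponding `CreateVpc` parameters, while the `region` key is just a label for this example.

```python
# Hypothetical records shaped like DescribeIpamPools output; the IDs are made up.
SHARED_POOLS = [
    {"IpamPoolId": "ipam-pool-0aaa", "Locale": "us-east-1", "AddressFamily": "ipv4"},
    {"IpamPoolId": "ipam-pool-0bbb", "Locale": "us-west-2", "AddressFamily": "ipv4"},
    {"IpamPoolId": "ipam-pool-0ccc", "Locale": "eu-south-2", "AddressFamily": "ipv4"},
]


def pools_by_region(pools, family="ipv4"):
    """Map each region (pool locale) to the ID of the pool shared for it."""
    return {p["Locale"]: p["IpamPoolId"] for p in pools if p["AddressFamily"] == family}


def vpc_request(region, pool_ids, netmask_length=24):
    """Arguments for a VPC that asks IPAM for a CIDR instead of hardcoding one."""
    return {
        "region": region,
        "Ipv4IpamPoolId": pool_ids[region],
        "Ipv4NetmaskLength": netmask_length,
    }


pool_ids = pools_by_region(SHARED_POOLS)
for region in pool_ids:
    print(vpc_request(region, pool_ids))
```

Keeping the mapping in one place is what makes the copy-paste per region safe: only the region name changes, and the pool ID follows from it.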

Thumbnail 1320

Thumbnail 1370

Building a Global Network with AWS Cloud WAN and Service Insertion

I think we can move to the second requirement, which is an easy-to-build global network. For that, we have AWS Cloud WAN, but what do we need to do there? We need inspection between routing domains, egress inspection, and inspection between regions. If I have traffic between regions, I don't want to inspect it in both regions. I only want to inspect it in one. For every pair of regions, the region in orange is the one where I want to inspect. I also want to do something quite cool in Cloud WAN, which is to automate how attachments get associated with a routing domain. The logic I want to apply is that VPC attachments are associated with the routing domain of their organizational unit. Anything that is hybrid, such as VPN attachments and Direct Connect attachments, is associated with the hybrid segment.

Thumbnail 1380

Thumbnail 1390

Thumbnail 1400

Thumbnail 1410

Thumbnail 1420

Thumbnail 1430

Thumbnail 1450

Thumbnail 1460

Thumbnail 1470

Something really important that you're going to see now that we're coding that part is that the policy works with two actions called send-to and send-via. They are the core of the service insertion code in AWS Cloud WAN. Send-to describes where the network traffic is going to end up. Send-via is for when you want the network traffic to pass through an intermediate point before reaching the final destination. For example, send-to says the final destination is on-premises or the internet, and send-via says that before reaching the final destination, the traffic goes through this intermediate point. In the architecture we're working with here, that is usually AWS Network Firewall in an inspection Amazon VPC, or a third-party security solution. For example, you have an Amazon VPC that wants to reach the internet. It's like saying: send to the internet, but via AWS Network Firewall; or send to another Amazon VPC, but via AWS Network Firewall. So you can define the end of the path for the network traffic, and also the intermediate point before reaching that final destination.
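To make the two actions concrete, here is a sketch of how the corresponding `segment-actions` entries of a Cloud WAN policy document could be generated. It is written in Python rather than the Terraform used on stage, and it reflects my reading of the policy JSON format, where `send-via` steers traffic between segments through a network function group and `send-to` sends a segment's traffic to the network function group, for example for egress. The segment names match the session; the group name `inspection` is an assumption.

```python
import json

NFG = "inspection"  # assumed name of the network function group


def send_via(segment, destinations, nfg=NFG, mode="single-hop"):
    """Traffic from `segment` to `destinations` passes through the inspection group."""
    return {
        "action": "send-via",
        "segment": segment,
        "mode": mode,
        "when-sent-to": {"segments": destinations},
        "via": {"network-function-groups": [nfg]},
    }


def send_to(segment, nfg=NFG):
    """Traffic leaving `segment` (for example to the internet) goes to the inspection group."""
    return {
        "action": "send-to",
        "segment": segment,
        "via": {"network-function-groups": [nfg]},
    }


segment_actions = [
    send_via("prod", ["dev", "hybrid"]),
    send_via("dev", ["hybrid"]),
    send_to("prod"),
    send_to("dev"),
]
print(json.dumps({"segment-actions": segment_actions}, indent=2))
```

The segment pairs mirror the ones mentioned in the session (production to development and hybrid, then development to hybrid). Each pair is declared once because the inspection applies in both directions.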

Thumbnail 1480

Thumbnail 1490

Back to you, Pablo. Yes, so give me a moment so I don't lose track of what I'm doing. You can see that I'm using a dynamic block. If you haven't worked with Terraform, this is just a way for me to generate several blocks dynamically. Cloud WAN policies are JSON, but to make them easier to build in Terraform, we use a data source where we write these blocks, and they then get translated to JSON.

Thumbnail 1510

Thumbnail 1520

Thumbnail 1530

Thumbnail 1540

To create the edge override, which is how I'm telling Cloud WAN between two regions what region I want to use, I defined that logic outside of the policy document definition, similar to creating a variable. I then create those blocks dynamically, so if I need to make any changes, I don't need to go to the policy document definition itself. I can go to my local variable, make the change, and then any time I deploy the policy document again, I will get that updated version.
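The same idea, keeping the per-pair region choice in one variable and generating the override blocks from it, can be sketched in Python as follows. Only the us-east-1 and eu-south-2 pair is stated in the session (later, during Q&A); the other two choices are assumptions. The `edge-sets` and `use-edge-location` field names come from my recollection of the Cloud WAN policy schema and should be verified against the current reference, since an older `use-edge` spelling also exists.

```python
from itertools import combinations

REGIONS = ["us-east-1", "us-west-2", "eu-south-2"]

# Which region inspects traffic for each pair of regions.
INSPECTION_REGION = {
    frozenset({"us-east-1", "eu-south-2"}): "us-east-1",  # stated in the session
    frozenset({"us-east-1", "us-west-2"}): "us-east-1",   # assumption
    frozenset({"us-west-2", "eu-south-2"}): "us-west-2",  # assumption
}


def edge_overrides(regions, inspection_region):
    """Build one override per region pair from the lookup table above."""
    return [
        {
            "edge-sets": [list(pair)],
            "use-edge-location": inspection_region[frozenset(pair)],
        }
        for pair in combinations(regions, 2)
    ]


def with_overrides(action, overrides):
    """Return a copy of a send-via action with the edge overrides attached."""
    via = {**action["via"], "with-edge-overrides": overrides}
    return {**action, "via": via}


base_action = {
    "action": "send-via",
    "segment": "prod",
    "mode": "single-hop",
    "when-sent-to": {"segments": ["dev"]},
    "via": {"network-function-groups": ["inspection"]},
}
overrides = edge_overrides(REGIONS, INSPECTION_REGION)
print(with_overrides(base_action, overrides))
```

Changing the inspection region for a pair then means editing one dictionary entry and regenerating the policy, which is the benefit described above.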

Thumbnail 1550

Thumbnail 1560

Let me continue with the next block, because we covered production to development and hybrid, and we still need development to hybrid. Now, the last thing I need to do is the attachment policies. I already have one attachment policy that reads a tag on the attachment: if the tag inspection equals true, the attachment is an inspection VPC. That rule exists because the inspection VPCs are already connected to Cloud WAN. Now I'm going to create two more attachment policies, one based on a tag and another based on the type of attachment. Give me a moment to finish that piece.
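A sketch of the three attachment policies, again as Python that emits the policy JSON, might look like this. The rule numbers, the group name `inspection`, and the tag key `domain` (the key the speakers refer to later as the domain tag) are assumptions, and the condition and action fields follow my understanding of the Cloud WAN policy format. Only the VPN attachment type is shown for the hybrid rule; other hybrid attachment types would need their own conditions.

```python
import json


def attachment_policies(nfg="inspection", tag_key="domain", hybrid_segment="hybrid"):
    """Rules are evaluated in rule-number order; the first match wins."""
    return [
        {
            # Existing rule: inspection VPCs join the network function group.
            "rule-number": 100,
            "condition-logic": "or",
            "conditions": [
                {"type": "tag-value", "operator": "equals", "key": "inspection", "value": "true"}
            ],
            "action": {"add-to-network-function-group": nfg},
        },
        {
            # New rule: the segment name is read from the value of the tag.
            "rule-number": 200,
            "condition-logic": "or",
            "conditions": [{"type": "tag-exists", "key": tag_key}],
            "action": {"association-method": "tag", "tag-value-of-key": tag_key},
        },
        {
            # New rule: VPN attachments always land in the hybrid segment.
            "rule-number": 300,
            "condition-logic": "or",
            "conditions": [
                {"type": "attachment-type", "operator": "equals", "value": "site-to-site-vpn"}
            ],
            "action": {"association-method": "constant", "segment": hybrid_segment},
        },
    ]


policies = attachment_policies()
print(json.dumps({"attachment-policies": policies}, indent=2))
```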

Thumbnail 1590

Thumbnail 1600

Thumbnail 1620

Thumbnail 1630

Thumbnail 1640

What you can see here is that Cloud WAN works with segments. These segments can span regions, and we have a development segment, a production segment, and a security segment with AWS Network Firewall. If you want to automate this journey further, Amazon Q can do this, and Kiro also has capabilities. The question here is how to automate this even more using generative AI. You can work with Kiro or with Amazon Q Developer. Remember that Amazon Q Developer suggests the next line of code as you write. It also works with Talos Cloud.

Thumbnail 1650

Thumbnail 1660

Thumbnail 1670

The other thing that we are really excited about is something that was announced a couple of weeks ago. We now have advanced routing with AWS Cloud WAN. You know that adding many routes, like 100 or 1,000 routes, is part of every network engineer's daily work. The idea with advanced routing is that you can now add inbound and outbound filters, summarize routes, and use BGP attributes and metrics. This makes routing easier and simplifies your route tables. It's a fresh feature, so if you start implementing AWS Cloud WAN, you can also take advantage of this new capability.

Thumbnail 1720

Thumbnail 1740

We have already created the new version of the policy document. Let me go to the console so I can show you how that works. This is the AWS Network Manager console where Cloud WAN lives. If you have transit gateways, you can also take advantage of these network manager capabilities by importing the transit gateways. However, in Cloud WAN it works natively. You can see that my policy version is the 15th. We did a lot of dry runs before this session. I have two different kinds of aliases. When I'm making changes, I have my live policy, which is the one currently deployed without the new attachment policies, service insertion routing, and everything else. Then there's the new version that we just created, which will take some time to deploy. Once it's deployed, it will move to the live version.

Thumbnail 1790

It will take some time, so this might be a good time for some questions. We can also move to the next section if it takes longer. It usually takes longer when the change adds or removes a region, which can take several minutes. Since we're only creating some routing now, it probably won't take that long. You can see that what I have already created are my three inspection VPCs, attached to the inspection network function group, so that piece is already there. Now we can just create those specific routing configurations.

Thumbnail 1820

Thumbnail 1860

Thumbnail 1870

Thumbnail 1880

Thumbnail 1890

Thumbnail 1900

Thumbnail 1930

Thumbnail 1960

Thumbnail 1980

Thumbnail 2010

Thumbnail 2090

Automating Routing Domain Enforcement with AWS Step Functions and EventBridge

Now we can create those specific routing rules. Everything is being deployed, and the console tells you which actions are being applied. The attachment policies are applied automatically. Now it's going to create all the static routes. Do we have any questions?

Yes, I have one. If someone has permissions at the account level, can they bypass the service control policy? The answer is no. A service control policy always sits above the account-level permissions that you have and overrides them. Yes, exactly.

There was a question over there. Did you define where all your inspection VPCs are located? It looked like you also told the policy whether you're inspecting in both the source and destination region or just one of them. Is that part of it? Can you repeat that? I need to understand if you are inspecting at both source and destination in service insertion. Let me come back to the definition. Where I have it here, like dev to hybrid, it also covers hybrid to dev; it's bidirectional when you define it, so it applies between source and destination in both directions. And here is where I define the region. For example, between us-east-1 and eu-south-2 (Spain), I'm saying that I only want inspection in North Virginia. By setting dual hop here, I could say that I want inspection in both regions, but in this case we're using single hop for the sake of the example.

Cool. Then, while AWS Cloud WAN applies the change, I think we can move to the third requirement, and after that we will connect everything together. So what have we created so far? We have the inspection between routing domains, the egress inspection, the cross-region inspection, and the attachment automation. Do you think that we have met all these requirements, or are we missing anything? I think we're missing something.

Yes. We could meet that last one easily by simply telling everyone, hey, put the OU tag on the attachment when you connect your VPC, or by doing it by default in the account factory we mentioned before so they don't need to do it themselves. But what if someone sets the wrong value in the domain tag? They may end up connecting to hybrid. They may end up connecting to prod. So for our requirement of ensuring that new VPCs go to the right segment, given the way we defined the policy document, we need a little bit of extra help. That's how we get to enforcing routing domains from the OU using AWS Organizations and AWS Step Functions.

Just as we did with the VPCs, we're going to create another service control policy for the spoke OU and the spoke accounts so that they cannot set the domain tag. It will be set from the networking account. They can put any other tag on the attachment, but not the domain tag, because that's the one that defines which segment they connect to, and we want to control it from the networking account. Once we do that, any attempt to set the tag is denied, so spoke accounts create an attachment that goes nowhere in Cloud WAN: just an attachment without any segment association.

As part of Network Manager, we have the capability to get events from changes in the network. AWS Network Manager events are sent to EventBridge. I'm showing Oregon because the home region of AWS Cloud WAN, where the control plane resides, is Oregon, so all the metrics and events are published in that region. EventBridge is going to catch those events. We're going to filter for new VPC attachments only, because I don't want to process every event, and send them to Step Functions to run this logic. First, wait for the VPC attachment creation, which is mostly a safety measure to make sure we don't start the logic before the attachment exists. Then I obtain the OU information from the account ID, because that's metadata that I get from the event. Finally, I create the tag from the networking account.

We will see how, after the attachment is created, it changes from no segment to the segment we want it connected to. That's what we're going to do right now.
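The EventBridge filter for that flow could be expressed as an event pattern like the one below. The transcript does not show the actual rule, so this is a sketch under stated assumptions: Network Manager events use the source `aws.networkmanager`, and the `changeType` value `VPC_ATTACHMENT_CREATED` is my best recollection of the event format and should be confirmed against a real event. The small `matches` helper is not an AWS API; it only lets the pattern be tried locally on sample events.

```python
import json

# Assumed pattern; the rule itself would live in the core network's home region.
VPC_ATTACHMENT_CREATED_PATTERN = {
    "source": ["aws.networkmanager"],
    "detail": {"changeType": ["VPC_ATTACHMENT_CREATED"]},
}


def matches(pattern, event):
    """Minimal local matcher: every field in the pattern must equal one allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical sample events for a quick local check.
created = {"source": "aws.networkmanager", "detail": {"changeType": "VPC_ATTACHMENT_CREATED"}}
other = {"source": "aws.networkmanager", "detail": {"changeType": "SEGMENT_MAPPING_CREATED"}}

print(json.dumps(VPC_ATTACHMENT_CREATED_PATTERN))
print(matches(VPC_ATTACHMENT_CREATED_PATTERN, created))
print(matches(VPC_ATTACHMENT_CREATED_PATTERN, other))
```

Filtering at the rule keeps the state machine from being started for every topology or routing event in the network.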

Thumbnail 2130

Thumbnail 2140

Thumbnail 2150

Perfect. So we had a question back there, and I'm going to take it now. The point here is that yes, we are using AWS Cloud WAN to deploy the network, but Service Control Policies are what prevent someone from deploying things that Pablo and I, as the network specialists, don't want deployed. As you saw at the beginning, there was only IPAM and Service Control Policies. There was no AWS Cloud WAN yet, which shows that what stops you from deploying resources that are not allowed is not IPAM or AWS Cloud WAN; it's the Service Control Policy, and you cannot bypass it.

Thumbnail 2180

Thumbnail 2220

I think we have another question here. What's the question? Yes, about testing. OK, so once we create the new policy for Cloud WAN, it's generated and we see a latest version and a current one. Is there a way to test connectivity before finally applying it and, if it fails, go back to the previous version? Currently we don't have that option. When you create the new policy, you can check what is changing before you apply it. From Terraform or CloudFormation, the approval happens automatically so as not to break the infrastructure-as-code workflow. But if you do it from the console, you can check what it's going to do and see if there's any destroy action.

Thumbnail 2270

Currently we don't have a way to test that before going live. What we have created recently is a Cloud WAN MCP that gives you different tools to quickly check whether the routing is happening once you have the policy document live. There are also the Reachability Analyzer and Network Access Analyzer tools to check whether connectivity is what you expect. Those are the tools to use, but they need the policy to be already live. We're of course working on ways to improve this, because the feedback is also that you may want to check this before the policy is applied. So we're working on that, but for now it needs to be live, and then you use tools to understand whether the network is properly defined. If it is not, you can quickly roll back to the previous version of the policy via the API or the console and then continue with your deployment.

Thumbnail 2320

Thumbnail 2330

So let me go quickly to the Service Control Policy that I'm going to create while I'm explaining everything. What I'm doing is denying the creation of any VPC attachment if the request includes this domain key, and of course denying any modification of that tag. I'm not allowing anyone to change the domain tag once it is created. I'm doing that at the dev OU level. Now I'm going to build the automation. I'm going to cheat on the automation side because I have it almost created: it's already coded. I'm just going to uncomment it and we can discuss it while it's getting built.
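The policy described above can be sketched as a small Python builder that emits the SCP JSON. This is a hedged illustration, not the policy shown in the session: the statement IDs are made up, and the choice of `networkmanager` actions and of the `aws:RequestTag` and `aws:TagKeys` condition keys is an assumption about how such a guardrail could be written. Verify them against the service authorization reference before attaching anything to an OU.

```python
import json

DOMAIN_TAG_KEY = "domain"  # tag key used in the talk to select the segment


def build_domain_tag_scp(tag_key: str = DOMAIN_TAG_KEY) -> dict:
    """Return an SCP document that keeps spoke accounts away from the segment tag.

    Assumption: these action names and condition keys are one plausible way to
    express the guardrail; they are not copied from the session's code.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny attachment creation whenever the request carries the tag.
                "Sid": "DenyAttachmentCreationWithDomainTag",
                "Effect": "Deny",
                "Action": ["networkmanager:CreateVpcAttachment"],
                "Resource": "*",
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "false"}},
            },
            {
                # Deny adding, changing, or removing the tag afterwards.
                "Sid": "DenyDomainTagChanges",
                "Effect": "Deny",
                "Action": [
                    "networkmanager:TagResource",
                    "networkmanager:UntagResource",
                ],
                "Resource": "*",
                "Condition": {
                    "ForAnyValue:StringEquals": {"aws:TagKeys": [tag_key]}
                },
            },
        ],
    }


scp = build_domain_tag_scp()
print(json.dumps(scp, indent=2))
```

Because the policy is attached to the spoke OU, the networking account (which sits outside that OU) is not affected by these deny statements and can still write the tag.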

Thumbnail 2360

Thumbnail 2370

Thumbnail 2380

I'm creating this from the networking account. OK, so the automation is what we mentioned before: the EventBridge rule that captures the Network Manager events, but only the ones that say a VPC attachment was created. That's the only thing that's important right now, and I'm sending those to Step Functions. To see exactly what I'm creating, a Step Functions state machine is always better to check from the console, where you have a better UI. If I go to my state machine and open the definition, it's here, let me make it bigger. That's what we're doing. We are getting the VPC attachment just to first check that the attachment is created. The majority of the time it's going to be created already, so this is just a kind of safety measure. Then I'm obtaining the organizational unit from the account ID and getting the domain from that OU. In this case we're making it simpler because the OU name is the name of the segment in Cloud WAN.
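As a rough sketch of the filtering step, the snippet below builds an EventBridge event pattern and checks it against sample events with a tiny local matcher. The `aws.networkmanager` source is the standard one for Network Manager events, but the `changeType` value used here is a placeholder assumption, and the sample events are invented. Confirm the exact field values in the Network Manager event documentation.

```python
import json

# Assumption: placeholder value; look up the real changeType string for
# "VPC attachment created" in the Network Manager event documentation.
ASSUMED_CHANGE_TYPE = "VPC_ATTACHMENT_CREATED"


def build_event_pattern(change_type: str = ASSUMED_CHANGE_TYPE) -> dict:
    """Event pattern that keeps only one kind of Network Manager event."""
    return {
        "source": ["aws.networkmanager"],
        "detail": {"changeType": [change_type]},
    }


def matches(pattern: dict, event: dict) -> bool:
    """Minimal local matcher: each pattern leaf is a list of accepted values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


pattern = build_event_pattern()

# Invented sample events, only to exercise the filter locally.
created = {"source": "aws.networkmanager", "detail": {"changeType": ASSUMED_CHANGE_TYPE}}
other = {"source": "aws.networkmanager", "detail": {"changeType": "SOMETHING_ELSE"}}

print(json.dumps(pattern))
print(matches(pattern, created), matches(pattern, other))
```

The local matcher only covers exact-value matching, which is all this rule needs; EventBridge itself supports richer operators.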

Thumbnail 2430

Thumbnail 2440

But this step is where you can add extra logic if you need it. For example, if an OU means non-production, you can add that mapping in this step, and then we're creating the tag in Cloud WAN. Basically, steps 3 and 4 allow us to bring this extra logic into AWS Cloud WAN.
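The flow just described can be sketched as an Amazon States Language definition built in Python. This is an approximation, not the state machine shown on screen: the state names, the input fields (`attachmentId`, `attachmentArn`, `accountId`), the `AVAILABLE` status check, and the added wait loop are assumptions. It uses the Step Functions AWS SDK service integrations, and it assumes the execution role is allowed to call AWS Organizations, which normally requires the management account or a delegated administrator.

```python
import json

SDK = "arn:aws:states:::aws-sdk"  # prefix for Step Functions AWS SDK integrations


def build_state_machine(tag_key: str = "domain", wait_seconds: int = 10) -> dict:
    """Return an ASL definition: check attachment, find the OU name, tag it."""
    return {
        "Comment": "Tag a Cloud WAN VPC attachment with its OU name (sketch)",
        "StartAt": "GetVpcAttachment",
        "States": {
            "GetVpcAttachment": {
                "Type": "Task",
                "Resource": f"{SDK}:networkmanager:getVpcAttachment",
                "Parameters": {"AttachmentId.$": "$.attachmentId"},
                "ResultPath": "$.attachment",
                "Next": "IsAttachmentReady",
            },
            "IsAttachmentReady": {
                "Type": "Choice",
                "Choices": [
                    {
                        # Assumption: AVAILABLE is the status to wait for.
                        "Variable": "$.attachment.VpcAttachment.Attachment.State",
                        "StringEquals": "AVAILABLE",
                        "Next": "GetParentOu",
                    }
                ],
                "Default": "WaitForAttachment",
            },
            "WaitForAttachment": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "GetVpcAttachment",
            },
            "GetParentOu": {
                "Type": "Task",
                "Resource": f"{SDK}:organizations:listParents",
                "Parameters": {"ChildId.$": "$.accountId"},
                "ResultPath": "$.parents",
                "Next": "GetOuName",
            },
            "GetOuName": {
                "Type": "Task",
                "Resource": f"{SDK}:organizations:describeOrganizationalUnit",
                "Parameters": {"OrganizationalUnitId.$": "$.parents.Parents[0].Id"},
                "ResultPath": "$.ou",
                "Next": "TagAttachment",
            },
            "TagAttachment": {
                "Type": "Task",
                "Resource": f"{SDK}:networkmanager:tagResource",
                "Parameters": {
                    "ResourceArn.$": "$.attachmentArn",
                    "Tags": [
                        {"Key": tag_key, "Value.$": "$.ou.OrganizationalUnit.Name"}
                    ],
                },
                "End": True,
            },
        },
    }


definition = build_state_machine()
print(json.dumps(definition, indent=2))
```

A production version would also bound the wait loop and add retry and error handling; those are left out to keep the sketch readable.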

With that, we have met these requirements by applying the best practices that we mentioned before. With AWS Organizations, we're not putting effort into blocking actions that shouldn't happen at the network level, by simply denying them or putting a firewall wherever we need to, or into thinking that I don't trust anyone so I have to control everything. What we're doing instead is creating guardrails in Organizations. No one is going to use a CIDR block that they shouldn't, and no one is going to put a gateway in their VPC. I don't need to worry about whether they do, or prevent them from doing specific actions. I know that they cannot put a gateway there. So go ahead and work with your VPC, because it's your workload, and you should also work with your own network.

Thumbnail 2520

Thumbnail 2550

With Cloud WAN, we built a global network in three AWS regions with fewer than 50 lines of code. VPCs will connect automatically and the routing will be created automatically. Then finally, we had a Step Functions state machine. We had an ad hoc requirement that needed some automation on top of that, and we used five states to ensure that VPCs are only associated with their corresponding segment.

Live Demonstration: Deploying VPCs, VPN Connections, and Automated Segment Association

Now let's go and try it out. We're going to finish the definition of our three VPCs. We have a VPC in Paris that is going to mimic a data center, and we're going to set up a VPN. We're going to see how the Site-to-Site VPN gets connected directly to the hybrid segment. Our spoke VPCs are first going to be created without a segment, and then our automation is going to kick in and move them to the dev routing domain. Then we're going to show how the automation works and get more details about the configuration in Cloud WAN before we finish the session.

Thumbnail 2580

Thumbnail 2590

Thumbnail 2600

Thumbnail 2610

Thumbnail 2620

Let me come back to the coding piece. In here, you can see that we have been working mostly with Amazon VPCs, but AWS Cloud WAN can work with Amazon VPCs, AWS Site-to-Site VPN, and AWS Direct Connect. It can peer with AWS Transit Gateway, and it also supports Connect attachments, which are the solution for working with SD-WAN. So you can use AWS Cloud WAN not only with Amazon VPC but also to deploy the routing and everything else you need for hybrid connections, which is pretty usual for many of our customers.

Thumbnail 2640

Thumbnail 2650

Thumbnail 2660

Thumbnail 2670

Thumbnail 2680

Right now, what I'm doing as part of the VPC module is defining how to create the core network attachment. I'm pointing to the core network that was already created, and that's why I have a local variable: I'm getting that ID or that ARN from AWS RAM. Here I'm adding just the routes. Basically, I'm sending any IPv4 traffic and any IPv6 traffic to the core network attachment. The last thing I need to do is create the subnet where I'm going to place that attachment, and I'm making it dual stack by assigning an IPv6 block. This is just part of the configuration in the module. We're saying that it doesn't require any acceptance, because in Cloud WAN you can also add a control that requires you to accept the attachment before it automatically connects to a segment. You can add some manual approval. In our case, we didn't want to, which is why we built the automation to control that. But if you don't want to build that automation, you can add the manual control and simply get notified when you have a new VPC, check it, and maybe approve it.

Thumbnail 2710

I'm cheating a little bit because I have already set up my private module just to create EC2 instances and endpoints to connect to. I don't think we will have time today to show end-to-end connectivity, but just in case we have the time, we are building some EC2 instances.

Thumbnail 2730

Thumbnail 2740

Thumbnail 2750

Thumbnail 2760

So now I'm going to do the same in the rest of the environment. Any questions? Perfect. So that's on this spoke account. I already have everything. Let me make sure I did not mess up something.

Thumbnail 2780

Thumbnail 2790

Thumbnail 2800

Is it possible to set up custom error messages? I did something with SCPs. I did all of them blocked. Let me check for the users. Yes, because we've avoided using SCPs for tag enforcement out of fear that internal customers won't understand why. I'm just doing a split, so I need to take the second item, which is the ID from the ARN. That's what I'm doing in this specific section. So now it should give me everything good.
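The split mentioned above is done in Terraform during the session; the same idea in Python looks like the sketch below. The ARN is a made-up example, and splitting on `/` is an assumption about which separator the demo code uses.

```python
def resource_id_from_arn(arn: str) -> str:
    """Return the second item after splitting an ARN on '/'."""
    parts = arn.split("/")
    if len(parts) < 2 or not parts[1]:
        raise ValueError(f"no resource ID found in ARN: {arn!r}")
    return parts[1]


# Hypothetical ARN, used only to illustrate the split.
example_arn = "arn:aws:networkmanager::111122223333:core-network/core-network-0example"
core_network_id = resource_id_from_arn(example_arn)
print(core_network_id)  # core-network-0example
```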

Thumbnail 2820

Thumbnail 2830

Thumbnail 2840

What's the question there? The question is whether we can send a message to customers telling them that they are being blocked by a Service Control Policy, because they usually don't understand why they are blocked when they try to do something that the Service Control Policy was configured not to allow. I don't know if that's possible or not. Maybe we can take that later and do a little bit of quick research, or get you to the right person to answer it, but I don't know.

Thumbnail 2850

Thumbnail 2860

Then the last one is on the on-premises side. I have already created the VPN, so I have my customer gateway, my VPN connection, and the only thing I'm going to be creating is the site-to-site VPN attachment. So now once I create it, I can move and show how everything has been created and just see how the different automation is working and how everything is getting connected to the cloud. Let's turn the network inside so it's not creating anything. I know I need to be in the on-premises one, sorry.

Thumbnail 2910

Thumbnail 2930

So on the on-premises side, the VPN, even though I have another Terraform state and I'm working kind of independently for the on-premises site, is getting built in the networking account, because that is where the Site-to-Site VPN attachment needs to be created. The attachment for the VPN needs to be created in the same account where Cloud WAN resides. That's the same way it works with Transit Gateway. So let's go to our console to see that we now have our VPN, and that the VPN is already getting associated with the hybrid segment thanks to the attachment policy configuration, which we can show you now from the console so you can see how all the logic works.

The VPCs, for now, are just being created without any segment. We will come back to them and to the attachments in a moment to see how they moved. Look, for example: the one in Spain already got created and it's already moving to the dev segment. So now it's just pending the network update of the association and the propagation in that specific segment. Before that, let's go and check the policy version and then we'll see how everything is working.

Thumbnail 2970

Thumbnail 2980

Thumbnail 3000

So even though we used a data source, Cloud WAN works with JSON. What was already created was the ASN ranges. Any endpoint in Cloud WAN uses eBGP, so we need to provide the range of ASNs that we want to use in the regions. Then come the edge locations: the regions in which we want to create Cloud WAN. In this case, three regions only. And then we have the attachment policies. I already explained this one when we were doing that part of the code.

Thumbnail 3010

Thumbnail 3050

The logic for VPCs was: if the attachment type is a VPC and the domain tag exists, I want to put you in the segment that has the same name as the value of the domain tag. Note that I did have a typo here that should have been "and" instead. We can update it in a moment. With this attachment policy, I'm covering all the possible use cases for my VPCs, whether I use that OU-name-equals-segment logic or any other logic of your own that you want to apply in Cloud WAN. The last rule is: if the attachment is any of these types, Site-to-Site VPN, Connect, or Direct Connect gateway, I want to put it in the hybrid segment, so I make sure that anything that is hybrid goes to that routing domain.
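The two rules above can be sketched as the `attachment-policies` section of a Cloud WAN policy document, built here in Python together with a small local evaluator that shows which segment each attachment would land in. The rule numbers are arbitrary, and the evaluator is only an illustration of the matching logic, not how Cloud WAN is implemented.

```python
import json

# Attachment types that the talk routes to the hybrid segment.
HYBRID_TYPES = ["site-to-site-vpn", "connect", "direct-connect-gateway"]


def build_attachment_policies(tag_key="domain", hybrid_segment="hybrid"):
    """Return the two attachment policy rules described in the talk (sketch)."""
    vpc_rule = {
        "rule-number": 100,  # arbitrary number chosen for this sketch
        "condition-logic": "and",
        "conditions": [
            {"type": "attachment-type", "operator": "equals", "value": "vpc"},
            {"type": "tag-exists", "key": tag_key},
        ],
        # Segment name is taken from the value of the tag.
        "action": {"association-method": "tag", "tag-value-of-key": tag_key},
    }
    hybrid_rule = {
        "rule-number": 200,  # arbitrary number chosen for this sketch
        "condition-logic": "or",
        "conditions": [
            {"type": "attachment-type", "operator": "equals", "value": t}
            for t in HYBRID_TYPES
        ],
        "action": {"association-method": "constant", "segment": hybrid_segment},
    }
    return [vpc_rule, hybrid_rule]


def condition_holds(cond, attachment_type, tags):
    if cond["type"] == "attachment-type":
        return attachment_type == cond["value"]
    if cond["type"] == "tag-exists":
        return cond["key"] in tags
    return False


def pick_segment(attachment_type, tags, policies):
    """Local illustration: lowest rule number that matches wins."""
    for rule in sorted(policies, key=lambda r: r["rule-number"]):
        results = [condition_holds(c, attachment_type, tags) for c in rule["conditions"]]
        matched = all(results) if rule["condition-logic"] == "and" else any(results)
        if matched:
            action = rule["action"]
            if action["association-method"] == "tag":
                return tags[action["tag-value-of-key"]]
            return action["segment"]
    return None  # no rule matched: the attachment stays without a segment


policies = build_attachment_policies()
print(json.dumps(policies, indent=2))
print(pick_segment("vpc", {"domain": "dev"}, policies))  # dev
print(pick_segment("vpc", {}, policies))                 # None
print(pick_segment("site-to-site-vpn", {}, policies))    # hybrid
```

The second call shows why an untagged VPC attachment ends up connected to nothing until the automation writes the tag.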

Thumbnail 3070

Thumbnail 3080

Thumbnail 3120

I then configure my network function group for my inspection VPCs and my three segments: dev, hybrid, and prod. We didn't use prod today, but it's there as an example of how you can create different segmentation in Cloud WAN. Lastly, we have the segment actions. As we mentioned before, we now have routing policies that let you apply more advanced routing controls such as filtering and summarization, and advanced BGP capabilities like changing the AS path, changing the local preference, or changing BGP communities. In this case, we're showing you how you can insert firewalls. We have the send-to action, which creates a default static route that sends egress traffic to the inspection VPCs. We also have the single-hop send-via action between the different segments.
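To make the shape of the document concrete, here is a hedged Python sketch of the remaining sections: core network configuration, segments, the network function group, and the service insertion actions. The ASN range and region names in the usage lines are placeholders, the choice of which segments get a `send-to` action is an assumption, and regional edge overrides are left out. The `send-via` pairs (prod to dev and hybrid, dev to hybrid) follow what the speakers describe.

```python
import json


def build_core_policy(edge_locations, asn_range, nfg_name="inspection"):
    """Return a partial Cloud WAN policy document (no attachment policies)."""
    segments = ["dev", "hybrid", "prod"]
    send_via_pairs = {"prod": ["dev", "hybrid"], "dev": ["hybrid"]}
    egress_segments = ["dev", "prod"]  # assumption: segments inspected on egress

    actions = [
        {
            "action": "send-to",
            "segment": name,
            "via": {"network-function-groups": [nfg_name]},
        }
        for name in egress_segments
    ]
    actions += [
        {
            "action": "send-via",
            "segment": source,
            "mode": "single-hop",
            "when-sent-to": {"segments": destinations},
            "via": {"network-function-groups": [nfg_name]},
        }
        for source, destinations in send_via_pairs.items()
    ]
    return {
        "version": "2021.12",
        "core-network-configuration": {
            "asn-ranges": [asn_range],
            "edge-locations": [{"location": region} for region in edge_locations],
        },
        "segments": [
            {"name": name, "require-attachment-acceptance": False}
            for name in segments
        ],
        "network-function-groups": [
            {"name": nfg_name, "require-attachment-acceptance": False}
        ],
        "segment-actions": actions,
    }


# Placeholder regions and ASN range, chosen only for illustration.
policy = build_core_policy(["us-west-2", "eu-west-3", "eu-south-2"], "64520-65525")
print(json.dumps(policy, indent=2))
```

Keeping the segment pairs in one dictionary mirrors the local variable plus dynamic block approach mentioned in the session: adding a new inspected path is a one-line change.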

Thumbnail 3130

Thumbnail 3150

Thumbnail 3160

Thumbnail 3170

Thumbnail 3180

Here you see that even though I created a dynamic block in my policy document, when it is translated to JSON it creates the override. So basically, between these two regions I want to use the first one, and then everything else is what we defined in the local variable. We have it from prod to both dev and hybrid, and then from dev to hybrid, so we cover all the different use cases. Let's go to attachments. How much time do we have? Okay, we have some time, so we're good. You see now that the three VPCs have already moved to the dev segment. Let's go to our state machine to see how and why that happened. We see three executions, one for each of our three VPCs being created, with the notification going to EventBridge and then coming to Step Functions. Let's pick one as an example to show you what happened.

Thumbnail 3190

Thumbnail 3210

Thumbnail 3230

Thumbnail 3240

Thumbnail 3260

Thumbnail 3280

Here what I did is obtain the VPC attachment, and I was setting three variables in my Step Functions from the input: the AWS account where the VPC was created, the attachment ARN, and the attachment ID. Then the Choice state is just checking. When I get the VPC attachment I'm also obtaining its status, to see whether it's available. If it's created, I just move to the next state, where I'm getting the organizational unit. From the account ID I'm getting the organizational unit, and in the next one I'm getting the name of the organizational unit. The only thing left is tagging that attachment.

The cool thing in Step Functions is that I don't need to use Lambda functions. Step Functions provides native integration with the majority of the AWS APIs, so I'm just directly calling the APIs for those services. I'm tagging the attachment with domain equals dev. So if I come to my VPCs, I see that the tag is already there: domain dev.

The cool thing is that if I'm now in the spoke account, go to Cloud WAN, and check my attachment, because it was created in the spoke account, it is shared with me. Here I have it. I see the attachments that I created for the core network that they shared with me, and I can also see those tags. Something cool is that when you share with AWS RAM, tags are not shared between accounts, but Cloud WAN is sharing those tags between accounts so you can see and potentially control them, unless we block it as we did today.
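The decision logic of one execution can be traced in plain Python. In this sketch the lookups are injected as functions so it runs without AWS access; the account IDs, OU IDs, and the optional override table (the "extra logic" for an OU whose name is not a segment name) are all hypothetical.

```python
from typing import Callable, Dict, Optional


def resolve_domain_tag(
    account_id: str,
    parent_ou_of: Callable[[str], str],
    ou_name_of: Callable[[str], str],
    segment_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the tag that the automation would write on the attachment."""
    ou_id = parent_ou_of(account_id)   # account ID -> parent OU ID
    ou_name = ou_name_of(ou_id)        # OU ID -> OU name
    # By default the OU name is the segment name; overrides add extra logic.
    segment = (segment_overrides or {}).get(ou_name, ou_name)
    return {"Key": "domain", "Value": segment}


# Hypothetical stand-ins for the AWS Organizations lookups.
ACCOUNT_TO_OU = {"111122223333": "ou-example-dev", "444455556666": "ou-example-nonprod"}
OU_NAMES = {"ou-example-dev": "dev", "ou-example-nonprod": "non-production"}

tag = resolve_domain_tag("111122223333", ACCOUNT_TO_OU.__getitem__, OU_NAMES.__getitem__)
print(tag)  # {'Key': 'domain', 'Value': 'dev'}

mapped = resolve_domain_tag(
    "444455556666",
    ACCOUNT_TO_OU.__getitem__,
    OU_NAMES.__getitem__,
    segment_overrides={"non-production": "dev"},
)
print(mapped)  # {'Key': 'domain', 'Value': 'dev'}
```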

You can also check the tag of that attachment so you understand better why my attachment was connected to the segment. But also, for example, because it's a shared resource, I don't know what I'm connected to. I'm just simply connected to Cloud WAN and I trust my networking team to have built all the routing for me.

Thumbnail 3340

Key Takeaways: Architect by Use Case and Embrace Infrastructure Automation

So I think with that we can start closing it up. First, yes, there is a repo you can use to start working with AWS Cloud WAN and service control policies. Everything is a sample from the official repository, where you can find these and many other solutions. You can also start working with AWS Cloud WAN specifically with the guidance for attachment management before you move.

In the Cloud WAN blueprints we have already moved to version one, and you even have examples of advanced routing with routing policies. Feel free to check it, create issues, create pull requests. We're really happy to have these blueprints as a place where we can interact with you, and where you can tell us that you want to see a specific use case. The guidance solution is basically the same idea that we saw with Step Functions, but designed with real production environments in mind. It helps you add this extra logic to the attachments and to the attachment management of Cloud WAN if you want those extra controls alongside the attachment policy.

Thumbnail 3410

And then of course there are some examples of service control policies. We also have AWS Skill Builder. This is the platform where you can find many different courses. Some of them are free, and others are paid. There is content specifically for networking and content delivery, where you can find more about AWS Cloud WAN and other networking services.

Thumbnail 3440

As I said at the beginning of the session today, here are some key takeaways. You can forget everything that we said so far. This is the important piece. The first one is architect by use case, not by habit. What we have seen so far with customers is that as they're growing, they are just putting one specific service because that's where they found it in the documentation or that's where they're comfortable with or their partner is comfortable with, and then everything needs to operate around this architecture decision.

We see success doing the opposite. What's your use case? That's why we're here. Tell us your use case and we can help you define which is the best service for it. And then of course the idea is that you don't have one service for everything. I know that you need to be somewhere in the middle: not "let's use the whole portfolio of AWS networking to do connectivity," but for sure not only one service either. We need to define the use cases properly and make sure that we have the right pattern for each use case.

For that we need to start with business outcomes and then architect for agility. Understand where we want to get, where the business wants to be and when it wants to reach those goals, and then architect for agility: infrastructure as code, modular designs, continuous iteration. Most likely you have really well-defined pipelines and automation in your applications. Take those learnings and apply them to your infrastructure as well, so you can have that same iteration. Services like Cloud WAN help you avoid maintaining big infrastructure-as-code files with so many resources. You have everything in one single place.

And then orchestrate beyond networking. Networking is not the only key service. Service control policies, AWS Organizations, security measures, automation: take advantage of the full AWS portfolio to build the infrastructure you want and reach the business goals that you have set, quickly. So with that, thank you very much.


; This article is entirely auto-generated using Amazon Bedrock.
