🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Hybrid connectivity at scale: A deep dive into AWS Direct Connect (NET403)
In this video, Steve Seymour and Josh Dean provide a comprehensive deep dive into AWS Direct Connect, covering physical infrastructure including Direct Connect locations, port types (1G to 400G), cross-connects, and MACsec encryption. They explain logical infrastructure with three virtual interface types (public, private, transit), Direct Connect Gateway for multi-region connectivity, and BGP routing strategies using AS path prepending and local preference communities. The session includes practical troubleshooting tips like "rolling the fiber" for polarity issues, CloudWatch metrics for monitoring light levels and connection state, and failover testing. Cost calculation examples demonstrate pricing components including port hours, data transfer charges, and Transit Gateway fees. They also introduce the new Interconnect multicloud capability with Google Cloud and emphasize designing for resilience using maximum resiliency architecture with four connections across two locations.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: How Hard Can Direct Connect Be?
Good afternoon everyone. Welcome to the session. This is NET403: Hybrid Connectivity at Scale, a deep dive into AWS Direct Connect. Thank you all for coming. I wanted to start with a very simple question that was on my mind when I joined AWS a number of years ago. When someone said, "Can you talk to some customers about Direct Connect?" my thought was, "How hard can it be?"
It's just VLANs and 802.1Q tagging, a bit of trunking, and some BGP knowledge about AS numbers, configuring filters and prefix lists on routers. On the AWS side, we need to configure things in the console and the router on the other end. So how hard could it really be? My name is Steve Seymour, and I'm the Worldwide Tech Leader for Networking and Solutions Architecture at AWS. I spoke about Direct Connect back in 2016, and I wanted to remind people that, as you can probably tell from my accent, I'm from the UK, where we say "router."
I did explain that back then, but it didn't quite sink in, so I tried again the next year to explain what a router really is. This year I've given up, so I asked someone else to join me. I'm Josh Dean, Sr Product Manager on the AWS Direct Connect team. I was also born in the UK, so I was a "rooter" guy, but now I'm a "rowter" guy, so I'm a convert. Hopefully between us, we've got people covered, whichever pronunciation you prefer.
Another question: who's used Direct Connect? A fair few of you. There are one or two that didn't put their hands up. I wanted to give you a bit of history here. At AWS re:Invent back in 2014, we actually used Direct Connect at the conference for a couple of demos and launches that year. If you came to re:Invent in 2015, it was actually being used to deliver most of the event network. If you went to anything on AWS in 2015, you would have been using Direct Connect.
Unsurprisingly, this year Direct Connect is a key part of the network infrastructure we deploy here for re:Invent, and it's been that way for the last couple of years. If you've used the Wi-Fi here at re:Invent and accessed anything on AWS, you have used Direct Connect. Perhaps that's everyone now in the room. So let's get started. When I say "how hard can it be," there is quite a bit more to Direct Connect, and we're going to walk you through everything in this session.
We'll cover physical infrastructure, looking at Direct Connect locations and the different interfaces we have, how to organize cross connects, and where partners fit into things. We'll look at the logical infrastructure, the various types of Virtual Interfaces, and of course we can't avoid BGP—it's going to be part of this session. Then there's the cloud infrastructure, the AWS side of it. That's the plan for this session.
Understanding Direct Connect: Physical Connections and Location Selection
What is Direct Connect? It is a physical connection between your infrastructure and the AWS infrastructure. I say physical connection because it does involve cross connects: connecting pieces of fiber between equipment to enable traffic to flow from your infrastructure into the AWS backbone and into our regions. Originally, it supported a couple of connections, perhaps in one location to one region, perhaps to one VPC.
Direct Connect has evolved a lot over the years I've been talking about it. Direct Connect now supports connectivity to multiple regions using Direct Connect Gateway, and of course multiple VPCs as a result. It also supports connecting into things like Cloud WAN and Transit Gateways. Direct Connect has evolved over time as you would expect, but we're going to go right back to the beginning in terms of establishing these connections and that physical connectivity.
When you're establishing your first Direct Connect with AWS, one of the first decisions you need to make is where you're going to do it. As I like to say, it's a location, location, location discussion. You need to look on our website and check out the different locations we have around the world for Direct Connect. These are locations where we have infrastructure and can provide ports for you to connect to. When you look at that table on our website, we list out the locations, the associated region, and the set of features available at that particular location.
If you wanted a particular amount of bandwidth, you can look down that table and recognize that a particular location supports the bandwidth you need. We support 1 gigabit, 10 gigabit, 100 gigabit, and 400 gigabit ports. Not all locations have all of those different bandwidths, so check the list for which one makes sense for you. You're also going to be making a decision based on where your infrastructure is. It might be that you have your network infrastructure already in one of those locations, which makes the choice really quite easy. If however you're connecting to some on-premises infrastructure or another data center, then you'd be choosing the Direct Connect locations that are closer physically to that data center.
Because you're going to have to order the connectivity from our Direct Connect location to your infrastructure.
Now, creating a connection is really quite simple. Give it a name and choose the location from that list. There is an option at the bottom there which says you would like a MACsec enabled port. I'm going to talk about that a little bit more later on. That's also listed in that table: which locations support MACsec. So if you plan to use MACsec, this is the time you make that decision. It's not something you change later on. If you would like a port that is capable of MACsec, so encryption, choose it when you order the connection at the beginning.
You're going to notice throughout the slides here that we have these boxes down in the bottom corner. These are just to call out some of the quotas that we have around Direct Connect. Many of these you can just request an increase for. But in this example it's 10 connections per location per region. That's your starting point for ordering Direct Connect connections.
Establishing Physical Connectivity: Meet Me Rooms, Cross Connects, and LOA-CFA
So we've ordered our connection, we've clicked through on the console. Well, the next step, surely we just need to get ourselves connected. We just plug in, don't we? Well, the photo that I'm showing you here, this is an example of a meet me room. So this is Equinix TY2. As you can see, it's full of a lot of fiber, a lot of ports. This is actually a very tidy, neat meet me room. And this is where your port from AWS is presented for you to consume.
So when you order that connection, we then issue something called an LOA-CFA. An LOA is a Letter of Authorization: it is your permission to plug into a particular port. And CFA is the Connecting Facility Assignment: it tells you where that port is. So this document contains everything you need to be able to plug into that Direct Connect port on our side. It will also contain other information such as the connector type, typically an LC connector on a single mode connection, and the standards required for the different bandwidths of our ports. So you've got all that information, so surely now you just plug in, yeah? No, that's not quite how it goes, because in reality, you personally as a customer probably do not have access to that meet me room.
So in reality, you're going to have to work with the colocation provider now to arrange for something called a cross connect between that port that we've allocated and your infrastructure. So when we think about getting connected, unfortunately, there have to be multiple people involved in making this connection. It's not something that you can own totally yourself. There's always going to be at least three parties involved. There's obviously AWS on one side, providing the Direct Connect port. There's your side of things, your infrastructure, and in the middle, we've got this colocation provider that needs to make the cross connect.
The reality also is though, if you're connecting to a facility that is outside of that particular colocation facility, then you're probably going to have to work with some sort of telco partner, carrier, maybe even a separate last mile provider. And unfortunately, all of the people involved probably have their own processes and their own ways of doing testing, and I'll come onto this later on. Perhaps even their own polarity conventions on their ports, which can cause some interesting challenges.
I've been doing this for quite a few years. I've worked on a bunch of customer projects and internal projects involving Direct Connect. And I thought I'd just share a couple of tips from my experiences. And the first one is, I would say, always draw out a really simple diagram of what you're trying to achieve with Direct Connect. Put AWS on one side, put your infrastructure on the other, and then work out all of the pieces involved in the middle. Doesn't have to be fancy, just a very simple diagram that shows all the locations, perhaps all the devices that you're aware of that these connections are going to go through.
On that diagram, identify who owns each component, because they will all have different owners. And then every port ID, every circuit number, everything you see written in an email or said to you over the phone, just add it onto that diagram, and this will become your reference for everything to do with that particular Direct Connect. I also find it quite useful just to note on there things like support details, contact details for the people involved, because this, as I say, just becomes your reference going forwards.
So how do we put all the pieces together here? So what I wanted to start with was the example where you've got infrastructure in your own facility, perhaps your own data center. It's not within the Direct Connect location, so in my example earlier, the Equinix TY2 facility. And we want to establish this connection from an AWS Direct Connect port all the way through to your equipment. Now it'd be really nice if you could just fire off an email that says, hey, partner, could you just make this happen? Here's the LOA-CFA from AWS, here's my equipment, could you just connect the two together?
I'm not going to lie, there are some cases where you can do that. There are some excellent partners who have all of the ability to deliver this end to end. But unfortunately, the reality is there are usually many more components involved. Let's stick with the example for a moment where we've asked a partner to deliver it. In which case, they simply build all these connections. Perhaps they arrange the last mile from your data center to a point of presence in your city for their infrastructure. Then they configure across the backbone that they have over their optical network into a location, and they arrange it all. They plug it in, ports come up on your router. Happy days. As I say, the reality is a little bit different.
Now, the infrastructure I mentioned there is in place typically, so there will be points of presence in your city. There will be infrastructure in these colocation facilities from the various telcos and carriers. Also, there is the meet me room infrastructure that I mentioned a moment ago. AWS's equipment is already pre-cabled from our devices directly into that meet me room, into what you saw in the photo earlier on. And the chances are that the large carriers and telcos in these facilities are also pre-cabled to the meet me room. So what that means is we still have a few gaps, but it's not quite as bad as I showed a moment ago.
To make all these connections happen, you can imagine there's going to be a lot of communication going on. There's going to be the request from the owner of the telecoms equipment to request the cross connect in the first place. There's going to be provisioning of ports and circuits across the infrastructure. There's probably going to be some ordering of running fiber across a town. There might even be involvement of logistics to dig up the road to run fiber. All sorts of things could be involved.
But when it comes down to it, if you break down that simple diagram and do it in stages, the first piece is that cross connect in the meet me room. So in this case, the telco has had to ask Equinix in our example to make that cross connect. The telco has then had to make physical connections between patch panels in their rack and their equipment, configure circuits over their network, deliver it to your local city, perhaps dig up some streets, and then deliver it into your location presented on a port ready to go into your router. And at that point, we might have a connection.
The other variation on this is if you are located inside that colocation facility already. Now if you're in there, things are clearly a lot simpler. You don't have any of that external telecoms work to do. So you'll have your rack, your equipment, and chances are you're also pre-cabled to the meet me room. If you're not familiar with that experience, typically the first time you order any sort of cross connect in many of these facilities, they will install a whole set of fiber between your rack and the meet me room.
What that means is you can now interact directly with that location provider. So it really is as simple as then sending the email that says, please can you make this cross connect. And of course it might not be an email; different providers use portals, you might upload the details, provide the document we give you as evidence. But effectively you just order that cross connect and it's done. I've said cross connect a couple of times. A cross connect really is just a piece of fiber being plugged into one port on one side and another port on the other. There is nothing more to it. I regularly get asked what this cross connect thing is, and it's very simple: it's just a piece of fiber.
So at that point, our connection comes up and hopefully the thing you see in the AWS console is that your port goes from being down to now showing as available, and the link comes up on your router. We're in a good place. But let's just go and verify that everything is as we expect. We have CloudWatch monitoring for Direct Connect. There's a whole bunch of metrics that you would expect to see there, bandwidth usage, and so on. But I wanted to draw your attention to the connection state, which is a very simple metric that indicates whether the connection is up or down. If it's one, it's up. Anything else, it's down.
The other interesting ones to look at here are the light levels, particularly the RX, the received light level. You can see on there I've made a note of the ranges that are considered good. Effectively, if you get down to minus 12 dBm you're okay, but you could be better. Start getting beyond that, and this connection probably isn't coming up. The reason I'm mentioning this is you can have a situation where everything is plugged in and connected, but you haven't got a link yet. It's good to come and have a look in the console and actually see what the light level is, because it may well all be connected absolutely fine, but you might have some dirty fiber somewhere that wasn't cleaned when it was connected, and that's attenuating the light so we've not got enough signal getting through. So always go take a look at the CloudWatch metrics.
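As a rough sketch of how you could turn those light-level readings into an automated check: the minus 12 dBm floor follows the rule of thumb above, while the "good" boundary is an illustrative assumption, so check your optic's datasheet for the real acceptable range.

```python
def classify_rx_light(rx_dbm: float) -> str:
    """Bucket a received (RX) light level reading from CloudWatch.

    -12 dBm follows the session's rule of thumb; the "good" boundary
    is an illustrative assumption, not a published specification.
    """
    if rx_dbm >= -7.0:
        return "good"
    if rx_dbm >= -12.0:
        return "marginal"      # link may work, but inspect/clean the fiber
    return "insufficient"      # likely dirty fiber or a bad connector

print(classify_rx_light(-5.2))   # good
print(classify_rx_light(-14.0))  # insufficient
```

A check like this could feed a CloudWatch alarm so attenuation is caught before the link actually drops.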
Dedicated Connections, Hosted Connections, and Link Aggregation Groups
Everything I've been talking about so far is what we refer to as a dedicated connection. And it's exactly what it sounds like. This is a dedicated port on a piece of AWS equipment that is assigned to you. You requested it and we allocated it to you. You get full use of that port and the bandwidth available on it. It will support multiple virtual interfaces, which we're going to explain a little bit more in a moment. But that's effectively multiple VLANs and BGP sessions because you're in control of everything about that port. Fundamentally, it is physical connectivity that you're responsible for to get to the port that we've allocated to you.
Hosted connections are a little bit different. What happens here is we allocate connections to one of our Direct Connect partners. And we actually call them interconnects at that point, you may see that term around. Our partners have already taken care of all that physical connectivity between their infrastructure and ours. What they then do is allocate smaller amounts of bandwidth on that connectivity to you and provide it to you perhaps over an existing fabric or service they might have with you as a customer.
In that case, our partners can increase their connectivity with us by combining connections together. They can create LAGs, just like you can, and we'll explain that in a moment. But they can increase that capacity. However, we don't let partners oversubscribe those ports. So as they allocate bandwidth to you, that bandwidth is passed on to you and they have to expand their connectivity with us over time.
The thing about a hosted connection is you can only have one virtual interface per hosted connection. So it's quite common for our customers to have multiple hosted connections if they need them for different workloads. Perhaps some for virtual private gateways and some for transit gateways, and so on.
I mentioned LAGs, Link Aggregation Groups. What a Link Aggregation Group is, is it allows you to combine multiple physical connections into one single thing that is treated as a dedicated connection. So you might combine together four 10 gig connections. So you have a resulting 40 gig of capacity there, but you treat it as a single connection when you're configuring virtual interfaces on it. We'll then do flow hashing of the traffic across those connections, and you can configure on the AWS side and your equipment what is the appropriate point that you would rather shut down that set of connections if you lose capacity versus maintaining it.
So you might have 40 Gbps of connectivity and you decide that you really need to always have 20 Gbps available. So if one of those connections is impacted, you're down to 30 Gbps, that's fine, keep the LAG up. But if we drop down to 20 Gbps or below, perhaps we then shut it down, and that's configured as the minimum number of links on a LAG.
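A minimal sketch of that minimum-links behavior, plus the per-flow hashing idea, assuming four 10 Gbps members and a 20 Gbps floor. The hash function here is purely illustrative; real devices hash on packet header fields in hardware.

```python
import hashlib

LINK_CAPACITY_GBPS = 10
MIN_LINKS = 2  # keep the LAG up only while at least 20 Gbps remains

def lag_is_up(healthy_links: int) -> bool:
    """The 'minimum links' setting described above."""
    return healthy_links >= MIN_LINKS

def pick_link(flow, healthy_links: int) -> int:
    """Per-flow hashing: the same 5-tuple always lands on the same
    member link, keeping packets within a flow in order."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return digest[0] % healthy_links

flow = ("10.0.0.5", "172.16.1.9", "tcp", 44321, 443)  # hypothetical 5-tuple
print(lag_is_up(3))  # True: 30 Gbps still available
print(lag_is_up(1))  # False: below the 20 Gbps floor
```

Note that a single flow never exceeds one member link's bandwidth; aggregate throughput comes from many flows hashing across the members.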
Troubleshooting Physical Layer Issues: Rolling Fiber and Loop Testing
So we're in a fairly good place at this point. We've assumed everything's working, but what if it doesn't? What I thought I'd do is just talk you through a couple of situations that we hear about all the time, and our support engineers end up discussing with customers. So I wanted to give you a bit of an insight into two phrases that we hear very regularly: "roll the fiber" and "I'm going to put a loop on that." I see a few people nodding here, maybe you've heard that or experienced it.
For those of you that have not worked with physical fiber equipment, at the bottom there we've got what's called an SFP, a small form factor pluggable module. And on the right-hand side we have some fiber, duplex fiber with LC connectors on the end of it. And the module at the bottom is what is sitting in those various pieces of network equipment on each end of this connection. And you might be able to see in very small writing on there, it's marked TX and RX. So that's transmit and receive.
So what happens there on the transmit side is there is a laser inside that module that is firing the light out of it, and when it gets to the other end, it needs to go into the received side of the other connection. So just to illustrate that point a little bit, this would be just two devices next to each other with a single duplex fiber cable that you would just plug in. This is what would happen. Transmit would be connected to receive and vice versa. We'd all be having a good day, packets would be flowing, no problem.
Well, when you put this in the context of a colocation facility, we've now introduced a meet me room. Now there are no actual standards for how to do this kind of cabling. There's some best practices that have been adopted in terms of where you cross over those pairs. So typically between your rack and the meet me room, the fiber will have been crossed over. And if we take that simple example on the screen there, you've got two sides of a connection here, a single meet me room, the fibers have been crossed on one side, crossed on the other. The patch in the middle is just a simple fiber between those ports. That unravels quite nicely and you end up with transmit to receive and vice versa again. All good.
The other variation though is when you start adding in multiple meet me rooms, other providers, other pieces of equipment, and before you know it, you've got a situation where these fibers have been crossed multiple times. So when you hear the phrase "roll the fiber," what they literally mean is swap transmit and receive around. I've lost track of the number of times I've visited racks where I've seen the little clips on an LC fiber connector broken off so that the fibers can be swapped around. It's a very common thing. I wish it was totally standardized everywhere, but unfortunately it's not. So that's what that means.
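One way to reason about polarity is to count crossovers along the path: transmit on one end must land on receive on the other, so the end-to-end path needs an odd number of TX/RX swaps. A small sketch of that reasoning, with a hypothetical path:

```python
def polarity_ok(crossovers: int) -> bool:
    """TX on one end must reach RX on the other, which requires an
    ODD number of TX/RX swaps (crossovers) along the whole path."""
    return crossovers % 2 == 1

# Two devices joined by one duplex patch cable that swaps the pair:
print(polarity_ok(1))  # True

# Hypothetical longer path: your rack, two meet me rooms, a carrier.
swaps_per_segment = [1, 1, 0, 1, 1]   # which segments cross the pair
print(polarity_ok(sum(swaps_per_segment)))  # False: "roll the fiber"
```

"Rolling the fiber" at any one point adds exactly one more swap, flipping an even total back to odd.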
When you are debugging a connection, the other phrase you might hear is putting a loop on it. What I am showing you here, very simply again, these are just two SFPs, two devices that are connected via a meet me room, and perhaps the connection is not coming up. We want to identify where the potential problem is. Do we have some broken fiber? What is the issue?
What will happen here is you will hear someone say, I am going to put a loop on that connection facing towards AWS or facing towards this carrier. What that means is in that meet me room, they might unplug that fiber, that cross connect that we had, and quite simply they will just loop the two together on one side. On the AWS side, our laser is firing light out, it gets to that port in the meet me room, it gets turned right back around, sent back to AWS. Now in those CloudWatch metrics, you would see light level being received. It is not always true that the connection would come up as such, but you can see that that loop has been made.
Even better, you could see the transition where there was no loop at all, no light flowing. You loop it, it is good. So now you know the path from AWS to the meet me room is good. The same could be done on the other side with the carrier, and it could be done at various places all the way down the path to troubleshoot a problem like this. These are the two phrases you are likely to hear. I thought I would share it with you because our support organization works with customers regularly on these things.
MACsec Encryption for Layer 2 Security
Quite often we find customers do not necessarily understand these concepts, which I get. Hopefully this clarifies that process. The last thing I wanted to talk about in terms of physical connectivity is MACsec. MACsec is enabled at the port level, as I mentioned earlier. When you request your port, you check that box at the bottom, and if it is available for that bandwidth at that location, it is now available for you to consume.
MACsec encrypts the traffic between two layer 2 devices. If you have a piece of equipment in that colocation facility and you connect them together via the cross connects, no problem. If you are connecting outside that facility to perhaps your data center, as long as your carrier is passing through all the layer 2 traffic, then you can establish a MACsec connection between those two devices. Once that traffic gets into AWS, we encrypt all of the traffic between AWS secured facilities. That MACsec session is between your device and the Direct Connect device in that location.
The keys for this session are generated by you. They are stored inside Secrets Manager, and you can then rotate these if you need to over time. You have a choice when you configure MACsec as well, in terms of what you want to do with the encryption. You can choose no encryption, maybe that is your starting point. If this was part of a plan that you were going to enable MACsec later on, you would set it to no encrypt.
There is also then should encrypt, where the two devices will try and establish an encrypted connection between them, your device to ours, but if it fails, the connection will still come up. That is quite a common stage to go to as you are rolling out MACsec connectivity. You enable it, you set it to should encrypt. That now effectively lets you test it, because if you get it right, the connection will come up and you will see it encrypted. If it does not work, you have not interrupted the connection.
And then the final one is must encrypt, which is exactly what it sounds like. If we cannot establish the encryption between the two devices, the connection will stay down and all your virtual interfaces will be down as well. So talking of virtual interfaces, I am going to hand over to Josh to talk about those with you.
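The MACsec key material Steve described is a CKN/CAK pair; on Direct Connect these are typically supplied as 64-character hex strings, though you should verify the exact format against the current documentation. A sketch of generating a pair locally before storing it in Secrets Manager:

```python
import secrets

def generate_macsec_pair() -> dict:
    """Generate a CKN/CAK pair as 64-character hex strings.

    The 64-hex-character (256-bit) format is an assumption based on
    typical Direct Connect MACsec requirements; confirm in the docs.
    """
    return {
        "ckn": secrets.token_hex(32),  # Connectivity Association Key Name
        "cak": secrets.token_hex(32),  # Connectivity Association Key (secret)
    }

pair = generate_macsec_pair()
print(len(pair["ckn"]), len(pair["cak"]))  # 64 64
```

The CAK is the secret and should be handled like any other credential; the same pair has to be configured on both your device and the connection.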
Virtual Interfaces: Public, Private, Transit, and Monitoring
All right, thank you, Steve. Taking a quick look at virtual interfaces here, just to level set: Steve kind of alluded to it, but a virtual interface is the VLAN and the BGP, right? I know you kind of talked down about BGP, but I like it. We have three main types, and I am going to go into some details on them, so do not panic: public, private, and transit. Public is for connecting to public AWS endpoints, S3, that sort of thing. Private is for getting to VPCs. Transit is for getting to transit gateways and Cloud WAN, and then to VPCs.
And then we also have the concept of a hosted virtual interface, not to be confused with the hosted connection. This is a virtual interface owned by another account's connection. They configure all the BGP and all of that stuff, and then you attach it to your VPC essentially. Sometimes that is an infrastructure account or something like that. We will start with public virtual interfaces. I am not going to go into super detail on these because I am hoping you have at least some level of understanding of them, but public virtual interfaces, as I said, connect you to any region. These use public IPs for peering. We have some ability to scope these down, which I will explain a little later on. But essentially it is your own personal connection to the public side of AWS.
To talk about private virtual interfaces, we have to go back in time first. We have to go before 2017, which feels like the Wayback Machine. Back then, Direct Connect was a regional service, so what you had was your Direct Connect in a region and you could connect to your VPCs that were in that region. In my example here, I have us-east-1, and I have my VPC in us-east-1 and my connection is in us-east-1. I can still connect to multiple VPCs, but one VPC per virtual interface, and this is where that concept of an associated region comes in, which you may have seen if you've used Direct Connect at all.
You can still use this architecture. Then we moved on to this architecture using Direct Connect Gateway. These are still private virtual interfaces, but what Direct Connect Gateway does is allow you to bring all your private virtual interfaces into one Direct Connect Gateway. Those can be global from anywhere, and you can then attach your VPCs anywhere as well. So you can go from anywhere to anywhere, from on-premises to VPC, VPC to on-premises.
Then we started to get into the transit gateway and Cloud WAN world. You've probably come across those services. We introduced transit virtual interfaces, which are very similar. You connect regionally, you can connect to transit gateways and Cloud WAN core networks in different regions, and add those on. One thing to note is that you cannot mix and match: you can't attach VGWs and TGWs to the same Direct Connect Gateway, and you can't add transit VIFs and private VIFs to the same one. That's just something to keep in mind.
And then just one other thing I wanted to point out is not necessarily a VIF type, but a setting on a VIF, a private or transit VIF at least, called SiteLink. This allows you to take your traffic from one Direct Connect location to another Direct Connect location and ride the AWS backbone. The nice thing is you can just click the button and you'll start seeing the route show up on the other side. So you can use it as needed. You don't even have to keep it on all the time.
Taking a quick look at monitoring, these build on the things that were discussed before, but for virtual interfaces we also have bits per second ingress and egress and packets per second. This gives you an idea of what traffic is going over those connections. We also have an offering under CloudWatch called CloudWatch Network Synthetic Monitor that's actually sending real traffic. You configure an endpoint in your subnets and you give us an endpoint on-premises, and we send ICMP or TCP traffic, whichever you choose. We return round trip time and packet loss metrics to you. There's also a network health indicator metric that comes back. What that is, is us looking at all of our internal metrics for everything from the subnet all the way to the Direct Connect endpoint that you're connected to, just to give you a quick look if you're having a network issue to say whether that's definitely an AWS problem that you can report to them, or if you need to go investigate your network.
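As a sketch of what those synthetic metrics represent, here is how round-trip time and packet loss could be computed from a batch of probe results. This is illustrative only, not the Network Synthetic Monitor's actual implementation.

```python
def summarize_probes(results):
    """Summarize a batch of probe results.

    results: list of round-trip times in milliseconds, with None
    marking a probe that never came back (counted as loss).
    """
    sent = len(results)
    rtts = [r for r in results if r is not None]
    loss_pct = 100 * (sent - len(rtts)) / sent
    avg_rtt = round(sum(rtts) / len(rtts), 1) if rtts else None
    return loss_pct, avg_rtt

# Four probes: three answered, one lost.
print(summarize_probes([12.1, 11.8, None, 12.4]))  # (25.0, 12.1)
```

Alarming on a sustained rise in either number gives an early signal of a path problem before users notice.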
BGP Routing Configuration: Filtering, Communities, and Traffic Engineering
For troubleshooting virtual interfaces, Layer One should be fine because you've been through the physical layer presentation, but that's the first thing to check. If your Layer One is not up, you're not going to get any further. The next thing to check if your VIF is not coming up is Layer Two. Make sure your VLAN tags match on both sides. The other thing we see pretty commonly, especially when you start adding in partners and providers, is that they love to use encapsulation technologies. That's totally fine, but we need to make sure that on the port facing AWS, those are stripped off and it's just the VLAN of your virtual interface.
Then once you get up another layer, you're into BGP Layer Three and Layer Four issues. You can see it goes two different ways. If you're seeing in your BGP logs stuff getting stuck in Idle, Connect, or Active, it's probably a network connectivity problem. If you're getting past that and seeing Open Sent or Open Confirm, it's a BGP parameter problem more than likely.
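That troubleshooting split can be captured in a small helper. This is a sketch; the state names are the standard BGP finite state machine states.

```python
def diagnose_bgp_state(state: str) -> str:
    """Map a stuck BGP FSM state to the likely problem class,
    following the split described above."""
    state = state.lower()
    if state in ("idle", "connect", "active"):
        return "network connectivity problem (check peer IPs, VLAN, filters)"
    if state in ("opensent", "openconfirm"):
        return "BGP parameter mismatch (check ASNs, auth key, peer addresses)"
    if state == "established":
        return "session is up"
    return "unknown state"

print(diagnose_bgp_state("Active"))
print(diagnose_bgp_state("OpenConfirm"))
```

In other words: stuck before the Open messages exchange, look at reachability; stuck after, look at the session parameters.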
I want to talk about routing because there's a lot to it. First, I want to bring up the limits, because some limits are more important than others and they come in at different places. On the screen, I have virtual interfaces per connection and transit virtual interfaces per connection. Those are important, but they only really matter when you're architecting your network. The next two, routes per peer, can take down your network or your Direct Connect if you exceed them at any point.
You really want to make sure you have a good understanding of your routes and are summarizing and keeping those under control. I'll give you some ideas on how you can do that. First, here's an example of route filtering. I'm filtering for a couple of /24s and applying those as a route map to my peer, sending just those routes to AWS. Another option is summarization using the aggregate address command: it suppresses the /24s and sends only the /16 summary. Then at the bottom, I'm injecting a default route using default originate. The idea there is to send a default route to AWS and then use the VPC route tables and Transit Gateway route tables to control exactly which traffic you're sending.
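To make the summarization idea concrete, here's a minimal sketch using Python's standard `ipaddress` module to verify that a set of advertised /24s is fully covered by a single /16 summary, so only the summary needs to be sent toward AWS and the route count stays under the per-peer limit. The prefixes are illustrative examples.

```python
# Check that every specific prefix falls inside a proposed summary route,
# so the summary can safely replace the specifics toward AWS.
import ipaddress

def can_summarize(prefixes, summary):
    """True if every prefix is a subnet of the proposed summary."""
    agg = ipaddress.ip_network(summary)
    return all(ipaddress.ip_network(p).subnet_of(agg) for p in prefixes)

specifics = ["10.20.1.0/24", "10.20.2.0/24", "10.20.3.0/24"]
print(can_summarize(specifics, "10.20.0.0/16"))  # True
print(can_summarize(specifics, "10.21.0.0/16"))  # False
```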
How do you configure the routing: active-passive, active-active, that sort of thing? That's the next step. This is the mnemonic device I was always taught, and I hope some of you have used it too. I still don't understand why there are two oranges; I find that confusing, but I've never used origin or origin type to decide a route, so it's fine. Here's an example. I have an Ashburn, Virginia Direct Connect and a New York Direct Connect. My VPC is in us-east-1, and my data center is in Virginia, so I want to prefer the Ashburn Direct Connect. How can I do that? One way is AS path prepending, which is pretty common in the industry.
The other way we could do it with AWS is using local preference communities. We have a set of communities that essentially allow you to influence AWS's local preference. You also want to apply the same preference locally to the routes you're receiving from AWS, just to keep things symmetric. Just so you can see what those routes would look like, the top two routes are the preferred routes: the shorter AS path and the higher local preference community. All right, what if you bought all this bandwidth and you want to use it all? Why not? You want to go active-active. I have two connections now in that same data center. How can I utilize everything?
Well, I could just use the same local preference communities on both. Technically, I could also use the same AS path. I personally like the local preference communities because they sit higher up the decision tree, which makes me a little more comfortable, but you could also do this with AS path prepending. Local preference is my choice. This way, we'll ECMP traffic across both connections, and you would do the same on your side, so we're splitting traffic across both. For resiliency purposes, maybe now I'm going back to my two-location scenario where I'm adding a second Direct Connect location. Now I have a Direct Connect in New York, but my data center is still in Virginia. I probably don't want to ECMP across everything because I'm going to have some weird latency problems and things are going to start arriving out of order.
So in that case, generally I'm going to apply my local preference communities on the closest connection, which would be Ashburn, and a lower priority on the New York connection. Then if things fail, we can ECMP on the other one, and everything's fine. Now, I've never really come across a network that's that clean; they're usually a little messier than that. And we only have three local preference communities, so what do you do if you have four routes or more?
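The preference logic above follows the top of the BGP best-path decision process. The sketch below models just the first few steps (weight, then local preference, then AS-path length) to show why a higher local preference wins even when AS paths are equal; it's a simplified illustration, not a full BGP implementation, and the route attributes are made-up examples.

```python
# Simplified top of the BGP best-path decision process: higher weight wins,
# then higher local preference, then shorter AS path. (The real process
# continues with origin, MED, eBGP-vs-iBGP, and more tie-breakers.)

def best_path_key(route):
    return (-route["weight"], -route["local_pref"], len(route["as_path"]))

routes = [
    {"via": "Ashburn",  "weight": 0, "local_pref": 200, "as_path": [7224]},
    {"via": "New York", "weight": 0, "local_pref": 100, "as_path": [7224]},
]
best = min(routes, key=best_path_key)
print(best["via"])  # Ashburn: same AS path, higher local preference wins
```

With equal local preferences everywhere, the same comparator falls through to AS-path length, which is why prepending lets you rank four or more paths.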
If you have four or more different paths and want to use them in order, here's what I recommend. For that use case, I would equal out the local preferences everywhere and then use AS path prepending. You can get quite granular with this approach and handle more than four paths, but that's how I would handle it.
Let me spend a minute on public VIFs. I mentioned scoping communities earlier, and these allow you to filter the routes from AWS that you receive. You can accept routes for either the local region, the local continent, or accept everything globally. Likewise, you can tell AWS to advertise your route into the local region, the local continent, or globally.
If you advertise the same route with an ISP and over Direct Connect, we're going to prefer Direct Connect unless you advertise something more specific. We don't apply local preference communities here. You can use AS path prepending as long as you're using a public ASN. If you use a private ASN, we'll strip it off and use a public one internally. ECMP gets a bit trickier with public VIFs, and I would generally say that within the same location you can do ECMP as long as everything else is the same.
Let me show you what the routes look like coming from AWS. First, you'll notice that we apply an ASN, we do a bit of prepending, and we apply the no export community. That's really just to protect us in case you were to leak the routes out to an ISP or something. Obviously your ISP should stop you, but just in case they didn't, we've got a little protection. I've seen customers strip off the no export, advertise them internally, and that's totally fine. It's just a bit of accidental protection. You'll see at the bottom the 8100, which indicates that this route originated from the region where your Direct Connect is located.
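For quick reference, here's a map of the community values discussed in this section as I understand them from the Direct Connect documentation; confirm the values and meanings against the current AWS Direct Connect routing policies documentation before using them.

```python
# BGP communities used with Direct Connect, per the discussion above.
# Values assumed from AWS documentation; verify before relying on them.

DX_COMMUNITIES = {
    # Scoping communities you attach to routes advertised on a public VIF
    "7224:9100": "advertise within the local AWS Region only",
    "7224:9200": "advertise within the local continent only",
    "7224:9300": "advertise globally (default)",
    # Local preference communities for private/transit VIFs
    "7224:7100": "low preference",
    "7224:7200": "medium preference",
    "7224:7300": "high preference",
    # Communities AWS attaches to routes sent to you on a public VIF
    "7224:8100": "route originates in the Region of your Direct Connect",
    "7224:8200": "route originates in the same continent",
}

print(DX_COMMUNITIES["7224:8100"])
```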
Failover Testing, SLA Models, and Performance Optimization
Failover testing is super important. Once you get your setup going, you're going to have maintenance, and things are going to happen. Somewhere along that long, complicated chain of partners and providers, something's going to go down, so you want to know that you're good. We have a failover testing tool that lets you bring down BGP. You can set a duration for that, up to 72 hours, which seems like a lot, and thankfully you can cancel it if it all goes wrong.
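As a sketch of driving that tool from code: `StartBgpFailoverTest` and `StopBgpFailoverTest` are the Direct Connect API actions for this feature. The snippet builds the request parameters offline (the VIF ID is a placeholder) and shows the boto3 calls as comments, since the parameter names are assumptions you should check against the current API reference.

```python
# Build parameters for the Direct Connect BGP failover test. The test
# duration is expressed in minutes; 72 hours is the maximum mentioned above.

MAX_MINUTES = 72 * 60  # 4320

def failover_test_params(vif_id, minutes=180):
    if not 1 <= minutes <= MAX_MINUTES:
        raise ValueError("test duration must be between 1 minute and 72 hours")
    return {"virtualInterfaceId": vif_id, "testDurationInMinutes": minutes}

params = failover_test_params("dxvif-example", minutes=60)
# With boto3 (not run here; parameter names assumed from the API docs):
#   dx = boto3.client("directconnect")
#   dx.start_bgp_failover_test(**params)
#   dx.stop_bgp_failover_test(virtualInterfaceId="dxvif-example")  # cancel early
```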
Let me note our SLA for Direct Connect as well. We have three models for our SLA. The first one is the single connection deployment, which is pretty much what it says: you have one connection. The next one is the high resiliency multi-site non-redundant deployment. It's a mouthful, but it essentially means you're at two locations with only a single connection at each. Finally, we have the maximum resiliency multi-site redundant deployment, which is at least four connections, two at each site and on two different AWS devices. That's the one for workloads that can't have downtime.
Expanding on encryption over Direct Connect, there's another option we see besides MACsec, and that's using IPsec. In that case, Direct Connect becomes the transport underneath, and then you apply your IPsec tunnel over the top from the Virtual Private Gateway to your own firewall or router. You can do this with public or private IPs. It used to be problematic because you'd be limited to a tunnel size of about 1.25 gigabits per tunnel. They recently upped that to 5 gigabits per tunnel, so that really helps. It becomes a much more viable option.
On the subject of performance, you can send as much bandwidth as your EC2 instance can handle, and that's totally fine. But keep in mind that if you're doing benchmark testing, a single flow, such as a single iPerf stream, can only get about 5 gigabits per second, so make sure you're spreading traffic across multiple flows. Also make the most of your connections: if you can support jumbo frames on private and transit VIFs, do that, because you'll get the most out of them. On public VIFs we only support a 1500-byte MTU, but you get the idea. There's a pretty good blog post with some good tips on how to improve your performance.
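The single-flow ceiling leads to simple arithmetic for benchmarks: divide the port speed by the per-flow limit to know how many parallel streams you need. This is just the back-of-the-envelope math implied above, using the approximate 5 Gbps per-flow figure from the talk.

```python
# How many parallel flows are needed to fill a Direct Connect port, given
# the ~5 Gbps per-flow ceiling mentioned in the session.
import math

SINGLE_FLOW_GBPS = 5  # approximate per-flow limit from the talk

def flows_needed(port_gbps):
    return math.ceil(port_gbps / SINGLE_FLOW_GBPS)

for port in (10, 100, 400):
    print(f"{port}G port: at least {flows_needed(port)} parallel flows")
```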
Understanding Direct Connect Costs: Pricing Components and Calculation
OK, so we've got our connection up and running. We probably want to understand how much this is going to cost. This is a question that comes up quite regularly in customer discussions that I have. So what I thought I would do is just put together a couple of examples so you can consider the components that are involved when you're working out how much Direct Connect costs you. In our example, we're going to have a very simple setup to start with. So we have our EC2 instance running in a VPC. In this example, it's in US West 2. We have a Direct Connect location which I've chosen to be in London. And I have my customer location on premises. This is how they're connected together. Obviously this is a non-redundant setup, thinking back to what Josh was telling you about.
So what do we need to consider? Well, when you're looking at pricing of anything to do with data transfer inside AWS you need to consider it in both directions. So in the first instance, we're going to consider data transfer between AWS going towards a customer network. What are the costs that we need to consider in this particular scenario? Well, perhaps the most obvious one is the Direct Connect port itself. So there is a charge for the Direct Connect port. It's charged per hour and it's charged from when the connection comes up for the first time or 90 days have elapsed from the point that you ordered the connection.
We recognize that when you click order, it's very unlikely that you're going to be able to plug in immediately and benefit from that. There's going to be time involved to arrange all those cross connects, those circuits, everything I talked about earlier on. But once that connection comes up and is stable, we'll start billing at an hourly rate for that port. And you'll get that charge now until you completely delete that port. So that's an important thing to remember, and it's consistent through each of the architectures I'm going to show you.
The next piece is the data transfer itself from the EC2 instance, leaving AWS going towards your infrastructure. And when you look at our pricing pages, that's where you see the wording data transfer from AWS region or local zone. And it's charged per gigabyte transferred. The way this works is you go and look at the pricing table, you identify where is the traffic coming from. So in this case from the contiguous US for example, and it's going to Europe, you can then work out how much that's going to cost you per gigabyte transferred. So the table for that is on the Direct Connect pricing page. In fact, that's where you can find all of the answers for these examples.
So this is, as I say, a simple setup with Direct Connect Gateway and a Virtual Private Gateway. What about data transfer into AWS in this architecture? Well, in this case, we still have the Direct Connect port fee. Obviously it's not being charged twice in this situation. But that's still a constant. But for data transfer in, in this situation where we're going via a Virtual Private Gateway, we don't charge you for any data transfer inbound in this scenario.
Let's look at a variation of that. What about with Transit Gateway? So we've still got our Direct Connect Gateway, we've now got though the Transit Gateway on that path, and perhaps you've got many hundreds or thousands of VPCs connected to that Transit Gateway. So once again, let's look at it from the perspective of data transfer out from AWS to your infrastructure. We have the port hour charge there, that's still the same. We also have the data transfer charge. That's still the same as well. No difference there, we're moving data across the AWS infrastructure.
But the addition of the Transit Gateway does mean that we now have to consider a per hour attachment charge, and we've got two attachments, one towards Direct Connect, one towards the VPC. And then we have a data processing charge for the Transit Gateway. So in this scenario, they're the components you would add together to calculate how much this particular workload would be costing you.
Let's look at it from the other perspective, data transfer into AWS with this architecture. Still got the same components, Transit Gateway sitting there, so our old friend, the port hour charge, still there. Data transfer coming into AWS, there's no charge for that. But we do still have the Transit Gateway on that path. So what that means is we still have the attachment charge, and again this is not being charged twice, it's the same attachments we talked about.
There is also a data processing charge for the traffic that is moving through that Transit Gateway. I mention both of these scenarios because you could have a scenario where it's useful to use the Virtual Private Gateway architecture for certain workloads and perhaps a Transit Gateway or Cloud WAN based architecture for another.
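Putting the Transit Gateway scenario together, the monthly roll-up is port hours plus data transfer out plus attachment hours plus data processing, with inbound transfer free. The rates below are made-up placeholders, not real AWS prices; substitute the current figures from the Direct Connect and Transit Gateway pricing pages.

```python
# Illustrative monthly cost roll-up for the Transit Gateway architecture.
# ALL rates are placeholders, NOT actual AWS pricing.

RATES = {
    "port_per_hour": 0.30,            # placeholder DX port-hour rate
    "dto_per_gb": 0.03,               # placeholder data transfer out, per GB
    "tgw_attachment_per_hour": 0.05,  # placeholder, per attachment
    "tgw_processing_per_gb": 0.02,    # placeholder
}

def monthly_cost(gb_out, gb_in, hours=730, attachments=2, rates=RATES):
    port = rates["port_per_hour"] * hours
    dto = rates["dto_per_gb"] * gb_out  # inbound data transfer is not charged
    tgw_attach = rates["tgw_attachment_per_hour"] * hours * attachments
    tgw_proc = rates["tgw_processing_per_gb"] * (gb_out + gb_in)
    return port + dto + tgw_attach + tgw_proc

# 1 TB out, 500 GB in, one DX attachment plus one VPC attachment:
print(f"${monthly_cost(gb_out=1000, gb_in=500):.2f}")  # $352.00 with these rates
```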
Josh mentioned SiteLink, so I wanted to just cover off how we bill for that as well. In this situation, I've added another Direct Connect location in Las Vegas, and therefore I now have expanded to have a US office for this organization. The EC2 infrastructure is still there. I've grayed it out because the billing for that is exactly what I've been talking about so far. But for SiteLink, there are a couple of other things to consider.
When you create the virtual interfaces that you're going to use for SiteLink, you're going to attach those to a Direct Connect gateway. We've still got the port hour charges. They're still there. We create the virtual interfaces and we check that box that Josh showed you. At that point, we've now enabled SiteLink, and there is a charge for enabling that capability on each of those virtual interfaces. Once you've done that, the traffic is now able to flow between my UK office location and my US office location, going through the Direct Connect infrastructure.
To work out how much that costs, you need to go and look up a table that we have on the Direct Connect website. For traffic flowing from the UK to the US, you would go and look up in the table and identify this traffic is coming from Europe to the US. And then vice versa, traffic the other way, you'd go and look up the table in the other direction. As you can see in this example, it's actually the same charge here, but that's how you calculate the pricing involved for SiteLink.
Now, there are a few components involved in there, and I wanted to share something that we published a few months back, which was the open source networking focused calculator. We have the AWS pricing calculator. I'm sure you've all seen that. But we actually have this tool that we put out there for you to be able to deploy and use yourselves to help calculate anything to do with networking on AWS. It pulls from our pricing APIs, and I just wanted to mention it. There's a blog post all about it. So if you're trying to calculate how much these architectures cost in advance, you may want to deploy the pricing calculator tool that we've published there.
Database Interconnect Multi-Cloud and Key Takeaways
You may have seen over the last few days that we announced this new capability called Database Interconnect Multi-Cloud in partnership with Google, in preview. I just wanted to spend a quick second on it because I think it's super cool personally. It allows you to create capacity to another cloud, GCP in this case. We're building it on our maximum resiliency model, the four-wide connectivity I was talking about before. It's all pre-cabled and pre-built; all you have to do is say you want 1 gig, or 50 megs, whatever you want, and you can get that from us through this new offering.
GCP is available now. I think Rob Kennedy said in his innovation talk today that Azure will be added in the first half of 2026 as well. There was a deep dive on this earlier today, I think it was NET205, so unfortunately you missed it, but it should pop up on YouTube in the next day or so if you want more information.
All right, and just to wrap up some things that we want you to take away and remember. First of all, just go all the way through the layers as you configure and troubleshoot. Start at layer one and work your way up. Use the right VIF for the right job. A lot of times you can't use the wrong VIF for the wrong job, but it is good to consider that. Get a good idea of what you're going to do with BGP. Are you going to need to do some summarization? What do you need to do with the communities? Do you need to configure active-active or active-passive? What are you thinking there?
You want to design for resilience, right? As you're building, are you trying to meet a particular SLA? Is this a high availability workload or is it just testing and you don't care? And then obviously test and test and test and make sure everything fails over as it should. Understand how to calculate your costs. I think that open source solution is a really great way to do that. Or you can go watch this on YouTube and rewind and watch the last five minutes again.
And then just embrace your capabilities. If you're using Direct Connect directly to VGWs, try out Direct Connect gateway. It's super easy, doesn't add any cost, doesn't add any latency, and really just expands your portfolio. Try out transit gateways, try out SiteLink, try out Cloud WAN, try out interconnect. Yeah, thank you very much.
; This article is entirely auto-generated using Amazon Bedrock.