Expectations vs Reality
Picture the scene: you're a budding Cloud Architect. You've completed a few AWS certifications and recently passed your Advanced Networking Specialty. At this point, you know AWS Cloud Networking and the best practices that come with it. You're then assigned to assist a customer with their AWS environment. In the excitement leading up to starting work, you envision large-scale Transit Gateway networks, funky BGP routing, multiple Direct Connect connections with site-to-site VPNs as a backup.
The reality, once you're onboard, couldn't be further from that vision. In fact, it is often a "WTF" moment. This is not uncommon, and a customer's AWS Cloud Networking design and deployment will more often than not be dictated by how far along they are in their AWS journey. It's sometimes easy to forget that Cloud deployments are not like traditional on-prem deployments. There likely wasn't a huge program of work to design and deploy an enterprise-scale environment in the Cloud. There are exceptions, of course, but it might surprise you that a lot of companies' foray into AWS was setting up a single account and using a company credit card to pay the bill.
In this blog, I want to explore how a customer's hybrid networking environment may have evolved over time as they became more mature in their deployment in AWS. This is by no means what always happens but how things could evolve, at least in my head.
Yay, let's use AWS...
So let's take the first step and look at what an environment might look like when we are starting out.
Above is about as simple as things get in terms of hosting an app in AWS: a single account, a single VPC with an Internet Gateway, some subnets to host instances, and a VPN Gateway to give connectivity back to on-prem to allow the app to talk back to other systems. As it stands, to host a single app, this architecture and hybrid connectivity would suffice despite what some might try to tell you.
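To make the starter layout concrete, here is a minimal sketch of how you might carve that single VPC into subnets using only the Python standard library. The CIDR range and availability zone names are illustrative assumptions, not values from the diagram:

```python
import ipaddress

VPC_CIDR = "10.0.0.0/16"             # assumed VPC range
AZS = ["eu-west-1a", "eu-west-1b"]   # assumed availability zones

def plan_subnets(vpc_cidr: str, azs: list[str], prefix: int = 24) -> dict[str, list[str]]:
    """Carve the VPC CIDR into /24s: one public and one private subnet per AZ."""
    subnets = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=prefix)
    plan: dict[str, list[str]] = {"public": [], "private": []}
    for tier in ("public", "private"):   # public subnets route out via the IGW,
        for az in azs:                   # private subnets reach on-prem via the VPN Gateway
            plan[tier].append(f"{az}: {next(subnets)}")
    return plan

print(plan_subnets(VPC_CIDR, AZS))
```

The point is simply that at this stage the whole address plan fits in your head: four subnets, one VPC, one route back home.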
Okay, so this is our base. Now let's evolve the solution.
WooHoo, we love AWS, let's expand...
After successfully using AWS for a period, our fictitious organization has decided to expand its AWS environment. They want a few more Prod VPCs and have also decided to smooth the path for putting apps in the Cloud by adding Staging and Testing VPCs.
Now we have added the additional Prod VPCs and a Staging and Testing VPC. On the face of it, this is not too bad—still manageable but not the most elegant of solutions. It would still work, but as a former Network Architect, I would be raising alarms right about now about the future manageability of this environment.
Okay, so let's push the network admins to breaking point. These apps are now deemed critical and need resilient connectivity plus redundancy across regions...
Weerrrrpp Weeeerrrppp WARNING WARNING!!!
The AWS deployment has been deemed so successful that the apps that run in it are now critical to the organization. We need to evolve this to provide resilient hybrid connectivity and cross-region resiliency... God help us!
And here it is in all its glory... Honestly, it looks like Peter Parker designed this. I think we can all agree this is now unmanageable as a network service. So how do we solve this? Well, in this instance, we would now need to look at transit architectures. In all honesty, once AWS is hosting critical apps, it's time to move to Direct Connect as well. So let's morph this into a manageable network service again.
Ah... That's better
Wow, what a masterpiece! We have introduced Direct Connect to give us a reliable, stable connection with bandwidth based on requirements. This is linked into a Direct Connect Gateway, which makes the Direct Connect connections available across regions. We have then deployed a Network VPC with a Transit Gateway in each region, as Transit Gateways are regional resources. Finally, each VPC has a Transit Gateway Attachment to give it access to the Direct Connect connections and back to the on-prem systems (subject to the correct routing, of course).
This architecture would allow the organization to grow steadily with a low overhead to manage as it progresses.
However, there are some further additions and modifications that could be considered to further improve things. Let's look at some of those, as I am sure there are some veterans shouting at their screen that there are better options... lol
What else could be changed?
The first thing I would consider is centralizing Ingress and Egress Internet. While there is no charge for Internet Gateways (IGWs), NAT Gateways (NAT GWs) carry both a per-hour charge and a per-GB data-processing charge. If we take our setup with 10 VPCs and likely at least 2 NAT GWs per VPC, that is going to get expensive very quickly. Moving Ingress and Egress Internet to a central VPC will cut down on this cost and also give you more scope for security solutions to protect your environment, such as proxy servers, etc. Because let's be honest, if you have gotten to this point in your AWS journey, it is more than likely that someone from your security team has clocked your rapid success and now wants to slam on the anchors to make sure things are secure :-)
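Some back-of-the-envelope maths makes the case. The hourly rate below is an assumed example figure (NAT Gateway rates vary by region, and the per-GB data-processing charge isn't counted here at all):

```python
HOURLY_RATE_USD = 0.045   # assumed example per-NAT-GW hourly rate; check your region's pricing
HOURS_PER_MONTH = 730

def monthly_nat_cost(vpcs: int, nat_gws_per_vpc: int, rate: float = HOURLY_RATE_USD) -> float:
    """Hourly charges alone, per month, for all NAT Gateways across the estate."""
    return vpcs * nat_gws_per_vpc * HOURS_PER_MONTH * rate

print(f"${monthly_nat_cost(10, 2):,.2f}/month for 10 VPCs x 2 NAT GWs")
print(f"${monthly_nat_cost(1, 2):,.2f}/month with a single centralized egress VPC")
```

Even before counting data processing, collapsing 20 gateways into 2 in a central egress VPC takes roughly 90% off the standing hourly charge.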
At this point in the journey, it would probably be time to start having conversations about adopting and deploying a Multi-Account Landing Zone through the use of Control Tower, or maybe even a customized Enterprise Scale Landing Zone through a partner. This is where we would have multiple accounts set up, each with different responsibilities. For instance, in a simple standard Landing Zone, you would have a Management Account that pays the bills, acts as the root of your AWS Organization, and provides central IAM control. Then you would have a Log Archive account to centralize logs from all accounts, a Security and Audit account where we enable security tooling to protect your AWS environment and workloads, and finally your workload accounts. In our setup, we would look to migrate each VPC (and its DR equivalent) into its own account and then link it up with the relevant services. Unfortunately, this subject is too big to cover in a single paragraph, so if you are interested in Landing Zones, have a read-up separately. This setup brings much better governance to your environment and ensures a clear separation of responsibilities.
I hope this has been an enjoyable read, and I hope to do some further blogs on other AWS Networking subjects in the near future.