Jonathan Rau

Posted on Jul 3, 2023

AWS Security Beginner Series #1: Global Infrastructure

Foreword

(This foreword will always appear.) To get back into writing broadly helpful blogs, I want to resurrect a series I wrote on a personal blog a long time ago, "AWS Security for Beginners", which repackages fundamental building blocks and will hopefully help newcomers and lateral movers to the cloud security industry understand core competencies.

I have done everything from being a Project Manager to Product Manager to Engineer to Architect to CISO and roles in between. I have been fortunate to stay very hands-on and involved with building and architecting on AWS (and other clouds & platforms) and want to put that body of knowledge to good use on your behalf.

The series will be loosely chronological: thematically, I will go from simple and broad concepts to more specialized ones as the series progresses, and I will try not to create dependencies that force you to go back through X number of blogs to learn something.

Hope you learn something, but if not, that you at least Stay Dangerous.

So, what is AWS again?

The most formative question when you begin: what exactly is AWS? Amazon Web Services (abbreviated as AWS or AWS Cloud) is Amazon's public cloud business, which has been around for nearly two decades. AWS lets you access services such as storage, computing power, databases, and other specialized, use-case-specific tools over the internet (hence public cloud) using a console (or other interfaces, we'll get to that later in the series). The model is pay as you go, meaning you get a monthly bill that depends on your usage of each service. This can be how many gigabytes of data you store, how much memory is in a compute instance, how many times you run or invoke something, and so on.
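To make the pay-as-you-go idea concrete, here is a minimal sketch of how a usage-based monthly bill adds up. The three unit prices and the `monthly_bill` function are made up for illustration; they are not real AWS prices, and real billing has many more dimensions.

```python
# Hypothetical unit prices, for illustration only (NOT real AWS pricing).
PRICE_PER_GB_STORED = 0.02            # dollars per gigabyte stored per month
PRICE_PER_INSTANCE_HOUR = 0.05        # dollars per hour a compute instance runs
PRICE_PER_MILLION_INVOCATIONS = 0.20  # dollars per million invocations


def monthly_bill(gb_stored: float, instance_hours: float, invocations: int) -> float:
    """Add up the usage-based charges for one month, rounded to cents."""
    storage = gb_stored * PRICE_PER_GB_STORED
    compute = instance_hours * PRICE_PER_INSTANCE_HOUR
    requests = invocations / 1_000_000 * PRICE_PER_MILLION_INVOCATIONS
    return round(storage + compute + requests, 2)


# 500 GB stored, one instance running all month (720 hours), 3 million invocations.
print(monthly_bill(500, 720, 3_000_000))
# No usage means no bill.
print(monthly_bill(0, 0, 0))
```

The point is only the shape of the calculation: every line item is usage multiplied by a unit price, so if you use nothing you pay nothing.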

The industry has largely moved away from distinguishing public cloud from private cloud, but the distinction still matters from a security perspective. When I was first learning about the cloud, I was always confused about the "public" piece. Again, it denotes that you can access and consume services from anywhere on the internet. While the services you build using cloud resources can be public (e.g., they have an internet-facing IP address and/or DNS name), that is not what public cloud means.

The next question you may have (I had it) is: how the hell is this even possible? The easiest explanation, without getting too much into the inner workings and macroeconomics of virtualization, data centers, and hardware financial lifecycles, is that Amazon uses its money to purchase an eye-watering amount of traditional hardware and essentially leases parts of it, on demand, to you.

Traditionally, a company would need to purchase several servers (with their own motherboards, cooling, power supply, CPUs, RAM, GPUs, etc.), networking equipment (routers, switches), and storage, and then license virtualization software, operating systems, database engines, security tools, and more. All this hardware would be "racked and stacked", meaning assembled into its final configuration, "hooked up", and loaded with the OS and applications. The company would use virtualization to essentially get "the most bang for their buck" by creating virtual servers on the physical servers.

So if your server has 64GB of RAM (memory) available, and the application you are running only requires 8-12GB, you can slice and dice that 64GB into virtualized servers, each of which has its own dedicated CPU, memory, storage, and networking carved out from the physical gear on the server. This could be done to serve more customers with their own copies of an application (sometimes called a tenant or instance), to stretch budgetary limits, or even to segregate workloads from each other.
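The slicing arithmetic can be sketched in a few lines. This assumes equally sized virtual servers and ignores the memory the hypervisor itself needs, which is a simplification; `max_virtual_servers` is just an illustrative helper name.

```python
def max_virtual_servers(host_ram_gb: int, ram_per_vm_gb: int) -> int:
    """Return how many equally sized virtual servers fit in the host's RAM."""
    if ram_per_vm_gb <= 0:
        raise ValueError("ram_per_vm_gb must be positive")
    return host_ram_gb // ram_per_vm_gb


HOST_RAM_GB = 64
for vm_ram_gb in (8, 12):
    count = max_virtual_servers(HOST_RAM_GB, vm_ram_gb)
    leftover = HOST_RAM_GB - count * vm_ram_gb
    print(f"{vm_ram_gb} GB per VM: {count} VMs, {leftover} GB left over")
```

At 8GB each the host holds eight virtual servers; at 12GB each it holds five, with 4GB left unused.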

This is, of course, expensive, as you bear an upfront cost burden (though there are other leasing and rent-to-own schemes) plus the human capital expense of having systems administrators, network engineers, and more with expertise in the entire lifecycle, from the day the server is unboxed to the day you send/sell it back.

This has not changed.

The value proposition of using the cloud is that the giant multi-national, multi-billion dollar company does all of that on your behalf. It uses the same virtualization technologies plus economies of scale (they have greater buying power to command greater discounts to purchase more stuff) to let you consume the services and pay only for exactly what you consume, as noted earlier.

This greatly lowered the barrier to entry for startups and for established companies across all industries that were looking to modernize. Now, nearly everything you consume, from on-demand videos to video games to food ordering to banking to taxes to taxis and more, is hosted on the cloud. As the usage of cloud services evolved, even the most cynical and secretive organizations in the world have adopted the cloud for everything from raw storage to advanced targeting using machine learning algorithms and big data analysis.

Now how does AWS organize what amounts to tens of thousands of servers, network gear, and storage drives?

AWS Global Infrastructure Basics

"Global Infrastructure" is AWS' own terminology which broadly refers to the entire geographic distribution and hierarchy of how you consume and interact with the AWS Cloud. The "internet" is a huge place, and for the best performance, you should consume these services as geographically close to you as you can. It takes energy (and thus costs money) to move even the smallest byte across the world. The closer you can be to that point of origin, the better performance and cost you get, for the most part.

To ensure that AWS services can be consumed by customers across the world and in all industries, AWS first picks a Region, which they define as "...a physical location around the world where we cluster data centers." Remember, a data center is just a location that contains all of the required hardware as well as support staff and security. Each Region is made up of a cluster of more than one Availability Zone (AZ), which AWS denotes as a "...group of logical data centers..." and which consists of at least one data center.

The number of data centers (DCs) that comprise an AZ largely depends on the newness of the Region and the actual geographic area. There is only so much space, and only so much access to the required power and labor force, in any given area of the globe, and that largely dictates where AWS can build a Region as well as how many AZs it can hold. As demand, geopolitics, and economic environments change, AWS may choose to expand a Region or build new ones. Another constraint on creating more AZs is distance: AWS states that AZs are separated from one another by a meaningful distance (many kilometers) while all staying within 100 km (60 miles) of each other, but it does not disclose how far the specific DCs are from one another.

Each Region is made up of at least three AZs and each AZ "...has independent power, cooling, and physical security and is connected via redundant, ultra-low-latency networks. AWS customers focused on high availability can design their applications to run in multiple AZs to achieve even greater fault-tolerance." When you select a Region to use an AWS service in, you're using at least one AZ. Sometimes this AZ usage is more pronounced, which we will get to, but all you need to know is that an AZ denotes one or more data centers somewhere within the area of the Region.
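As a rough illustration of why running in multiple AZs improves fault tolerance, the sketch below spreads a handful of servers across three AZs in round-robin order and then checks what is left if one AZ goes down. The server names and the `spread_across_zones` helper are hypothetical; nothing here calls AWS.

```python
from itertools import cycle


def spread_across_zones(instances: list, zones: list) -> dict:
    """Assign instances to zones in round-robin order."""
    placement = {zone: [] for zone in zones}
    for instance, zone in zip(instances, cycle(zones)):
        placement[zone].append(instance)
    return placement


def survivors(placement: dict, failed_zone: str) -> list:
    """Return the instances still running if one whole zone is lost."""
    return [
        instance
        for zone, members in placement.items()
        if zone != failed_zone
        for instance in members
    ]


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
servers = [f"web-{n}" for n in range(1, 7)]  # hypothetical server names

placement = spread_across_zones(servers, zones)
print(placement)
print(survivors(placement, "us-east-1a"))
```

With six servers over three AZs, losing any single AZ still leaves four servers running, whereas putting all six in one AZ would leave none.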

Regions are usually centered in a metropolitan area but may not be referenced as such. For example, AWS' Northern Virginia Region, known as us-east-1, covers cities as far north as Ashburn and Sterling, as far south as Manassas, and as far east and west as Vienna and Haymarket, respectively. Other Regions may not be as large: the Ohio Region (us-east-2) is within the Greater Columbus, Ohio area, which spans Hilliard, Dublin, Columbus proper, and Worthington.

Another quirk of AWS AZs is that they can be different for everyone. When you choose "Availability Zone 'A'" in us-east-1 (known as us-east-1a) to build or deploy an AWS service, the physical location behind it differs from customer to customer: that AZ "A" can be in a specific building in Herndon or Vienna or Manassas. There is a way to tell, but the mapping is largely out of your control and also largely does not matter. More information on this quirk (and more facts about the Global Infrastructure) can be found here.
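One way to tell, as far as I know, is the AZ ID: the EC2 DescribeAvailabilityZones API returns a `ZoneId` (such as use1-az4) next to each `ZoneName`, and the ID refers to the same physical AZ in every account. The sketch below works on hand-written sample responses for two hypothetical accounts, so the specific name-to-ID pairings are invented for illustration.

```python
def zone_ids_by_name(response: dict) -> dict:
    """Map each AZ name (e.g. us-east-1a) to its AZ ID (e.g. use1-az4)."""
    return {
        zone["ZoneName"]: zone["ZoneId"]
        for zone in response["AvailabilityZones"]
    }


def names_for_zone_id(zone_id: str, first: dict, second: dict) -> tuple:
    """Return the AZ name each account uses for the same physical AZ."""
    name_in_first = next(name for name, zid in first.items() if zid == zone_id)
    name_in_second = next(name for name, zid in second.items() if zid == zone_id)
    return name_in_first, name_in_second


# Hand-written sample data (an assumption, not real account output). With
# credentials, a response of this shape would come from
# boto3.client("ec2", region_name="us-east-1").describe_availability_zones().
account_one = {"AvailabilityZones": [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az4"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az6"},
]}
account_two = {"AvailabilityZones": [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az6"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az4"},
]}

first = zone_ids_by_name(account_one)
second = zone_ids_by_name(account_two)
print(names_for_zone_id("use1-az4", first, second))
```

In this made-up example, the physical AZ use1-az4 is "us-east-1a" for the first account and "us-east-1b" for the second, which is exactly the quirk described above.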

When you select a Region, you should strike a balance between how close it is to your customers (if you are building a service that is meant to be consumed) and how close it is to your team. This latency may not always matter, but a difference of a few milliseconds (or a few seconds) can decide whether a consumer chooses you or your competition. It can also influence timeouts for your services and loosely has some impact on security. We'll discuss this much later, but there are regulatory and industry compliance requirements, as well as laws, that may dictate that certain (or all) types of data for a certain market or demographic must be contained in one area. A common callout here is the European Union's General Data Protection Regulation (GDPR), which codifies legal and technical requirements for how data about individuals in the EU can be accessed, transferred, and stored.
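That trade-off can be sketched as "lowest latency among the Regions you are allowed to use". The latency numbers and the EU-only allow list below are invented for illustration, and `pick_region` is a hypothetical helper, not an AWS feature.

```python
def pick_region(latency_ms: dict, allowed: set) -> str:
    """Return the allowed Region with the lowest measured latency."""
    candidates = {
        region: ms for region, ms in latency_ms.items() if region in allowed
    }
    if not candidates:
        raise ValueError("no measured Region satisfies the residency constraint")
    return min(candidates, key=candidates.get)


# Made-up round-trip latencies (milliseconds) as seen from a team in Ohio.
measured = {
    "us-east-1": 24.0,
    "us-east-2": 12.0,
    "eu-west-1": 85.0,
    "eu-central-1": 98.0,
}

# With no residency constraint, the closest Region wins.
print(pick_region(measured, set(measured)))
# If the data must stay in the EU, the choice is limited to EU Regions.
print(pick_region(measured, {"eu-west-1", "eu-central-1"}))
```

The design choice is to filter first and optimize second: compliance constraints remove Regions from consideration entirely, and latency only breaks ties among what remains.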

There are also cost and availability considerations, which are more "advanced". Some Regions are more expensive than others for specific services. Additionally, Regions may have constrained availability for certain services. This can happen for very old and very new AWS services, as well as for AWS services and components that are used a lot. This will be explored much later so as not to belabor the point. If you're just learning, pick the Region closest to you and do not worry about it much.

Ultimately, as touched on, your choice may be dictated to you by AWS in the form of service availability. AWS has nearly 400 distinct services, ranging from object storage to very specialized security services. Because AWS builds services on top of other AWS services, there are only so many areas that can be covered as new services are created. Always refer to the information page of a service you wish to consume to be sure.

AWS Global Infrastructure Extended

While Regions and their AZs are considered the formative baseline of the AWS Global Infrastructure and are often very muted depending on the service(s) you use, there are other Global Infrastructure concepts that may be beneficial to keep in the back of your mind.

The first is the Local Zone, which you can think of as a mini-AZ that serves an even more specific geographic area. The drawbacks are that Local Zones are not numerous, cannot host or deploy every AWS service, and can cost several times more than a regular AZ. This is because the economies of scale are not as great (given the smaller location) and because other redundancies, such as dedicated networking, are built into the area. You might use one if you had an incredibly specific set of customers in addition to ultra-low latency requirements. As of the time of this writing, AWS boasts 34 Local Zones, from Newark, New Jersey, USA to Auckland, New Zealand to Lagos, Nigeria, with dozens more planned.

If a Local Zone is a mini-AZ, then a Wavelength Zone (WLZ) is a mini-Local Zone (or a micro-AZ, if you'd like). WLZs follow the same principles of hyper-localization and ultra-low latencies, except that instead of a metropolitan-area edge location for the internet, the WLZ edge is a telecommunications provider's network coverage area, focused around 5G service. This means that latency is potentially even lower than in a Local Zone, but a WLZ has even fewer services available to consume and greater costs. However, if your customers consume your services via a mobile application on a specific carrier network, it can be worth looking into. In the United States, all the WLZs at the time of this writing are in specific Verizon 5G locations in specific cities, with other carriers accounted for outside of the United States, such as Vodafone in Berlin and Munich or Bell within Toronto.

Taking a big step back from geographically specific concepts such as Local Zones and Wavelength Zones, there is a global network within the AWS Global Infrastructure known as the Global Edge Network. The Global Edge Network is what underpins Amazon's Content Delivery Network (CDN) service, known as Amazon CloudFront. We will examine this service in greater depth in the future. The important thing to know about CloudFront and CDNs in general is that they are useful for supplying content close to your consumers, much as a Local Zone or WLZ can be. Instead of building an entire offering, you use a CDN to supply resilience, performance, and lower latency by "caching" content such as videos or pictures in small "Edge locations", which reduces strain on your backend and supplies a better consumer experience.
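The caching idea can be shown with a toy model. This is not how CloudFront is implemented; it is a minimal sketch of one edge location that keeps a copy of anything it has already fetched, with the `EdgeCache` class and the file path invented for illustration.

```python
class EdgeCache:
    """Toy edge location: serve from the local copy, else fetch from the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        if path in self._store:
            self.hits += 1  # served locally, the backend is not touched
            return self._store[path]
        self.misses += 1    # first request has to travel to the backend
        content = self._fetch_from_origin(path)
        self._store[path] = content
        return content


origin_requests = []


def fetch_from_origin(path: str) -> str:
    """Stand-in for your backend; records every request that reaches it."""
    origin_requests.append(path)
    return f"<content of {path}>"


edge = EdgeCache(fetch_from_origin)
for _ in range(3):
    edge.get("/videos/intro.mp4")

print(f"hits={edge.hits} misses={edge.misses} origin_requests={len(origin_requests)}")
```

Three viewers request the same video, but only the first request reaches the backend. A real CDN also expires and refreshes cached copies, which this sketch leaves out.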

Using CloudFront, AWS handles this distribution for you and boasts over 400 "Points of Presence", which are mini data centers located all over the world, from South Africa to Japan to New York to Oslo, and which are controlled by 13 Regional Edge Caches. You do not need to do anything to utilize this beyond configuring the CloudFront service. However, you can apply geographic restrictions and tune your caching and security performance more specifically, which we will explore later.

To round out the "extended" topics, the last concept to be familiar with is AWS Outposts, which are "...fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. Outposts solutions allow you to extend and run native AWS services on premises, and is available in a variety of form factors, from 1U and 2U Outposts servers to 42U Outposts racks, and multiple rack deployments." What does this mean in practice? Imagine a portable, self-contained, mini data center that specifically mirrors AWS and that you can bring with you (provided you can power the thing).

AWS Outposts range from the "1U" size, which occupies a single unit of a server rack, up to the "42U" behemoths. They can be utilized by companies that want to directly support a "hybrid" use case, are beginning a migration or exploratory analysis of AWS, have even lower or more specialized latency requirements, or are part of a government, military, law enforcement, or disaster response organization that requires usage close to them in potentially austere, restricted, denied, or remote environments. Prices vary greatly, ranging from $170,000 USD up to $775,000 USD depending on the configuration, and can be paid monthly or up front for 1- or 3-year terms.

Outside of the lab, you may encounter one or more of these concepts. On your own you are unlikely to interact with most of them, but they are still important to understand. While the overt security requirements may differ, the key to being a good cloud security practitioner, no matter your specialization, is familiarity.

Next Steps

Take time to explore the hyperlinks and set up an AWS Account if you have not already. Again, a lot of these concepts are abstracted away from you depending on how you use AWS, but they are important formative topics to learn.

In the next entry we will cover the AWS Shared Responsibility model which is often talked about but not well understood in certain cases.

Until then...

Stay Dangerous.
