Ben Dalton

Posted on Jun 19 • Originally published at Medium

Optimal VPC Subnetting: High Availability, Security, & Seamless On-Prem Connectivity

#aws #security #vpc #subnet

Most enterprise cloud networks must accommodate a range of workload types, often with connectivity to on-premises data centres, while maintaining security boundaries and high availability. Efficient IP allocation and management is crucial to ensuring sufficient capacity for scale while minimising wastage, as the enterprise typically has only a limited number of on-premises-routable IP address ranges available (e.g. from IPAM). In this post, I share some considerations, advantages and disadvantages of VPC subnet segmentation and IP address allocation for building secure, robust, highly available and adaptable networks.

Example VPC Network

The following diagram illustrates a slice of a typical team’s VPC network architecture in an enterprise, which we will reference throughout this post.

Subnet Segmentation

In large enterprise organisations, you’ll undoubtedly encounter multiple diverse workloads that may or may not be related, yet some level of network separation is always desired. This can be achieved through various methods: utilising separate AWS accounts, establishing distinct VPCs, or segmenting at the subnet level. When considering subnet segmentation within a single VPC, it’s essential to account for several key factors to achieve an optimal design that balances security, scalability, and operational efficiency. Here are some critical considerations for VPC subnet segmentation:

Security Requirements & Workload Isolation: This is paramount. Define clear security zones (e.g., public, private, database) and isolate workloads based on their sensitivity and access needs. This directly impacts how you segment subnets and apply security controls.
High Availability & Multi-AZ Design: To ensure resilience, your subnet design must leverage multiple Availability Zones within your chosen region. Distribute critical components across subnets in different AZs to protect against AZ-wide failures.
IP Address Management (IPAM) & On-Premises Connectivity: Given the often-limited on-premises IP space, efficient IP allocation for both routable (hybrid) and non-routable (VPC-internal) subnets is crucial. In a large enterprise organisation, routable (IPAM) addresses will likely be managed centrally to prevent conflicts, so knowing how they are managed and allocated is key.
Scalability & Future Growth: Design subnets with enough capacity to accommodate future expansion of instances and services without requiring a network re-architecture. Balance this with minimising IP address wastage.
Operational Overhead & Management Complexity: While fine-grained control is valuable, avoid excessive segmentation that leads to overly complex routing tables, security group management, Network Access Control Lists (NACLs), and troubleshooting (e.g., with VPC Flow Logs). Aim for a balance between isolation and manageability.
AWS Service Specific Needs: Many AWS services (e.g., Amazon RDS, Amazon EKS, AWS PrivateLink endpoints) have specific subnet requirements. Factor these into your design early to avoid rework later.
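
The multi-AZ consideration above can be checked mechanically once a subnet plan is written down. The following is a minimal sketch, not a definitive tool: the tier names and the mapping of each CIDR to a particular London AZ are assumptions made for illustration, and the CIDRs are the example values used later in this post.

```python
# Hypothetical subnet plan: tier name -> {availability zone: subnet CIDR}.
# Tier names and the AZ-to-CIDR mapping are illustrative assumptions.
REQUIRED_AZS = {"eu-west-2a", "eu-west-2b", "eu-west-2c"}

subnet_plan = {
    "routable-app": {
        "eu-west-2a": "10.50.30.0/26",
        "eu-west-2b": "10.50.30.64/26",
        "eu-west-2c": "10.50.30.128/26",
    },
    "vpc-endpoints": {
        "eu-west-2a": "100.0.0.0/24",
        "eu-west-2b": "100.0.1.0/24",
        "eu-west-2c": "100.0.2.0/24",
    },
}


def missing_azs(plan, required_azs):
    """Return {tier: [AZs with no subnet]} for every tier that has a gap."""
    gaps = {}
    for tier, subnets_by_az in plan.items():
        missing = sorted(required_azs - set(subnets_by_az))
        if missing:
            gaps[tier] = missing
    return gaps


# An empty dict means every tier has a subnet in every required AZ.
print(missing_azs(subnet_plan, REQUIRED_AZS))
```

A check like this could run in a pipeline before any infrastructure change, so that a tier added with only one or two subnets is flagged early.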

Advantages of Subnet Segmentation

Subnet segmentation offers several compelling benefits:

Granular and Tighter Security Controls:

Splitting VPC subnets by function (such as routable to on-prem, VPC Endpoints, Lambdas, etc.) enables precise security boundaries through security groups and Network Access Control Lists (NACLs).

Resource Isolation: Separate subnets for different resource types create natural security boundaries.
More Robust Defense: Combining subnet-level NACLs with resource-level security groups creates multiple layers of defense.
Least Privilege Access: Traffic patterns can be restricted based on subnet function, limiting potential attack surfaces.
Simplified Auditing: Clearly defined subnet purposes make security audits and compliance more straightforward.

More Efficient Routable CIDR Usage:

Segmentation reduces the number of limited IPAM routable addresses used. For example, allocating non-routable addresses to VPC Endpoints, instead of routable ones, frees up valuable on-premises IP space that might be managed by a central IPAM system (e.g., for data centre or shared egress connectivity).

Primary and Secondary CIDR Ranges

Effective management of both primary and secondary CIDR blocks is key to an optimal VPC design.

Primary CIDR Usage

The primary CIDR block should be sufficiently large to accommodate all planned workloads while allowing for future expansion. Once the VPC has been created, the primary CIDR cannot be changed or resized without re-creating the VPC, which could mean downtime for your workloads/applications. In a large enterprise organisation, the primary (routable) CIDR is typically allocated from IPAM by a central team, so that it does not conflict with other teams’ ranges when external connectivity, such as egress or a link to an on-premises data centre, is configured.

Example Primary CIDR Allocation: As per the Example VPC Network diagram, the Primary VPC (IPAM) CIDR is 10.50.30.0/24 (256 IP addresses). This primary CIDR is likely the routable CIDR that has connectivity back to on-premise data centres. In our example, these are allocated to the applications running on EC2 instances:

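Subnets are split as follows: 10.50.30.0/26, 10.50.30.64/26, and 10.50.30.128/26, one per Availability Zone in London (eu-west-2). Each /26 contains 64 addresses, of which 59 are usable because AWS reserves five addresses in every subnet. The fourth /26, 10.50.30.192/26, remains unallocated.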
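
The split above can be reproduced with Python’s standard ipaddress module. This is a minimal sketch: the assignment of each block to a specific AZ is an assumption for illustration, as the post does not state which subnet sits in which zone.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5

primary = ipaddress.ip_network("10.50.30.0/24")

# A /24 divides into four /26 blocks; the example uses the first three.
all_blocks = list(primary.subnets(new_prefix=26))
allocated = all_blocks[:3]
spare = all_blocks[3:]

# The AZ order below is an illustrative assumption.
for az, subnet in zip(("eu-west-2a", "eu-west-2b", "eu-west-2c"), allocated):
    usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
    print(f"{az}: {subnet} total={subnet.num_addresses} usable={usable}")

print("unallocated:", [str(block) for block in spare])
```

Keeping one block spare leaves room for a fourth routable subnet later without requesting another range from IPAM.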

Secondary CIDR Usage

Secondary CIDR blocks provide additional IP address space without disrupting existing infrastructure.

Non-routable Workloads: Secondary CIDRs can be used for resources that don’t need external connectivity, such as internet egress or access to on-premises networks. They are ideal for workloads and resources that are not public-facing and sit behind other resources such as Transit Gateways, Load Balancers, or CloudFront Distributions.
Specialised Services: Services like VPC Endpoints can use a secondary CIDR to isolate AWS service endpoints and reduce the number of routable IPAM IP addresses that are allocated, as these are often limited.
IP Address Expansion: Useful when the primary CIDR is becoming constrained, secondary CIDRs allow expansion without a complete redesign.

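Example Secondary CIDR Allocation: In the Example VPC Network architecture diagram, the VPC Secondary (non-routable) CIDR is 100.0.0.0/22 (1024 IP addresses). These are allocated to VPC Endpoints to allow instances in the routable CIDR to communicate with other AWS Services within the AWS account. Since they do not need to be routable to on-prem, they are allocated secondary non-routable CIDRs. Note that 100.0.0.0/22 lies outside both the RFC 1918 private ranges and the 100.64.0.0/10 shared address space, so it overlaps publicly allocated addresses; a block from 100.64.0.0/10 avoids shadowing real internet destinations: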

Endpoint subnets are split across three AZs in London as follows: 100.0.0.0/24, 100.0.1.0/24, 100.0.2.0/24. Each /24 contains 256 addresses, of which 251 are usable after the five addresses AWS reserves per subnet.

From the above example, it is evident that there is sufficient scope in the secondary VPC CIDR for further subnet creation, allocation, and segmentation should the primary/routable CIDR become constrained. It is also worth noting that it is possible to associate more CIDRs with a VPC. At the time of writing, the default AWS quota is 5 IPv4 CIDR blocks per VPC, with the primary block counting towards that total, and an increase can be requested.
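
To make the remaining headroom in the secondary CIDR concrete, the sketch below subtracts the three endpoint subnets from the /22 and lists what is left. The helper name remaining_blocks is hypothetical; it relies only on the standard ipaddress module (Python 3.7 or later for subnet_of).

```python
import ipaddress

secondary = ipaddress.ip_network("100.0.0.0/22")
endpoint_subnets = [
    ipaddress.ip_network(cidr)
    for cidr in ("100.0.0.0/24", "100.0.1.0/24", "100.0.2.0/24")
]


def remaining_blocks(parent, allocated):
    """Return the CIDR blocks of `parent` not covered by `allocated`."""
    free = [parent]
    for used in allocated:
        next_free = []
        for block in free:
            if used.subnet_of(block):
                # Carve the used subnet out of this free block.
                next_free.extend(block.address_exclude(used))
            elif block.subnet_of(used):
                # The free block is entirely consumed by the used subnet.
                continue
            else:
                next_free.append(block)
        free = next_free
    return sorted(free)


free = remaining_blocks(secondary, endpoint_subnets)
print("free blocks:", [str(block) for block in free])
print("free addresses:", sum(block.num_addresses for block in free))
```

With the example values, one /24 of the secondary range is still free for a further subnet tier.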

Disadvantages and Challenges

VPC subnet segmentation does come with its challenges, especially as you may need to manage different types, such as routable and non-routable CIDRs, depending on the type of workloads being run in your account. Some of these are:

Requires Careful Calculation and Planning: Subnet CIDR blocks must be planned meticulously to avoid overlaps and ensure accurate sizing.
Difficult to Reconfigure: Once a CIDR block is assigned to a subnet, it cannot be edited in place. Changing it means deleting and recreating the subnet along with the resources in it, which can cause service downtime or require complex workarounds. This is especially true, as mentioned earlier, for the primary CIDR, which once set can only be changed by rebuilding the VPC.
Limitations with IP Address Space: Routable IP ranges, such as those allocated from IPAM, are limited and require careful allocation to avoid wastage or inefficient usage across an organisation.
Route Table Management Overhead: Multiple subnet types may require separate Route Tables, which could increase management complexity in order to ensure connectivity between resources.
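
Overlap checks, the first challenge listed above, are straightforward to automate. The following sketch compares every pair of planned CIDRs using the standard ipaddress module; the function name find_overlaps and the deliberately conflicting 10.50.30.32/27 entry are illustrative assumptions.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of CIDR strings in `cidrs` that overlap."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


planned = [
    "10.50.30.0/26",
    "10.50.30.64/26",
    "10.50.30.128/26",
    "100.0.0.0/24",
    "100.0.1.0/24",
    "100.0.2.0/24",
]

# The plan from this post has no overlaps, so this prints an empty list.
print(find_overlaps(planned))

# A hypothetical new subnet that collides with the first routable subnet.
print(find_overlaps(planned + ["10.50.30.32/27"]))
```

The pairwise comparison is quadratic in the number of subnets, which is fine for a single VPC; an organisation-wide check would more likely live in the IPAM tool itself.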

Mitigation Strategies

There are a few ways to help combat these challenges and ensure minimal overhead where possible:

Plan for Growth: Carefully plan and design your architectures and workloads to allow for sufficient allocation for your applications to scale as new business requirements need to be met. Tools such as MXToolBox Subnet Calculator can assist in planning and deciding how CIDRs are allocated.
Use an IP Address Management Solution: Tools/services such as AWS IPAM or Infoblox IPAM & DHCP can assist in efficiently and effectively managing and allocating IP addresses. As we’ve seen, this is especially important in large enterprises with multiple different teams, AWS accounts, and networks that may require connectivity back to an on-premise data centre, egress to the internet, or other AWS accounts and networks whilst avoiding IP overlap/conflict.
Document CIDR Allocation: Having detailed documentation of all CIDR allocations and purposes is particularly important in reducing complications when managing security through NACLs, security groups, firewalls, as well as routing between subnets and VPCs.
Evaluate Operational Overhead: Carefully evaluate the complexity introduced by various segmentation strategies. While fine-grained control is good, over-segmentation can lead to increased management overhead for routing, security group rules, network access control lists and network troubleshooting.
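
As a complement to a subnet calculator, the sizing step in "Plan for Growth" can be sketched in a few lines. The function name smallest_prefix and the growth_factor parameter are hypothetical; the five reserved addresses per subnet and the /16 to /28 subnet size limits are AWS behaviour.

```python
import math

AWS_RESERVED_PER_SUBNET = 5   # first four addresses and the last one
MIN_HOST_BITS = 4             # AWS subnets can be no smaller than /28
MAX_HOST_BITS = 16            # and no larger than /16


def smallest_prefix(hosts_needed, growth_factor=1.0):
    """Return the longest IPv4 prefix length whose subnet fits the hosts.

    `growth_factor` is an illustrative multiplier for planned headroom,
    e.g. 1.5 reserves room for 50% more hosts than are needed today.
    """
    required = math.ceil(hosts_needed * growth_factor) + AWS_RESERVED_PER_SUBNET
    # Smallest number of host bits such that 2**host_bits >= required.
    host_bits = max(MIN_HOST_BITS, (required - 1).bit_length())
    if host_bits > MAX_HOST_BITS:
        raise ValueError("requirement exceeds the largest AWS subnet (/16)")
    return 32 - host_bits


for hosts in (10, 59, 60, 200):
    print(hosts, "hosts ->", f"/{smallest_prefix(hosts)}")
```

Note how 59 hosts fit in a /26 but 60 do not: the reserved addresses push the requirement over the 64-address boundary, which is exactly the kind of edge that is easy to miss when sizing by hand.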

Conclusion

Clearly defining and separating subnets by function and purpose provides significant security and operational benefits, at the cost of some additional planning complexity. An architecture such as the example used here, with routable and non-routable subnets spread across multiple AZs, can deliver a more maintainable, highly available and scalable foundation. It also establishes clear boundaries between different types of resources and workloads, such as those that need routing to on-premises data centres and those that only need connectivity to other AWS Services.

DEV Community