Let's kick off with the first issue I remember that made me feel very dumb: configuring VPC networking for my cluster.
The assessment required that I set up "a NAT gateway to manage egress traffic from the cluster, VPC networking, subnets, and firewall rules for secure communication."
Like I said in my little introduction to this article series, I didn't have to think about or do any of those things for my final project at AltSchool. I let Azure Kubernetes Service (AKS) handle it by default. My knowledge of setting up address spaces was weak and I had no experience with NAT gateways, so I got to googling.
My Rookie Mistake
I came across this article on provisioning AKS and a NAT gateway with Terraform. Luckily, they also detailed their VNet and subnet configurations, so I tried to replicate it, which is a fancy way of saying I copied and pasted. But I am not that much of a degenerate; I tried to make it my own. Ironically, that's where the problems started. I must have thought, "why is bro not picking an address space for service_cidr from the VNet/Subnet he provisioned?" As you may have already guessed, I f***ed around and found out. I kept getting errors at terraform apply that I didn't have the patience to decipher.
Since I was basically vibe-coding with next-to-zero knowledge of VPC and cluster networking, it took me quite a while to climb out of this hole.
What a service_cidr Is
service_cidr isn't part of your physical network. It's a virtual IP space that Kubernetes manages on its own. It's meant for internal cluster service discovery, separate from your VNet and subnets.
The subnet (attached to your node pool) on the other hand is meant for the nodes (virtual machines) and pods (containers or groups of containers).
To help drive the point home, let's say you have a VNet with an address space of 10.240.0.0/16, a subnet carved out for your cluster node pool at 10.240.0.0/22 and a service_cidr of 172.16.0.0/16.
[VNet: 10.240.0.0/16]
└── [Cluster Subnet: 10.240.0.0/22]
    ├── Node 1 (10.240.1.10)
    │   └── Deployment (3 Replicas)
    │       ├── Pod (Replica 1) (10.240.1.11) ──┐
    │       ├── Pod (Replica 2) (10.240.1.12) ──┼── Kubernetes Service (svc) ── Other deployments/pods or ingress
    │       └── Pod (Replica 3) (10.240.1.13) ──┘
    └── Node 2 (10.240.1.20)
        └── Pod (10.240.1.21)   # Unrelated to the service/deployment above

[service_cidr: 172.16.0.0/16] (Virtual)
├── Service (svc) IP: 172.16.0.50  # Virtual IP for load balancing; distributes traffic to pods with selector: app=my-app
├── Load Balancer (kube-proxy)     # Traffic to the virtual IP is dynamically routed to one available pod endpoint based on kube-proxy rules
│                                  # (iptables/nftables/IPVS) and pod readiness (e.g., round-robin, random, or least connections)
└── Pod IPs: 10.240.1.11, 10.240.1.12, 10.240.1.13
Notes:
- Kubernetes Service (svc): A Kubernetes abstraction that allows pods to communicate with each other using a stable virtual IP address.
- service_cidr: A range of IP addresses used by Kubernetes to assign IPs to services.
- Kube-proxy: A component that manages network rules on nodes to route traffic to the correct pod IPs.
- Pod IP: The IP address assigned to a pod within the provided Cluster Subnet.
- Service (svc) IPs (e.g. 172.16.0.50) are virtual IPs that do not correspond to a physical device.
- The svc uses a selector (e.g., app=my-app) to dynamically target all pods matching the label.
- kube-proxy (running on each node) handles load balancing (e.g., round-robin, random) to distribute traffic to available pods.
- Pods can be on any node (Node 1 or Node 2) and are dynamically added/removed based on deployment scaling or failures.
When a pod calls a service, kube-proxy steps in, maps the service IP (say, 172.16.0.50) to a pod IP (like 10.240.1.13), and routes the traffic over the real network. The service_cidr is just an abstraction; it doesn't "live" on the Azure infrastructure.
If you're relatively new to Kubernetes, I know what you're thinking: why not connect directly to a pod's IP? Well, here are a few more things to note:
- Pod IPs are ephemeral (can change at any time); Services provide a stable VIP/DNS.
- Services load balance across pod replicas and only route to the ones that are available.
- The abstraction makes zero-downtime rollouts possible; upgrades and scaling can change backends without affecting access.
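To make the abstraction concrete, here's a minimal sketch of such a Service defined with the Terraform kubernetes provider. This provider isn't part of my original setup, and the my-app names are hypothetical; it just mirrors the selector from the diagram above:

```hcl
# Hypothetical Service for the app in the diagram; assumes the Terraform
# "kubernetes" provider is configured to talk to the AKS cluster.
resource "kubernetes_service" "my_app" {
  metadata {
    name = "my-app"
  }

  spec {
    type = "ClusterIP" # Gets a virtual IP from service_cidr (e.g., 172.16.0.50)

    # Dynamically targets every pod labeled app=my-app, wherever it's scheduled
    selector = {
      app = "my-app"
    }

    port {
      port        = 80   # Stable port on the service's virtual IP
      target_port = 8080 # Container port on the backing pods
    }
  }
}
```

Other pods then reach the app via the stable service IP (or its cluster DNS name, resolved by CoreDNS at dns_service_ip), no matter which pod IPs happen to be alive at the moment.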
The Code That Saved the Day
After almost pulling my hair out, I finally got it right. Hereβs the Terraform config that worked:
resource "azurerm_virtual_network" "time_api_vnet" {
name = "vnet-${azurerm_resource_group.time_api_rg.name}"
address_space = ["10.240.0.0/16"]
location = azurerm_resource_group.time_api_rg.location
resource_group_name = azurerm_resource_group.time_api_rg.name
}
resource "azurerm_subnet" "time_api_subnet" {
name = "subnet-${azurerm_resource_group.time_api_rg.name}"
resource_group_name = azurerm_resource_group.time_api_rg.name
virtual_network_name = azurerm_virtual_network.time_api_vnet.name
address_prefixes = ["10.240.0.0/22"]
}
# .....NSG rules and association configuration would go here; omitted for brevity.
resource "azurerm_nat_gateway" "time_api_nat_gateway" {
name = "natgw-${azurerm_resource_group.time_api_rg.name}"
location = azurerm_resource_group.time_api_rg.location
resource_group_name = azurerm_resource_group.time_api_rg.name
sku_name = "Standard"
idle_timeout_in_minutes = 10
tags = {
Environment = "test"
}
}
resource "azurerm_public_ip" "time_api_public_ip" {
name = "public-ip-${azurerm_resource_group.time_api_rg.name}"
location = azurerm_resource_group.time_api_rg.location
resource_group_name = azurerm_resource_group.time_api_rg.name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_subnet_nat_gateway_association" "time_api_natgw_subnet_association" {
nat_gateway_id = azurerm_nat_gateway.time_api_nat_gateway.id
subnet_id = azurerm_subnet.time_api_subnet.id
}
resource "azurerm_nat_gateway_public_ip_association" "time_api_natgw_public_ip_association" {
nat_gateway_id = azurerm_nat_gateway.time_api_nat_gateway.id
public_ip_address_id = azurerm_public_ip.time_api_public_ip.id
}
resource "azurerm_kubernetes_cluster" "time_api_cluster" {
name = "aks-${azurerm_resource_group.time_api_rg.name}-cluster"
resource_group_name = azurerm_resource_group.time_api_rg.name
location = azurerm_resource_group.time_api_rg.location
dns_prefix = "dns-${azurerm_resource_group.time_api_rg.name}-cluster"
kubernetes_version = data.azurerm_kubernetes_service_versions.current.default_version
node_resource_group = "nrg-aks-${azurerm_resource_group.time_api_rg.name}-cluster"
default_node_pool {
name = "default"
vm_size = "Standard_D2_v2"
auto_scaling_enabled = true
max_count = 2
min_count = 1
os_disk_size_gb = 30
type = "VirtualMachineScaleSets"
vnet_subnet_id = azurerm_subnet.time_api_subnet.id
# ...other node pool configurations
}
identity {
type = "SystemAssigned"
}
azure_active_directory_role_based_access_control {
azure_rbac_enabled = true
admin_group_object_ids = [azuread_group.time_api_admins.object_id]
}
network_profile {
network_plugin = "azure"
network_policy = "azure"
load_balancer_sku = "standard"
dns_service_ip = "172.16.0.10"
service_cidr = "172.16.0.0/16"
outbound_type = "userAssignedNATGateway"
nat_gateway_profile {
idle_timeout_in_minutes = 4
}
}
# ....rest of the AKS cluster configuration
}
See that service_cidr set to 172.16.0.0/16? It's safely outside the VNet's 10.240.0.0/16 range: no overlap, no drama.
Other Relevant VPC Networking Points
Alright, so I survived the service_cidr fiasco, but there were a few more networking nuggets that made me go, "Oh, that's how it works!" Here's the stuff I wish I knew before crashing and burning my way through AKS networking. These are the bits that tie the VNet, subnets, and that fancy NAT Gateway together, so you don't end up in a hole like mine.
Your Subnet Is the Real Deal for Nodes and Pods
The subnet you define (like 10.240.0.0/22 in my setup) is where the real network lives. It's carved out of your VNet (e.g. 10.240.0.0/16) and assigned to your AKS node pool via vnet_subnet_id in the Terraform config. Nodes (VMs) and pods (containers or groups of containers) grab IPs from this range, like 10.240.1.10 for a node or 10.240.1.11 for a pod.
Not-so-pro tip: make sure your subnet is big enough (e.g., a /22 gives ~1,000 IPs) to handle all your nodes and pods, so AKS doesn't start throwing "no IPs left" errors when scaling. With the Azure CNI plugin, every pod takes an IP straight from this subnet, so size it for your maximum node count times the max pods per node.
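If you'd rather not eyeball the subnet math, Terraform's built-in cidrsubnet() function can carve the node pool range out of the VNet for you. A sketch using the address space from my config (the local names are my own invention):

```hcl
locals {
  vnet_cidr = "10.240.0.0/16"

  # newbits = 22 - 16 = 6 extra prefix bits; netnum 0 picks the first /22.
  # cidrsubnet("10.240.0.0/16", 6, 0) evaluates to "10.240.0.0/22".
  aks_subnet_cidr = cidrsubnet(local.vnet_cidr, 6, 0)
}

resource "azurerm_subnet" "aks_subnet_example" {
  name                 = "subnet-example"
  resource_group_name  = azurerm_resource_group.time_api_rg.name
  virtual_network_name = azurerm_virtual_network.time_api_vnet.name
  address_prefixes     = [local.aks_subnet_cidr]
}
```

Bumping netnum to 1 would give you the next non-overlapping /22 (10.240.4.0/22) for, say, a second node pool.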
Also, Network Security Group (NSG) rules on that subnet help you control traffic at the cloud level. Isolation and least-privilege access is the name of the game.
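The NSG pieces I elided from the config above look roughly like this. This is a sketch, not my exact rules; the single inbound rule is just an example of least privilege, and your workload dictates what actually gets opened:

```hcl
# Illustrative NSG for the cluster subnet; reuses the resource group and
# subnet names from the config shown earlier in this post.
resource "azurerm_network_security_group" "time_api_nsg" {
  name                = "nsg-${azurerm_resource_group.time_api_rg.name}"
  location            = azurerm_resource_group.time_api_rg.location
  resource_group_name = azurerm_resource_group.time_api_rg.name

  security_rule {
    name                       = "allow-https-inbound"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "443"
    source_address_prefix      = "Internet"
    destination_address_prefix = "*"
  }
}

# Attach the NSG to the cluster subnet so the rules actually apply
resource "azurerm_subnet_network_security_group_association" "time_api_nsg_assoc" {
  subnet_id                 = azurerm_subnet.time_api_subnet.id
  network_security_group_id = azurerm_network_security_group.time_api_nsg.id
}
```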
service_cidr Is Just Kubernetes Playing Pretend
That 172.16.0.0/16 service_cidr? It's not touching your Azure network; it's like a figment of Kubernetes' imagination (you know, make-believe). It's where service IPs (like 172.16.0.50 in the diagram) and the DNS IP (172.16.0.10 for CoreDNS) live. When a pod or ingress pings a service IP, kube-proxy (the traffic cop) translates it to a real pod IP (like 10.240.1.11, 10.240.1.12, or 10.240.1.13) and sends it over the subnet. The diagram shows this load balancing in action: traffic hits the service IP, and kube-proxy picks a healthy pod to route to, using tricks like round-robin. No physical device has that 172.16.0.50 IP; it's all software magic.
Pick Your service_cidr Carefully (or Pay the Price)
You can let AKS pick a service_cidr (it defaults to 10.0.0.0/16), but if it overlaps with your VNet or peered networks, you're in for a bad time. Trust me, I got my fair share of Terraform errors before I learned this. Stick to RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16); public IP ranges? Forget it. And whatever you choose, make sure it doesn't clash with your VNet, peered networks, or anything else in your setup. Here's a quick cheat sheet I wish I'd had:
| Your VNet Range | Good `service_cidr` Options |
|-------------------|-------------------------------------|
| `10.x.x.x/16` | `172.16.0.0/16` or `192.168.0.0/16` |
| `172.16.x.x/12` | `10.96.0.0/12` or `192.168.0.0/16` |
| `192.168.x.x/16` | `10.96.0.0/12` or `172.16.0.0/16` |
Oh, and that dns_service_ip? Avoid the first IP in your range (like 172.16.0.1); Kubernetes reserves it for internals. As you must have noticed, I went with .10 myself.
Load Balancing Ties It All Together
Back to that diagram: the Kubernetes Service (172.16.0.50) is your VIP (Very Important IP) that load-balances across pods. Kube-proxy decides which pod gets the traffic based on readiness and rules (e.g., round-robin). This is why you don't hardcode pod IPs; they're temporary! The service_cidr and kube-proxy make sure your app stays reachable, even if pods move or die. It's like a traffic director who doesn't care which car (pod) gets you there, as long as you arrive.
NAT Gateway: The Unsung Hero for Outbound Traffic
Since I used outbound_type = "userAssignedNATGateway", my cluster's outbound traffic (like pods hitting external APIs) goes through the NAT Gateway. The idle_timeout_in_minutes = 4 in my AKS config (and 10 in the azurerm_nat_gateway resource; oops, my bad for the mismatch!) controls how long idle connections stay open. Four minutes is fine for most apps, but if you're running something long-lived like streaming, you might want to bump it up to avoid dropped connections. Also, ensure your NAT Gateway is tied to your subnet (via azurerm_subnet_nat_gateway_association) and has a public IP for internet access.
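One easy way to avoid the timeout mismatch I shipped is to drive both values from a single variable. A sketch (the variable name is made up; the resource names come from my config above):

```hcl
variable "nat_idle_timeout_minutes" {
  description = "Idle timeout shared by the NAT Gateway and the AKS nat_gateway_profile"
  type        = number
  default     = 10
}

# Then reference it in both places so they can never drift apart:
#
# In azurerm_nat_gateway.time_api_nat_gateway:
#   idle_timeout_in_minutes = var.nat_idle_timeout_minutes
#
# In the AKS network_profile's nat_gateway_profile block:
#   idle_timeout_in_minutes = var.nat_idle_timeout_minutes
```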
Key Takeaway
In a nutshell: plan your VNet and subnet sizes, keep your service_cidr separate, and double-check for overlaps. AKS can auto-pick some settings, but explicit configs (like in my Terraform) save you from surprises. Oh, and don't wing it like I did; read the relevant docs first, or at least use an AI tool of your choice to help you make sense of them!
Here is the link to the project repo again if you want to try it out for yourself.
See you on the next one.