Let's kick off with the first issue I remember that made me feel very dumb: configuring VPC networking for my cluster.
The assessment required that I set up "a NAT gateway to manage egress traffic from the cluster, VPC networking, subnets, and firewall rules for secure communication."
Like I said in my little introduction to this article series, I didn't have to think about or do any of those for my final project at AltSchool. I let Azure Kubernetes Service (AKS) handle it by default. My knowledge of setting up address spaces was weak and I had no experience with NAT gateways, so I got to googling.
My Rookie Mistake
I came across this article on provisioning AKS and a NAT gateway with Terraform. Luckily, they also detailed their VNet and subnet configurations, so I tried to replicate it - which is a fancy way of saying I copied and pasted. But I am not that much of a degenerate; I tried to make it my own. Ironically, that's where the problems started. I must have thought, "why is bro not picking an address space for `service_cidr` from the VNet/Subnet he provisioned?". As you may have already guessed, I f***ed around and found out. I kept getting errors at `terraform apply` that I didn't have the patience to decipher.
Since I was basically vibe-coding with knowledge of VPC and cluster networking that was next to zero, it took me quite a while to get out of this hole.
What a service_cidr Is
A `service_cidr` isn't part of your physical network. It's a virtual IP space that Kubernetes manages on its own. It's meant for internal cluster service discovery, separate from your VNet and subnets.
The subnet (attached to your node pool) on the other hand is meant for the nodes (virtual machines) and pods (containers or groups of containers).
To help drive the point home, let's say you have a VNet with an address space of `10.240.0.0/16`, a subnet carved out for your cluster node pool at `10.240.0.0/22`, and a `service_cidr` of `172.16.0.0/16`.
```
[VNet: 10.240.0.0/16]
 └── [Cluster Subnet: 10.240.0.0/22]
      ├── Node 1 (10.240.1.10)
      │    └── Deployment (3 Replicas) ──────────────────┐
      │         ├── Pod (Replica 1) (10.240.1.11) ←┐     │ Kubernetes Service (svc) ←─ Other deployments/pods or ingress
      │         ├── Pod (Replica 2) (10.240.1.12) ←┤←────┘
      │         └── Pod (Replica 3) (10.240.1.13) ←┘
      └── Node 2 (10.240.1.20)
           └── Pod (10.240.1.21)      # Unrelated to the service/deployment above

[service_cidr: 172.16.0.0/16] (Virtual)
 └── Service (svc) IP: 172.16.0.50    # Virtual IP for load balancing. Distributes traffic to pods with selector: app=my-app
      └── Load Balancer (kube-proxy)  # Traffic to the virtual IP is dynamically routed to one available pod endpoint based on kube-proxy rules
           │                          #   (iptables/nftables/IPVS) and pod readiness (e.g., round-robin, random, or least connections)
           └──→ Pod IPs: 10.240.1.11, 10.240.1.12, 10.240.1.13
```
Notes:
- Kubernetes Service (svc): A Kubernetes abstraction that allows pods to communicate with each other using a stable virtual IP address.
- service_cidr: A range of IP addresses used by Kubernetes to assign IPs to services.
- Kube-proxy: A component that manages network rules on nodes to route traffic to the correct pod IPs.
- Pod IP: The IP address assigned to a pod within the provided Cluster Subnet.
- Service (svc) IPs (e.g. 172.16.0.50) are virtual IPs that do not correspond to a physical device.
- The svc uses a selector (e.g., app=my-app) to dynamically target all pods matching the label.
- kube-proxy (running on each node) handles load balancing (e.g., round-robin, random) to distribute traffic to available pods.
- Pods can be on any node (Node 1 or Node 2) and are dynamically added/removed based on deployment scaling or failures.
When a pod calls a service, kube-proxy steps in, maps the service IP (say, `172.16.0.50`) to a pod IP (like `10.240.1.13`), and routes the traffic over the real network. The `service_cidr` is just an abstraction; it doesn't "live" on the Azure infrastructure.
If you're relatively new to Kubernetes, I know what you're thinking: why not connect directly to a pod's IP? Well, here are a few more things to note:
- Pod IPs are ephemeral (can change at any time); Services provide a stable VIP/DNS.
- Services load balance across pod replicas and only route to the ones that are available.
- Zero-downtime rollouts are made possible by the abstraction; upgrades and scaling can change backends without affecting access.
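To make the abstraction concrete, here's a tiny Python sketch of what the VIP-to-pod mapping does. This is an analogy, not real kube-proxy code; only the IPs are taken from the diagram above:

```python
import itertools

# Stand-in for kube-proxy's rules: one stable service VIP fronting
# whatever pod endpoints are currently ready.
SERVICE_VIP = "172.16.0.50"
ready_endpoints = ["10.240.1.11", "10.240.1.12", "10.240.1.13"]

# Round-robin over ready endpoints, one of kube-proxy's strategies.
_rr = itertools.cycle(ready_endpoints)

def route(dest_ip: str) -> str:
    """Map the service VIP to a real pod IP; pass real IPs through untouched."""
    if dest_ip == SERVICE_VIP:
        return next(_rr)
    return dest_ip

# Callers only ever see the stable VIP...
print([route(SERVICE_VIP) for _ in range(4)])
# → ['10.240.1.11', '10.240.1.12', '10.240.1.13', '10.240.1.11']
# ...and if a pod dies and is replaced, only the endpoint list changes;
# the VIP every caller uses stays the same.
```

That last point is the whole trick: consumers hold on to one address while the backends churn underneath.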
The Code That Saved the Day
After almost pulling my hair out, I finally got it right. Here’s the Terraform config that worked:
```hcl
resource "azurerm_virtual_network" "time_api_vnet" {
  name                = "vnet-${azurerm_resource_group.time_api_rg.name}"
  address_space       = ["10.240.0.0/16"]
  location            = azurerm_resource_group.time_api_rg.location
  resource_group_name = azurerm_resource_group.time_api_rg.name
}

resource "azurerm_subnet" "time_api_subnet" {
  name                 = "subnet-${azurerm_resource_group.time_api_rg.name}"
  resource_group_name  = azurerm_resource_group.time_api_rg.name
  virtual_network_name = azurerm_virtual_network.time_api_vnet.name
  address_prefixes     = ["10.240.0.0/22"]
}

# ...NSG rules and association configuration would go here, but are omitted for brevity.

resource "azurerm_nat_gateway" "time_api_nat_gateway" {
  name                    = "natgw-${azurerm_resource_group.time_api_rg.name}"
  location                = azurerm_resource_group.time_api_rg.location
  resource_group_name     = azurerm_resource_group.time_api_rg.name
  sku_name                = "Standard"
  idle_timeout_in_minutes = 10

  tags = {
    Environment = "test"
  }
}

resource "azurerm_public_ip" "time_api_public_ip" {
  name                = "public-ip-${azurerm_resource_group.time_api_rg.name}"
  location            = azurerm_resource_group.time_api_rg.location
  resource_group_name = azurerm_resource_group.time_api_rg.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_subnet_nat_gateway_association" "time_api_natgw_subnet_association" {
  nat_gateway_id = azurerm_nat_gateway.time_api_nat_gateway.id
  subnet_id      = azurerm_subnet.time_api_subnet.id
}

resource "azurerm_nat_gateway_public_ip_association" "time_api_natgw_public_ip_association" {
  nat_gateway_id       = azurerm_nat_gateway.time_api_nat_gateway.id
  public_ip_address_id = azurerm_public_ip.time_api_public_ip.id
}

resource "azurerm_kubernetes_cluster" "time_api_cluster" {
  name                = "aks-${azurerm_resource_group.time_api_rg.name}-cluster"
  resource_group_name = azurerm_resource_group.time_api_rg.name
  location            = azurerm_resource_group.time_api_rg.location
  dns_prefix          = "dns-${azurerm_resource_group.time_api_rg.name}-cluster"
  kubernetes_version  = data.azurerm_kubernetes_service_versions.current.default_version
  node_resource_group = "nrg-aks-${azurerm_resource_group.time_api_rg.name}-cluster"

  default_node_pool {
    name                 = "default"
    vm_size              = "Standard_D2_v2"
    auto_scaling_enabled = true
    max_count            = 2
    min_count            = 1
    os_disk_size_gb      = 30
    type                 = "VirtualMachineScaleSets"
    vnet_subnet_id       = azurerm_subnet.time_api_subnet.id
    # ...other node pool configurations
  }

  identity {
    type = "SystemAssigned"
  }

  azure_active_directory_role_based_access_control {
    azure_rbac_enabled     = true
    admin_group_object_ids = [azuread_group.time_api_admins.object_id]
  }

  network_profile {
    network_plugin    = "azure"
    network_policy    = "azure"
    load_balancer_sku = "standard"
    dns_service_ip    = "172.16.0.10"
    service_cidr      = "172.16.0.0/16"
    outbound_type     = "userAssignedNATGateway"

    nat_gateway_profile {
      idle_timeout_in_minutes = 4
    }
  }

  # ...rest of the AKS cluster configuration
}
```
See that `service_cidr` set to `172.16.0.0/16`? It's safely outside the VNet's `10.240.0.0/16` range: no overlap, no drama.
Other Relevant VPC Networking Points
Alright, so I survived the `service_cidr` fiasco, but there were a few more networking nuggets that made me go, "Oh, that's how it works!" Here's the stuff I wish I knew before crashing and burning my way through AKS networking. These are the bits that tie the VNet, subnets, and that fancy NAT Gateway together, so you don't end up in a hole like mine.
Your Subnet Is the Real Deal for Nodes and Pods
The subnet you define (like `10.240.0.0/22` in my setup) is where the real network lives. It's carved out of your VNet (e.g. `10.240.0.0/16`) and assigned to your AKS node pool via `vnet_subnet_id` in the Terraform config. Nodes (VMs) and pods (containers or groups of containers) grab IPs from this range, like `10.240.1.10` for a node or `10.240.1.11` for a pod.
Not-so-Pro Tip: make sure your subnet is big enough (e.g., a /22 gives ~1,000 usable IPs) to handle all your nodes and pods, so AKS doesn't start throwing "no IPs left" errors when scaling.
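If you want to sanity-check the sizing before `terraform apply`, a quick back-of-the-envelope with Python's standard `ipaddress` module goes like this (the node and per-node pod counts are just illustrative; Azure reserves 5 addresses in every subnet):

```python
import ipaddress

# How many IPs a /22 actually gives you.
subnet = ipaddress.ip_network("10.240.0.0/22")
usable = subnet.num_addresses - 5  # Azure reserves 5 addresses per subnet
print(subnet.num_addresses, usable)  # → 1024 1019

# With Azure CNI, every pod takes a subnet IP too, so budget per node:
nodes = 2                # max_count in my node pool
max_pods_per_node = 30   # illustrative; the AKS maxPods setting varies
needed = nodes * (1 + max_pods_per_node)  # one IP per node + one per pod
print(needed, needed <= usable)  # → 62 True
```

So my tiny two-node pool fits comfortably; the arithmetic only starts to bite at much larger node or `maxPods` counts.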
Also, Network Security Group (NSG) rules on that subnet can help you control traffic at the cloud level. Isolation and least-privilege access are the name of the game.
service_cidr Is Just Kubernetes Playing Pretend
That `172.16.0.0/16` `service_cidr`? It's not touching your Azure network; it's like a figment of Kubernetes' imagination (you know, make-believe). It's where service IPs (like `172.16.0.50` in the diagram) and the DNS IP (`172.16.0.10` for CoreDNS) live. When a pod or ingress pings a service IP, kube-proxy (the traffic cop) translates it to a real pod IP (like `10.240.1.11`, `10.240.1.12`, or `10.240.1.13`) and sends it over the subnet. The diagram shows this load balancing in action: traffic hits the service IP, and kube-proxy picks a healthy pod to route to, using tricks like round-robin. No physical device has that `172.16.0.50` IP; it's all software magic.
Pick Your service_cidr Carefully (or Pay the Price)
You can let AKS pick a `service_cidr` (it defaults to `10.0.0.0/16`), but if it overlaps with your VNet or peered networks, you're in for a bad time. Trust me, I got my fair share of Terraform errors before I learned this. Stick to RFC 1918 private ranges (`10.0.0.0/8`, `172.16.0.0/12`, or `192.168.0.0/16`). Public IP ranges? Forget it. And whatever you choose, make sure it doesn't clash with your VNet, peered networks, or anything else in your setup. Here's a quick cheat sheet I wish I'd had:
| Your VNet Range | Good `service_cidr` Options |
|-------------------|-------------------------------------|
| `10.x.x.x/16` | `172.16.0.0/16` or `192.168.0.0/16` |
| `172.16.x.x/12` | `10.96.0.0/12` or `192.168.0.0/16` |
| `192.168.x.x/16` | `10.96.0.0/12` or `172.16.0.0/16` |
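You can also check a pairing programmatically before Terraform ever runs. Here's a hypothetical helper using Python's standard `ipaddress` module:

```python
import ipaddress

def is_safe_pair(vnet_cidr: str, service_cidr: str) -> bool:
    """True if the service_cidr does not overlap the VNet's address space."""
    return not ipaddress.ip_network(vnet_cidr).overlaps(
        ipaddress.ip_network(service_cidr))

# The classic footgun: AKS's default service_cidr clashing with a common VNet default.
print(is_safe_pair("10.0.0.0/16", "10.0.0.0/16"))      # → False
# The pairing used in this article.
print(is_safe_pair("10.240.0.0/16", "172.16.0.0/16"))  # → True
```

In a real setup you'd run the check against every peered VNet's range too, not just your own.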
Oh, and that `dns_service_ip`? Avoid the first IP in your range (like `172.16.0.1`); Kubernetes reserves it for internals. As you must have noticed, I went with `.10` myself.
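A quick way to double-check your own pick, again with the standard `ipaddress` module (the values below are the ones from my config):

```python
import ipaddress

service_cidr = ipaddress.ip_network("172.16.0.0/16")
dns_ip = ipaddress.ip_address("172.16.0.10")

# hosts() skips the network address, so the first host is the reserved .1.
reserved_first = next(service_cidr.hosts())

print(dns_ip in service_cidr)    # → True: dns_service_ip must live inside service_cidr
print(dns_ip != reserved_first)  # → True: and steer clear of the reserved .1
```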
Load Balancing Ties It All Together
Back to that diagram: the Kubernetes Service (`172.16.0.50`) is your VIP (Very Important IP) that load-balances across pods. Kube-proxy decides which pod gets the traffic based on readiness and rules (e.g., round-robin). This is why you don't hardcode pod IPs: they're temporary! The `service_cidr` and kube-proxy make sure your app stays reachable, even if pods move or die. It's like a traffic director who doesn't care which car (pod) gets you there, as long as you arrive.
NAT Gateway: The Unsung Hero for Outbound Traffic
Since I used `outbound_type = "userAssignedNATGateway"`, my cluster's outbound traffic (like pods hitting external APIs) goes through the NAT Gateway. The `idle_timeout_in_minutes = 4` in my AKS config (and 10 in the `azurerm_nat_gateway` resource; oops, my bad for the mismatch!) controls how long idle connections stay open. Four minutes is fine for most apps, but if you're running something long-lived like streaming, you might want to bump it up to avoid dropped connections. Also, ensure your NAT Gateway is tied to your subnet (via `azurerm_subnet_nat_gateway_association`) and has a public IP for internet access.
Key Takeaway
In a nutshell, plan your VNet and subnet sizes, keep your `service_cidr` separate, and double-check for overlaps. AKS can auto-pick some settings, but explicit configs (like in my Terraform) save you from surprises. Oh, and don't wing it like I did: read, or at least use any appropriate AI tool of your choice to try to understand, the relevant docs first!
Here is the link to the project repo again if you want to try it out for yourself.
See you on the next one.