Mikael Krief

Azure Kubernetes Service (AKS) Network Policies: A Comprehensive Guide

Network security is a critical component of any Kubernetes deployment, especially in enterprise environments where regulatory compliance and data protection are paramount. Network policies in Kubernetes provide a way to control traffic flow between pods and other network endpoints, much as security groups and firewalls do for virtual machines.

Azure Kubernetes Service (AKS) supports multiple network policy implementations, each with its own strengths, features, and use cases. Understanding the differences between these options is crucial for designing secure, performant, and compliant Kubernetes workloads.

In this article, we'll explore the different network policy options available for AKS:

  • Azure Network Policy Manager (Azure NPM): Microsoft's native solution for basic network policies
  • Calico: A popular open-source network policy engine with advanced features
  • Cilium: A modern, eBPF-based networking and security solution

We'll compare these options across various dimensions including:

  • Feature sets: What capabilities each solution provides
  • Performance: How each solution impacts cluster performance
  • Ease of use: Installation, configuration, and management complexity
  • Integration: How well each solution integrates with Azure services
  • Use cases: When to choose each option

By the end of this article, you'll have a clear understanding of which network policy solution best fits your requirements and how to implement it in your AKS cluster.

Understanding Kubernetes Network Policies

Before diving into the specific implementations, it's important to understand what Kubernetes network policies are and how they work.

What are Network Policies?

Kubernetes Network Policies are specifications that define how groups of pods can communicate with each other and with other network endpoints. They operate at Layer 3 and Layer 4 of the OSI model, controlling traffic based on:

  • Pod selectors: Which pods the policy applies to
  • Ingress rules: What incoming traffic is allowed
  • Egress rules: What outgoing traffic is allowed
  • Namespaces: Scope of the policy application
  • IP blocks: CIDR ranges for external endpoints

Default Behavior

By default, Kubernetes allows all traffic between all pods in a cluster. This "allow-all" behavior is convenient for development but poses security risks in production environments. Network policies enable you to implement a "deny-all-by-default" approach, explicitly allowing only necessary traffic.

Basic Network Policy Example

Here's a simple example of a Kubernetes network policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: frontend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: backend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          role: backend
    ports:
    - protocol: TCP
      port: 3306

This policy applies to pods with the label role: frontend and:

  • Allows ingress traffic only from pods labeled role: backend on port 8080
  • Allows egress traffic only to pods labeled role: backend on port 3306
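
To try it out, apply the manifest and inspect how the API server recorded the rules (assuming it's saved as frontend-policy.yaml):

# Apply the policy and review its rules
kubectl apply -f frontend-policy.yaml
kubectl describe networkpolicy frontend-policy -n production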

AKS Network Policy Options

1. Azure Network Policy Manager (Azure NPM)

Azure Network Policy Manager is Microsoft's native implementation of Kubernetes network policies for AKS. It requires the Azure CNI network plugin and integrates tightly with Azure networking infrastructure.

Architecture

Azure NPM runs as a DaemonSet on every node and enforces policies inside the Linux kernel:

  • Watches the Kubernetes API server for NetworkPolicy, pod, and namespace changes
  • Translates policies into iptables rules and ipsets on each node
  • Enforces traffic at the node level; it does not create or modify Azure Network Security Group (NSG) rules
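
On a cluster where Azure NPM is enabled, you can see the enforcement agents directly (assuming the default AKS label k8s-app=azure-npm):

# One azure-npm pod should be running per node
kubectl get pods -n kube-system -l k8s-app=azure-npm -o wide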

Key Features

Advantages:

  • Native Azure integration: Seamless integration with Azure services and networking
  • No additional components: Built into AKS, no separate installation required
  • Azure support: Fully supported by Microsoft Azure support
  • Basic functionality: Covers standard Kubernetes network policy use cases
  • Simple setup: Easy to enable on new or existing clusters
  • Cost-effective: No additional licensing costs

Limitations:

  • Basic feature set: Limited to standard Kubernetes network policies
  • No advanced features: Lacks features like DNS-based policies, application-layer filtering
  • Performance considerations: May have overhead due to iptables implementation
  • Limited observability: Basic logging and monitoring capabilities
  • IPv4 only: Currently supports only IPv4 addresses

When to Use Azure NPM

Azure Network Policy Manager is ideal when:

  • You need basic network policy functionality
  • You want Microsoft support for the entire networking stack
  • You're building a new cluster and want to start simple
  • Your team is already familiar with Azure networking concepts
  • You don't require advanced features like DNS-based policies
  • You want to minimize third-party dependencies

Enabling Azure NPM

To create an AKS cluster with Azure Network Policy Manager:

# Create a new AKS cluster with Azure NPM
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --network-policy azure \
  --node-count 3

Explanation: Creates an AKS cluster with:

  • --network-plugin azure: Uses Azure CNI networking (required for Azure NPM)
  • --network-policy azure: Enables Azure Network Policy Manager
  • --node-count 3: Creates a 3-node cluster for high availability

Terraform equivalent:

resource "azurerm_kubernetes_cluster" "main" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "myakscluster"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D2_v2"
  }

  network_profile {
    network_plugin = "azure"
    network_policy = "azure"
  }

  identity {
    type = "SystemAssigned"
  }
}

For existing clusters, you cannot change the network policy after creation. You would need to create a new cluster with the desired network policy.
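
Before planning a migration, you can confirm which policy engine (if any) a cluster was created with:

# Prints "azure", "calico", or nothing when no policy engine is enabled
az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --query networkProfile.networkPolicy \
  --output tsv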

Example Azure NPM Policy

# Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Allow specific traffic to database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 5432
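
After applying both manifests, confirm they are registered in the namespace:

kubectl get networkpolicy -n production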

2. Calico Network Policy

Calico is one of the most popular and mature network policy implementations in the Kubernetes ecosystem. It's an open-source project maintained by Tigera that provides both networking and network policy capabilities.

Architecture

Calico uses a different approach compared to Azure NPM:

  • Felix agent runs on each node to enforce policies
  • Uses eBPF or iptables for data plane enforcement
  • etcd or Kubernetes API server for policy storage
  • BGP for routing (when used as a CNI)

Key Features

Advantages:

  • Rich feature set: Advanced features beyond standard Kubernetes network policies
  • High performance: eBPF data plane option for better performance
  • DNS-based policies: Policies based on DNS names (a Calico Enterprise/Cloud feature)
  • Global network policies: Apply policies across all namespaces
  • Network sets: Define reusable IP/CIDR groups
  • Observability: Built-in flow logs and detailed metrics
  • Encryption: Support for WireGuard encryption
  • Enterprise support: Commercial support available from Tigera
  • Mature and proven: Used in production by many organizations
  • Active community: Large community and extensive documentation

Advanced Features (several of these are part of the commercial Calico Enterprise/Cloud offering):

  • Application Layer Policies: Layer 7 policy enforcement
  • Hierarchical policies: Create policy tiers with different priorities
  • Service Graph: Visual representation of service-to-service communications
  • Compliance reporting: Built-in compliance reports
  • Threat detection: Integration with threat intelligence feeds

Limitations:

  • Additional complexity: More components to manage and monitor
  • Learning curve: Requires understanding Calico-specific concepts
  • Resource overhead: Additional pods and agents consume cluster resources
  • Updates: Need to manage Calico version updates separately from AKS

When to Use Calico

Calico is ideal when:

  • You need advanced network policy features beyond standard Kubernetes
  • You require DNS-based or FQDN-based policies
  • You want detailed network flow logs and observability
  • You need to implement zero-trust network architecture
  • You require encryption for pod-to-pod communication
  • You have compliance requirements that need detailed auditing
  • Your team has experience with Calico or can invest in learning it

Installing Calico on AKS

To create an AKS cluster with Calico:

# Create AKS cluster with Calico network policy
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --network-policy calico \
  --node-count 3

Explanation: Creates an AKS cluster with:

  • --network-plugin azure: Uses Azure CNI for networking
  • --network-policy calico: Enables Calico for network policy enforcement
  • Calico components are installed as DaemonSets; depending on the AKS version, they run in the kube-system or calico-system namespace

Terraform equivalent:

resource "azurerm_kubernetes_cluster" "main" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "myakscluster"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D2_v2"
  }

  network_profile {
    network_plugin = "azure"
    network_policy = "calico"
  }

  identity {
    type = "SystemAssigned"
  }
}

Verify Calico installation:

# Check Calico pods
kubectl get pods -n kube-system | grep calico

Explanation: Lists the pods in the kube-system namespace whose names contain "calico" (on newer AKS versions, check the calico-system namespace instead). You should see calico-node pods running on each node.
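
You can also wait for the DaemonSet rollout to complete (assuming the default calico-node DaemonSet name; adjust the namespace if your install uses calico-system):

kubectl rollout status ds/calico-node -n kube-system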

Example Calico Policies

Standard Kubernetes Network Policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: 8080

Calico Global Network Policy:

apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-egress-external
spec:
  selector: has(role)
  types:
  - Egress
  egress:
  # Allow DNS
  - action: Allow
    protocol: UDP
    destination:
      ports:
      - 53
  # Allow internal cluster traffic
  - action: Allow
    destination:
      nets:
      - 10.0.0.0/8
  # Deny everything else
  - action: Deny

DNS-based Policy (requires Calico Enterprise or Calico Cloud):

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-external-apis
  namespace: production
spec:
  selector: app == 'backend'
  types:
  - Egress
  egress:
  # Allow access to specific external services by DNS
  - action: Allow
    protocol: TCP
    destination:
      domains:
      - "api.stripe.com"
      - "*.azure.com"
      ports:
      - 443

Network Sets for IP Management:

apiVersion: projectcalico.org/v3
kind: GlobalNetworkSet
metadata:
  name: allowed-external-ips
  labels:
    role: allowed-external
spec:
  nets:
  - 203.0.113.0/24
  - 198.51.100.0/24
---
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-to-external-ips
  namespace: production
spec:
  selector: app == 'web'
  types:
  - Egress
  egress:
  - action: Allow
    destination:
      # global() matches non-namespaced resources such as GlobalNetworkSets
      namespaceSelector: global()
      selector: role == 'allowed-external'

Calico Observability

Calico provides enhanced observability features:

# Install calicoctl (Calico CLI tool)
curl -L https://github.com/projectcalico/calico/releases/latest/download/calicoctl-linux-amd64 -o calicoctl
chmod +x calicoctl
sudo mv calicoctl /usr/local/bin/

# View network policy status
calicoctl get networkpolicy --all-namespaces -o wide

# View global network policies
calicoctl get globalnetworkpolicy -o wide

# Check policy order and priority
calicoctl get networkpolicy --all-namespaces -o yaml

Explanation: These commands help you inspect and troubleshoot Calico network policies. calicoctl is the command-line tool for managing Calico resources.
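
If you'd rather not install calicoctl, CRD-backed Calico installations (such as AKS's) also expose these resources to kubectl directly via the crd.projectcalico.org API group:

# Query Calico policy resources through their CRDs
kubectl get networkpolicies.crd.projectcalico.org --all-namespaces
kubectl get globalnetworkpolicies.crd.projectcalico.org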

Enable flow logs (these FelixConfiguration fields are part of Calico Enterprise/Cloud, not open-source Calico):

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  flowLogsEnableHostEndpoint: true
  flowLogsFileEnabled: true
  flowLogsFileIncludeLabels: true
  flowLogsFileIncludePolicies: true
  dnsLogsFileEnabled: true

3. Cilium Network Policy

Cilium is a modern networking and security solution built on eBPF (extended Berkeley Packet Filter) technology. It's gaining popularity due to its high performance and advanced features.

Architecture

Cilium's architecture is fundamentally different from traditional network policy implementations:

  • eBPF-based: Uses eBPF programs loaded into the Linux kernel for packet filtering
  • Identity-based: Uses security identities instead of IP addresses
  • API-aware: Can filter based on HTTP, gRPC, and other application protocols
  • Cilium Agent: Runs on each node to manage eBPF programs
  • Cilium Operator: Manages cluster-wide resources

Key Features

Advantages:

  • Exceptional performance: eBPF provides near-native network performance
  • Layer 7 policies: Filter traffic based on HTTP methods, paths, headers
  • API-aware security: Understand and filter application-layer protocols
  • Identity-based: More flexible than IP-based policies
  • Service mesh capabilities: Can replace or complement service meshes
  • Network visibility: Deep network and application visibility with Hubble
  • Transparent encryption: Automatic encryption of all pod traffic
  • Multi-cluster: Native support for multi-cluster networking
  • Modern architecture: Built for cloud-native from the ground up
  • Growing ecosystem: Rapid development and feature additions

Advanced Features:

  • Hubble: Network and security observability platform
  • BGP support: Advanced routing capabilities
  • Bandwidth management: QoS and traffic shaping
  • Network policies for services: Apply policies to Kubernetes services
  • Kafka/DNS protocol enforcement: Application-specific policy enforcement

Limitations:

  • Kernel requirements: Requires a relatively modern Linux kernel (4.19+ for recent Cilium releases)
  • Complexity: More complex architecture and concepts
  • Maturity: Newer than Calico, less production track record in AKS
  • Manual installation: Not natively supported by AKS, requires manual installation
  • Learning curve: Requires understanding of eBPF and Cilium concepts
  • Troubleshooting: Can be more difficult to debug eBPF issues

When to Use Cilium

Cilium is ideal when:

  • You need maximum network performance
  • You require Layer 7 (HTTP/gRPC) policy enforcement
  • You want modern, API-aware security
  • You need advanced observability with Hubble
  • You're building microservices that need fine-grained policies
  • You want to implement a zero-trust network architecture
  • You have the expertise to manage a more complex system
  • Performance is a critical requirement

Installing Cilium on AKS

Cilium is not enabled through the --network-policy flag in the same way as Azure NPM or Calico, so this guide installs it manually. (AKS has since introduced a managed option, Azure CNI powered by Cilium; check the current AKS documentation if you prefer a managed install.) Installing Cilium yourself gives you full control over its version and feature set:

Step 1: Create AKS cluster without network policy

# Create AKS cluster with Azure CNI, no network policy yet
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --node-count 3 \
  --generate-ssh-keys

Explanation: Creates an AKS cluster with Azure CNI networking but without any network policy engine. We'll install Cilium afterward.

Terraform equivalent:

resource "azurerm_kubernetes_cluster" "main" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "myakscluster"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D2_v2"
  }

  network_profile {
    network_plugin = "azure"
    # No network_policy specified
  }

  identity {
    type = "SystemAssigned"
  }
}

Step 2: Install Cilium using Helm

# Get cluster credentials
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

Explanation: Downloads the cluster configuration and credentials to your local kubeconfig file.
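
A quick sanity check that kubectl now targets the new cluster:

kubectl get nodes -o wide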

# Add Cilium Helm repository
helm repo add cilium https://helm.cilium.io/
helm repo update

Explanation: Adds the Cilium Helm chart repository and updates the local cache of chart information.

# Install Cilium
helm install cilium cilium/cilium \
  --version 1.14.5 \
  --namespace kube-system \
  --set azure.enabled=true \
  --set azure.resourceGroup=myResourceGroup \
  --set azure.subscriptionID="YOUR_SUBSCRIPTION_ID" \
  --set azure.tenantID="YOUR_TENANT_ID" \
  --set tunnel=disabled \
  --set ipam.mode=azure \
  --set enableIPv4Masquerade=false \
  --set nodeinit.enabled=true

Explanation: Installs Cilium with Azure-specific configurations:

  • azure.enabled=true: Enables Azure integration
  • tunnel=disabled: Uses Azure routing instead of overlay networking
  • ipam.mode=azure: Uses Azure CNI for IP address management
  • enableIPv4Masquerade=false: Disables masquerading since Azure handles routing
  • nodeinit.enabled=true: Initializes nodes with required configurations

Terraform equivalent (using Helm provider):

resource "helm_release" "cilium" {
  name       = "cilium"
  repository = "https://helm.cilium.io/"
  chart      = "cilium"
  version    = "1.14.5"
  namespace  = "kube-system"

  set {
    name  = "azure.enabled"
    value = "true"
  }

  set {
    name  = "azure.resourceGroup"
    value = azurerm_resource_group.main.name
  }

  set {
    name  = "azure.subscriptionID"
    value = data.azurerm_subscription.current.subscription_id
  }

  set {
    name  = "azure.tenantID"
    value = data.azurerm_subscription.current.tenant_id
  }

  set {
    name  = "tunnel"
    value = "disabled"
  }

  set {
    name  = "ipam.mode"
    value = "azure"
  }

  set {
    name  = "enableIPv4Masquerade"
    value = "false"
  }

  set {
    name  = "nodeinit.enabled"
    value = "true"
  }

  depends_on = [azurerm_kubernetes_cluster.main]
}

data "azurerm_subscription" "current" {}

Step 3: Verify installation

# Check Cilium pods
kubectl get pods -n kube-system -l k8s-app=cilium

# Check Cilium status
cilium status --wait

# Run connectivity test
cilium connectivity test

Explanation: These commands verify that Cilium is installed correctly and running on all nodes. The connectivity test performs comprehensive checks of network functionality.
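
Note that the cilium status and cilium connectivity test commands require the Cilium CLI on your workstation. One way to install it on Linux, mirroring the Hubble CLI install shown later (URLs follow the project's published release layout):

# Install the Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin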

Example Cilium Policies

Standard Kubernetes Network Policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: 8080

Cilium Layer 7 HTTP Policy:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-specific-http-methods
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/.*"
        - method: "POST"
          path: "/api/users"

DNS/FQDN-based Policy:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-external-apis
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  - toFQDNs:
    - matchPattern: "*.azure.com"
    - matchName: "api.github.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
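
One caveat: toFQDNs rules depend on Cilium's DNS proxy observing the pod's lookups, so they are typically paired with an explicit DNS egress rule. A sketch of that companion rule, appended under the same egress list (assumes kube-dns running in kube-system):

  # Let pods resolve names through kube-dns while Cilium's DNS proxy inspects the queries
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"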

Layer 7 gRPC Policy:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: grpc-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: grpc-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: grpc-client
    toPorts:
    - ports:
      - port: "50051"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/order.OrderService/GetOrder"
        - method: "POST"
          path: "/order.OrderService/CreateOrder"
        - method: "POST"
          path: "/order.OrderService/ListOrders"

Service-based Policy:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-to-service
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: client
  egress:
  - toServices:
    - k8sService:
        serviceName: backend-service
        namespace: production

Cilium Observability with Hubble

Hubble is Cilium's observability platform:

# Enable Hubble
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

Explanation: Enables Hubble UI and relay for network observability. This provides a graphical interface to visualize network traffic and policies.

# Install Hubble CLI
export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check hubble-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-amd64.tar.gz /usr/local/bin

Explanation: Downloads and installs the Hubble CLI tool for querying network flows from the command line.

# Port forward to Hubble UI
kubectl port-forward -n kube-system svc/hubble-ui 12000:80

Explanation: Makes the Hubble UI accessible at http://localhost:12000 for visualizing network traffic.

Query network flows:

# Observe flows in real-time
hubble observe

# Observe flows for specific namespace
hubble observe --namespace production

# Observe flows with specific labels
hubble observe --from-label app=frontend --to-label app=backend

# Observe denied flows
hubble observe --verdict DROPPED

Comparison Matrix

Here's a comprehensive comparison of the three network policy options:

| Feature | Azure NPM | Calico | Cilium |
| --- | --- | --- | --- |
| Native AKS Support | ✅ Yes | ✅ Yes | ❌ No (manual install) |
| Ease of Setup | ⭐⭐⭐⭐⭐ Very Easy | ⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate |
| Standard K8s Policies | ✅ Yes | ✅ Yes | ✅ Yes |
| Layer 7 Policies | ❌ No | ⚠️ Limited | ✅ Full support |
| DNS/FQDN Policies | ❌ No | ✅ Yes (Enterprise) | ✅ Yes |
| Global Policies | ❌ No | ✅ Yes | ✅ Yes |
| Performance | ⭐⭐⭐ Good | ⭐⭐⭐⭐ Very Good | ⭐⭐⭐⭐⭐ Excellent |
| Observability | ⭐⭐ Basic | ⭐⭐⭐⭐ Advanced | ⭐⭐⭐⭐⭐ Superior |
| Learning Curve | ⭐⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate | ⭐⭐ Steep |
| Enterprise Support | ✅ Microsoft | ✅ Tigera | ✅ Isovalent |
| Encryption | ❌ No | ✅ WireGuard | ✅ IPsec/WireGuard |
| Multi-cluster | ❌ Limited | ✅ Yes | ✅ Yes |
| Service Mesh Features | ❌ No | ⚠️ Limited | ✅ Yes |
| Resource Usage | ⭐⭐⭐⭐ Low | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate |
| Maturity in AKS | ⭐⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ High | ⭐⭐⭐ Growing |
| Community | Azure community | ⭐⭐⭐⭐⭐ Large | ⭐⭐⭐⭐ Growing |
| Cost | Free | Free (OSS) | Free (OSS) |

Implementation Best Practices

1. Start with Deny-All Policy

Regardless of which solution you choose, start with a default deny-all policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

This creates a secure-by-default posture where you explicitly allow only necessary traffic.
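
A quick way to confirm the lockdown with a throwaway pod (backend-service here is a hypothetical service in the namespace):

# Expect this request to fail once the deny-all policy is in place
kubectl run probe --rm -it --restart=Never --image=busybox -n production \
  -- wget -qO- -T 3 http://backend-service || echo "blocked, as expected"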

2. Implement Least Privilege Access

Only allow the minimum necessary access:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Only allow from frontend
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Only allow to database
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
  # Allow DNS
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53

3. Use Namespaces for Isolation

Organize workloads into namespaces and use namespace selectors:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}

4. Always Allow Essential Traffic

Don't forget to allow essential cluster traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  # Allow DNS
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    - podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
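
Note that the name: kube-system label used above is not applied automatically. Either add it yourself, or match the built-in kubernetes.io/metadata.name label that Kubernetes (1.21+) sets on every namespace:

# Option 1: add the label the policy expects
kubectl label namespace kube-system name=kube-system

# Option 2: match the built-in label instead
#   namespaceSelector:
#     matchLabels:
#       kubernetes.io/metadata.name: kube-system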

5. Label Resources Consistently

Use consistent labeling for effective policy management:

# Good labeling strategy
metadata:
  labels:
    app: backend
    tier: api
    environment: production
    team: platform

6. Test Policies in Non-Production First

Always test network policies in development/staging before production:

# Create test namespace
kubectl create namespace policy-test

# Deploy test application
kubectl apply -f test-app.yaml -n policy-test

# Apply policy
kubectl apply -f test-policy.yaml -n policy-test

# Test connectivity
kubectl exec -n policy-test test-pod -- curl http://backend-service

7. Monitor and Audit Policies

Implement monitoring for policy violations:

# For Calico - review Felix configuration (detailed flow logs require Calico Enterprise)
calicoctl get felixconfiguration default -o yaml

# For Cilium - observe dropped packets
hubble observe --verdict DROPPED

# Query workload inventory in Azure Monitor (Container insights)
az monitor log-analytics query \
  --workspace YOUR_WORKSPACE_ID \
  --analytics-query "KubePodInventory | where Namespace == 'production'"

8. Document Your Policies

Maintain documentation for your network policies:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: production
  annotations:
    description: "Allows frontend pods to communicate with backend API"
    owner: "platform-team@company.com"
    reviewed: "2024-01-15"
spec:
  # ... policy spec

Testing Network Policies

Basic Connectivity Testing

Create test pods to verify policy enforcement:

# test-client.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-client
  namespace: production
  labels:
    app: test
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    command: ["sleep", "3600"]
---
# test-server.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-server
  namespace: production
  labels:
    app: backend
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

Test connectivity:

# Deploy test pods
kubectl apply -f test-client.yaml
kubectl apply -f test-server.yaml

# Get server IP
SERVER_IP=$(kubectl get pod test-server -n production -o jsonpath='{.status.podIP}')

# Test before policy (should work)
kubectl exec -n production test-client -- curl -m 5 http://$SERVER_IP

# Apply restrictive policy
kubectl apply -f restrictive-policy.yaml

# Test after policy (should fail if policy blocks it)
kubectl exec -n production test-client -- curl -m 5 http://$SERVER_IP

Policy Validation Tools

Use policy validation tools:

# For Calico - policy validation
calicoctl get networkpolicy --all-namespaces
calicoctl get globalnetworkpolicy

# For Cilium - inspect loaded policy and endpoints (agent CLI, run inside a Cilium pod)
kubectl exec -n kube-system ds/cilium -- cilium policy get
kubectl exec -n kube-system ds/cilium -- cilium endpoint list

# Check policy effectiveness
kubectl get networkpolicy --all-namespaces
kubectl describe networkpolicy <policy-name> -n <namespace>

Troubleshooting Network Policies

Common Issues and Solutions

Issue 1: Policy Not Being Enforced

# Check if network policy is applied
kubectl get networkpolicy -n <namespace>

# Describe the policy
kubectl describe networkpolicy <policy-name> -n <namespace>

# For Calico - check Felix logs
kubectl logs -n kube-system -l k8s-app=calico-node

# For Cilium - check agent logs
kubectl logs -n kube-system -l k8s-app=cilium

Issue 2: Pods Cannot Connect

# Verify pod labels
kubectl get pods --show-labels -n <namespace>

# Check if pod matches policy selector
kubectl get networkpolicy <policy-name> -n <namespace> -o yaml

# Test DNS resolution
kubectl exec <pod-name> -n <namespace> -- nslookup kubernetes.default

# Check if DNS is allowed in policy
kubectl get networkpolicy -n <namespace> -o yaml | grep -A 10 egress

Issue 3: Performance Degradation

# For Calico - check Felix performance metrics
kubectl get felixconfiguration default -o yaml

# For Cilium - check agent metrics (run inside a Cilium pod)
kubectl exec -n kube-system ds/cilium -- cilium metrics list

# Check node resource usage
kubectl top nodes
kubectl top pods -n kube-system

Debugging Tools

Network debugging pod:

apiVersion: v1
kind: Pod
metadata:
  name: netdebug
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    command: ["sleep", "3600"]

# Deploy and use debugging pod
kubectl apply -f netdebug.yaml

# Available tools in netshoot:
kubectl exec netdebug -- ping <ip>
kubectl exec netdebug -- traceroute <ip>
kubectl exec netdebug -- nslookup <hostname>
kubectl exec netdebug -- curl -v <url>
kubectl exec netdebug -- tcpdump -i any port 80

Migration Strategies

Migrating from No Policy to Network Policies

Step 1: Audit existing traffic

# Enable flow logs (flow logs require Calico Enterprise; Cilium uses Hubble)
# For Calico Enterprise:
kubectl patch felixconfiguration default --type merge -p '{"spec":{"flowLogsEnableHostEndpoint":true}}'

# For Cilium with Hubble:
hubble observe --all

Step 2: Create allow-all policy

# Baseline policy that allows everything (no disruption)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-temporary
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - {}
  egress:
  - {}

Step 3: Gradually restrict

# Phase 1: Deny all, but allow same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}
---
# Phase 2: Add specific policies for each service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  # ... specific rules

Migrating Between Policy Engines

From Azure NPM to Calico:

  1. Create a new cluster with Calico
  2. Migrate workloads using blue-green deployment
  3. Validate policies work correctly
  4. Switch traffic to new cluster
  5. Decommission old cluster

From Calico to Cilium:

  1. Back up existing Calico policies
  2. Install Cilium alongside Calico (if possible for testing)
  3. Convert policies to Cilium format
  4. Test thoroughly in non-production environment
  5. Perform controlled migration

Real-World Use Cases

Use Case 1: Multi-Tier Web Application

Architecture:

  • Frontend (React app)
  • Backend API (Node.js)
  • Database (PostgreSQL)
  • Redis cache

Network Policy Strategy:

# Frontend can access backend API only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: frontend
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: backend
    ports:
    - protocol: TCP
      port: 3000
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
---
# Backend can access database and Redis only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: 3000
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: database
    ports:
    - protocol: TCP
      port: 5432
  - to:
    - podSelector:
        matchLabels:
          tier: cache
    ports:
    - protocol: TCP
      port: 6379
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
---
# Database accepts connections from backend only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: backend
    ports:
    - protocol: TCP
      port: 5432

Use Case 2: Microservices with Service Mesh

For microservices architectures with Cilium:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: order-service-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: order-service
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: api-gateway
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/orders/.*"
        - method: "POST"
          path: "/api/orders"
        - method: "PUT"
          path: "/api/orders/.*"
  egress:
  - toEndpoints:
    - matchLabels:
        app: inventory-service
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/inventory/.*"
  - toFQDNs:
    - matchPattern: "*.database.azure.com"
    toPorts:
    - ports:
      - port: "5432"
        protocol: TCP

Use Case 3: Compliance and Regulatory Requirements

For environments requiring strict compliance:

# PCI-DSS compliant policy for payment service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-policy
  namespace: pci-compliant
  annotations:
    compliance: "PCI-DSS v3.2.1"
    description: "Isolates payment processing workloads"
spec:
  podSelector:
    matchLabels:
      compliance: pci-dss
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Only allow from authenticated API gateway
  - from:
    - podSelector:
        matchLabels:
          app: api-gateway
          auth: enabled
    ports:
    - protocol: TCP
      port: 8443
  egress:
  # Only allow to payment provider APIs
  - to:
    - podSelector:
        matchLabels:
          type: payment-provider
    ports:
    - protocol: TCP
      port: 443
  # Allow logging to compliance log collector
  - to:
    - podSelector:
        matchLabels:
          app: log-collector
          compliance: enabled
    ports:
    - protocol: TCP
      port: 9200

Conclusion

Choosing the right network policy solution for your AKS cluster is a critical decision that impacts security, performance, and operational complexity. Let's summarize the key considerations:

Quick Decision Guide

Choose Azure Network Policy Manager if:

  • You're new to Kubernetes network policies
  • You want a simple, Microsoft-supported solution
  • You need only basic network policy features
  • You prefer minimal operational overhead
  • Your team is already familiar with Azure networking

Choose Calico if:

  • You need advanced network policy features
  • You require DNS/FQDN-based policies
  • You need detailed observability and logging
  • You want a mature, proven solution
  • You have compliance requirements needing extensive auditing
  • You may need to migrate to other cloud providers or on-premises

Choose Cilium if:

  • Performance is a critical requirement
  • You need Layer 7 HTTP/gRPC policy enforcement
  • You want modern, API-aware security
  • You need advanced observability with Hubble
  • You're building a microservices architecture
  • You have expertise to manage a more complex system
  • You want cutting-edge technology and features

Key Takeaways

  1. Start Simple: Begin with basic policies and gradually increase complexity as you understand your traffic patterns

  2. Test Thoroughly: Always test network policies in non-production environments before applying to production

  3. Monitor Continuously: Implement comprehensive monitoring to detect policy violations and unexpected behavior

  4. Document Everything: Maintain clear documentation of your network policies and their purposes

  5. Plan for Scale: Consider how your policy solution will scale as your cluster grows

  6. Security First: Implement a deny-by-default strategy and explicitly allow only necessary traffic

  7. Regular Audits: Periodically review and update network policies to ensure they remain relevant and effective

Future Trends

The Kubernetes network policy ecosystem continues to evolve:

  • eBPF adoption: More solutions leveraging eBPF for better performance
  • Service mesh integration: Tighter integration between network policies and service meshes
  • Zero trust networking: Enhanced support for zero-trust security models
  • Multi-cluster policies: Better support for policies spanning multiple clusters
  • AI/ML-based policies: Intelligent policies that adapt based on traffic patterns

Regardless of which solution you choose, implementing network policies is essential for securing your AKS workloads. By following the best practices outlined in this article and choosing the solution that best fits your requirements, you can build a secure, compliant, and performant Kubernetes environment.

Remember: network policies are just one layer of security. Combine them with other security measures like RBAC, Pod Security Standards, image scanning, and secret management for defense-in-depth security.
