Network security is a critical component of any Kubernetes deployment, especially in enterprise environments where regulatory compliance and data protection are paramount. Network policies in Kubernetes provide a way to control traffic flow between pods and network endpoints at Layers 3 and 4 of the network stack, similar to how security groups and firewalls work for virtual machines.
Azure Kubernetes Service (AKS) supports multiple network policy implementations, each with its own strengths, features, and use cases. Understanding the differences between these options is crucial for designing secure, performant, and compliant Kubernetes workloads.
In this article, we'll explore the different network policy options available for AKS:
- Azure Network Policy Manager (Azure NPM): Microsoft's native solution for basic network policies
- Calico: A popular open-source network policy engine with advanced features
- Cilium: A modern, eBPF-based networking and security solution
We'll compare these options across various dimensions including:
- Feature sets: What capabilities each solution provides
- Performance: How each solution impacts cluster performance
- Ease of use: Installation, configuration, and management complexity
- Integration: How well each solution integrates with Azure services
- Use cases: When to choose each option
By the end of this article, you'll have a clear understanding of which network policy solution best fits your requirements and how to implement it in your AKS cluster.
Understanding Kubernetes Network Policies
Before diving into the specific implementations, it's important to understand what Kubernetes network policies are and how they work.
What are Network Policies?
Kubernetes Network Policies are specifications that define how groups of pods can communicate with each other and with other network endpoints. They operate at Layer 3 and Layer 4 of the OSI model, controlling traffic based on:
- Pod selectors: Which pods the policy applies to
- Ingress rules: What incoming traffic is allowed
- Egress rules: What outgoing traffic is allowed
- Namespaces: Scope of the policy application
- IP blocks: CIDR ranges for external endpoints
Default Behavior
By default, Kubernetes allows all traffic between all pods in a cluster. This "allow-all" behavior is convenient for development but poses security risks in production environments. Network policies enable you to implement a "deny-all-by-default" approach, explicitly allowing only necessary traffic.
Basic Network Policy Example
Here's a simple example of a Kubernetes network policy:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: frontend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: backend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              role: backend
      ports:
        - protocol: TCP
          port: 3306
```
This policy applies to pods with the label `role: frontend` and:
- Allows ingress traffic only from pods labeled `role: backend` on port 8080
- Allows egress traffic only to pods labeled `role: backend` on port 3306
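To see a policy like this in action, apply it with kubectl and inspect what the API server stored (the file name here is just an example):

```bash
# Apply the policy manifest
kubectl apply -f frontend-policy.yaml

# Review the rules as Kubernetes interpreted them
kubectl describe networkpolicy frontend-policy -n production
```

One important caveat: the NetworkPolicy object itself is only a specification. Nothing is enforced unless a network policy engine is running in the cluster, which is exactly what the options below provide.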
AKS Network Policy Options
1. Azure Network Policy Manager (Azure NPM)
Azure Network Policy Manager is Microsoft's native implementation of Kubernetes network policies for AKS. It's built on Azure's Virtual Network capabilities and integrates tightly with Azure networking infrastructure.
Architecture
Azure NPM runs as a DaemonSet on each node and enforces policies locally:
- Watches the Kubernetes API server for changes to pods, namespaces, and network policies
- Translates Kubernetes network policies into iptables rules (with ipset) on Linux nodes
- Operates entirely at the node level; it does not create or modify Azure Network Security Group (NSG) rules
Key Features
Advantages:
- Native Azure integration: Seamless integration with Azure services and networking
- No additional components: Built into AKS, no separate installation required
- Azure support: Fully supported by Microsoft Azure support
- Basic functionality: Covers standard Kubernetes network policy use cases
- Simple setup: Easy to enable on new or existing clusters
- Cost-effective: No additional licensing costs
Limitations:
- Basic feature set: Limited to standard Kubernetes network policies
- No advanced features: Lacks features like DNS-based policies, application-layer filtering
- Performance considerations: May have overhead due to iptables implementation
- Limited observability: Basic logging and monitoring capabilities
- IPv4 only: Currently supports only IPv4 addresses
When to Use Azure NPM
Azure Network Policy Manager is ideal when:
- You need basic network policy functionality
- You want Microsoft support for the entire networking stack
- You're building a new cluster and want to start simple
- Your team is already familiar with Azure networking concepts
- You don't require advanced features like DNS-based policies
- You want to minimize third-party dependencies
Enabling Azure NPM
To create an AKS cluster with Azure Network Policy Manager:
```bash
# Create a new AKS cluster with Azure NPM
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --network-policy azure \
  --node-count 3
```
Explanation: Creates an AKS cluster with:
- `--network-plugin azure`: Uses Azure CNI networking (required for Azure NPM)
- `--network-policy azure`: Enables Azure Network Policy Manager
- `--node-count 3`: Creates a 3-node cluster for high availability
Terraform equivalent:
resource "azurerm_kubernetes_cluster" "main" {
name = "myAKSCluster"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
dns_prefix = "myakscluster"
default_node_pool {
name = "default"
node_count = 3
vm_size = "Standard_D2_v2"
}
network_profile {
network_plugin = "azure"
network_policy = "azure"
}
identity {
type = "SystemAssigned"
}
}
At the time of writing, you cannot change the network policy engine on an existing cluster; you would need to create a new cluster with the desired network policy.
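To check which engine (if any) an existing cluster was created with, you can query the cluster's network profile:

```bash
# Returns "azure", "calico", or null if no network policy engine is set
az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --query networkProfile.networkPolicy \
  --output tsv
```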
Example Azure NPM Policy
```yaml
# Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow specific traffic to database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 5432
```
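A minimal way to apply and verify these policies (the manifest file name is illustrative):

```bash
# Apply both policies from a single manifest
kubectl apply -f azure-npm-policies.yaml

# Confirm they exist and check which pods they select
kubectl get networkpolicy -n production
kubectl describe networkpolicy allow-app-to-db -n production
```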
2. Calico Network Policy
Calico is one of the most popular and mature network policy implementations in the Kubernetes ecosystem. It's an open-source project maintained by Tigera that provides both networking and network policy capabilities.
Architecture
Calico uses a different approach compared to Azure NPM:
- Felix agent runs on each node to enforce policies
- Uses eBPF or iptables for data plane enforcement
- etcd or Kubernetes API server for policy storage
- BGP for routing (when used as a CNI)
Key Features
Advantages:
- Rich feature set: Advanced features beyond standard Kubernetes network policies
- High performance: eBPF data plane option for better performance
- DNS-based policies: Can create policies based on DNS names
- Global network policies: Apply policies across all namespaces
- Network sets: Define reusable IP/CIDR groups
- Observability: Built-in flow logs and detailed metrics
- Encryption: Support for WireGuard encryption
- Enterprise support: Commercial support available from Tigera
- Mature and proven: Used in production by many organizations
- Active community: Large community and extensive documentation
Advanced Features:
- Application Layer Policies: Layer 7 policy enforcement
- Hierarchical policies: Create policy tiers with different priorities
- Service Graph: Visual representation of service-to-service communications
- Compliance reporting: Built-in compliance reports
- Threat detection: Integration with threat intelligence feeds
Limitations:
- Additional complexity: More components to manage and monitor
- Learning curve: Requires understanding Calico-specific concepts
- Resource overhead: Additional pods and agents consume cluster resources
- Updates: Need to manage Calico version updates separately from AKS
When to Use Calico
Calico is ideal when:
- You need advanced network policy features beyond standard Kubernetes
- You require DNS-based or FQDN-based policies
- You want detailed network flow logs and observability
- You need to implement zero-trust network architecture
- You require encryption for pod-to-pod communication
- You have compliance requirements that need detailed auditing
- Your team has experience with Calico or can invest in learning it
Installing Calico on AKS
To create an AKS cluster with Calico:
```bash
# Create AKS cluster with Calico network policy
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --network-policy calico \
  --node-count 3
```
Explanation: Creates an AKS cluster with:
- `--network-plugin azure`: Uses Azure CNI for networking
- `--network-policy calico`: Enables Calico for network policy enforcement

Calico will be installed as DaemonSets in the kube-system namespace.
Terraform equivalent:
resource "azurerm_kubernetes_cluster" "main" {
name = "myAKSCluster"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
dns_prefix = "myakscluster"
default_node_pool {
name = "default"
node_count = 3
vm_size = "Standard_D2_v2"
}
network_profile {
network_plugin = "azure"
network_policy = "calico"
}
identity {
type = "SystemAssigned"
}
}
Verify Calico installation:
```bash
# Check Calico pods
kubectl get pods -n kube-system | grep calico
```
Explanation: Lists all pods in the kube-system namespace that contain "calico" in their name. You should see calico-node pods running on each node.
Example Calico Policies
Standard Kubernetes Network Policy:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend
      ports:
        - protocol: TCP
          port: 8080
```
Calico Global Network Policy:
```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-egress-external
spec:
  selector: has(role)
  types:
    - Egress
  egress:
    # Allow DNS
    - action: Allow
      protocol: UDP
      destination:
        ports:
          - 53
    # Allow internal cluster traffic
    - action: Allow
      destination:
        nets:
          - 10.0.0.0/8
    # Deny everything else
    - action: Deny
```
DNS-based Policy (note: matching destination domains is a Calico Enterprise feature):
```yaml
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-external-apis
  namespace: production
spec:
  selector: app == 'backend'
  types:
    - Egress
  egress:
    # Allow access to specific external services by DNS
    - action: Allow
      protocol: TCP
      destination:
        domains:
          - "api.stripe.com"
          - "*.azure.com"
        ports:
          - 443
```
Network Sets for IP Management:
```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkSet
metadata:
  name: allowed-external-ips
  labels:
    role: allowed-external
spec:
  nets:
    - 203.0.113.0/24
    - 198.51.100.0/24
---
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-to-external-ips
  namespace: production
spec:
  selector: app == 'web'
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        # global() lets a namespaced policy match cluster-wide
        # resources such as GlobalNetworkSets, selected by label
        selector: global() && role == 'allowed-external'
```
Calico Observability
Calico provides enhanced observability features:
```bash
# Install calicoctl (Calico CLI tool)
curl -L https://github.com/projectcalico/calico/releases/latest/download/calicoctl-linux-amd64 -o calicoctl
chmod +x calicoctl
sudo mv calicoctl /usr/local/bin/

# View network policies across all namespaces
calicoctl get networkpolicy --all-namespaces -o wide

# View global network policies
calicoctl get globalnetworkpolicy -o wide

# Inspect full policy definitions, including order/priority
calicoctl get networkpolicy --all-namespaces -o yaml
```
Explanation: These commands help you inspect and troubleshoot Calico network policies. calicoctl is the command-line tool for managing Calico resources.
Enable flow logs (note that file-based flow logs configured through FelixConfiguration are a Calico Enterprise feature):
```yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  flowLogsEnableHostEndpoint: true
  flowLogsFileEnabled: true
  flowLogsFileIncludeLabels: true
  flowLogsFileIncludePolicies: true
  dnsLogsFileEnabled: true
```
3. Cilium Network Policy
Cilium is a modern networking and security solution built on eBPF (extended Berkeley Packet Filter) technology. It's gaining popularity due to its high performance and advanced features.
Architecture
Cilium's architecture is fundamentally different from traditional network policy implementations:
- eBPF-based: Uses eBPF programs loaded into the Linux kernel for packet filtering
- Identity-based: Uses security identities instead of IP addresses
- API-aware: Can filter based on HTTP, gRPC, and other application protocols
- Cilium Agent: Runs on each node to manage eBPF programs
- Cilium Operator: Manages cluster-wide resources
Key Features
Advantages:
- Exceptional performance: eBPF provides near-native network performance
- Layer 7 policies: Filter traffic based on HTTP methods, paths, headers
- API-aware security: Understand and filter application-layer protocols
- Identity-based: More flexible than IP-based policies
- Service mesh capabilities: Can replace or complement service meshes
- Network visibility: Deep network and application visibility with Hubble
- Transparent encryption: Automatic encryption of all pod traffic
- Multi-cluster: Native support for multi-cluster networking
- Modern architecture: Built for cloud-native from the ground up
- Growing ecosystem: Rapid development and feature additions
Advanced Features:
- Hubble: Network and security observability platform
- BGP support: Advanced routing capabilities
- Bandwidth management: QoS and traffic shaping
- Network policies for services: Apply policies to Kubernetes services
- Kafka/DNS protocol enforcement: Application-specific policy enforcement
Limitations:
- Kernel requirements: Requires relatively modern Linux kernels (4.9+)
- Complexity: More complex architecture and concepts
- Maturity: Newer than Calico, less production track record in AKS
- Manual installation: Not natively supported by AKS, requires manual installation
- Learning curve: Requires understanding of eBPF and Cilium concepts
- Troubleshooting: Can be more difficult to debug eBPF issues
When to Use Cilium
Cilium is ideal when:
- You need maximum network performance
- You require Layer 7 (HTTP/gRPC) policy enforcement
- You want modern, API-aware security
- You need advanced observability with Hubble
- You're building microservices that need fine-grained policies
- You want to implement a zero-trust network architecture
- You have the expertise to manage a more complex system
- Performance is a critical requirement
Installing Cilium on AKS
Cilium is not natively supported by AKS through the --network-policy flag, so it requires manual installation. Here's how to install it:
Step 1: Create AKS cluster without network policy
```bash
# Create AKS cluster with Azure CNI, no network policy yet
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --node-count 3 \
  --generate-ssh-keys
```
Explanation: Creates an AKS cluster with Azure CNI networking but without any network policy engine. We'll install Cilium afterward.
Terraform equivalent:
resource "azurerm_kubernetes_cluster" "main" {
name = "myAKSCluster"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
dns_prefix = "myakscluster"
default_node_pool {
name = "default"
node_count = 3
vm_size = "Standard_D2_v2"
}
network_profile {
network_plugin = "azure"
# No network_policy specified
}
identity {
type = "SystemAssigned"
}
}
Step 2: Install Cilium using Helm
```bash
# Get cluster credentials
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
```
Explanation: Downloads the cluster configuration and credentials to your local kubeconfig file.
```bash
# Add Cilium Helm repository
helm repo add cilium https://helm.cilium.io/
helm repo update
```
Explanation: Adds the Cilium Helm chart repository and updates the local cache of chart information.
```bash
# Install Cilium
helm install cilium cilium/cilium \
  --version 1.14.5 \
  --namespace kube-system \
  --set azure.enabled=true \
  --set azure.resourceGroup=myResourceGroup \
  --set azure.subscriptionID="YOUR_SUBSCRIPTION_ID" \
  --set azure.tenantID="YOUR_TENANT_ID" \
  --set tunnel=disabled \
  --set ipam.mode=azure \
  --set enableIPv4Masquerade=false \
  --set nodeinit.enabled=true
```
Explanation: Installs Cilium with Azure-specific configurations:
- `azure.enabled=true`: Enables Azure integration
- `tunnel=disabled`: Uses Azure routing instead of overlay networking
- `ipam.mode=azure`: Uses Azure CNI for IP address management
- `enableIPv4Masquerade=false`: Disables masquerading since Azure handles routing
- `nodeinit.enabled=true`: Initializes nodes with required configurations
Terraform equivalent (using Helm provider):
resource "helm_release" "cilium" {
name = "cilium"
repository = "https://helm.cilium.io/"
chart = "cilium"
version = "1.14.5"
namespace = "kube-system"
set {
name = "azure.enabled"
value = "true"
}
set {
name = "azure.resourceGroup"
value = azurerm_resource_group.main.name
}
set {
name = "azure.subscriptionID"
value = data.azurerm_subscription.current.subscription_id
}
set {
name = "azure.tenantID"
value = data.azurerm_subscription.current.tenant_id
}
set {
name = "tunnel"
value = "disabled"
}
set {
name = "ipam.mode"
value = "azure"
}
set {
name = "enableIPv4Masquerade"
value = "false"
}
set {
name = "nodeinit.enabled"
value = "true"
}
depends_on = [azurerm_kubernetes_cluster.main]
}
data "azurerm_subscription" "current" {}
Step 3: Verify installation
```bash
# Check Cilium pods
kubectl get pods -n kube-system -l k8s-app=cilium

# Check Cilium status
cilium status --wait

# Run connectivity test
cilium connectivity test
```
Explanation: These commands verify that Cilium is installed correctly and running on all nodes. The connectivity test performs comprehensive checks of network functionality.
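Note that the `cilium` command above is the Cilium CLI, which isn't installed automatically. A typical Linux installation follows the same pattern as the Hubble CLI shown later; check the Cilium documentation for your platform:

```bash
# Download, verify, and install the Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
```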
Example Cilium Policies
Standard Kubernetes Network Policy:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend
      ports:
        - protocol: TCP
          port: 8080
```
Cilium Layer 7 HTTP Policy:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-specific-http-methods
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/.*"
              - method: "POST"
                path: "/api/users"
```
DNS/FQDN-based Policy:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-external-apis
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
    # toFQDNs requires DNS visibility: allow DNS lookups through
    # Cilium's DNS proxy so it can learn the FQDN-to-IP mappings
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    - toFQDNs:
        - matchPattern: "*.azure.com"
        - matchName: "api.github.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```
Layer 7 gRPC Policy:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: grpc-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: grpc-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: grpc-client
      toPorts:
        - ports:
            - port: "50051"
              protocol: TCP
          rules:
            http:
              - method: "POST"
                path: "/order.OrderService/GetOrder"
              - method: "POST"
                path: "/order.OrderService/CreateOrder"
              - method: "POST"
                path: "/order.OrderService/ListOrders"
```
Service-based Policy:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-to-service
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: client
  egress:
    - toServices:
        - k8sService:
            serviceName: backend-service
            namespace: production
```
Cilium Observability with Hubble
Hubble is Cilium's observability platform:
```bash
# Enable Hubble
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
```
Explanation: Enables Hubble UI and relay for network observability. This provides a graphical interface to visualize network traffic and policies.
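A quick way to confirm the Hubble components came up (the label values assumed here match the Cilium chart defaults):

```bash
# Confirm the Hubble relay and UI pods are running
kubectl get pods -n kube-system -l k8s-app=hubble-relay
kubectl get pods -n kube-system -l k8s-app=hubble-ui
```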
```bash
# Install Hubble CLI
export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check hubble-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-amd64.tar.gz /usr/local/bin
```
Explanation: Downloads and installs the Hubble CLI tool for querying network flows from the command line.
```bash
# Port forward to Hubble UI
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
```
Explanation: Makes the Hubble UI accessible at http://localhost:12000 for visualizing network traffic.
Query network flows:
```bash
# Observe flows in real-time
hubble observe

# Observe flows for specific namespace
hubble observe --namespace production

# Observe flows with specific labels
hubble observe --from-label app=frontend --to-label app=backend

# Observe denied flows
hubble observe --verdict DROPPED
```
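For scripting or feeding flows into other tooling, Hubble can also filter by protocol and emit machine-readable output (flag support may vary by Hubble version):

```bash
# Show recent HTTP flows as JSON, one flow per line
hubble observe --protocol http -o json | head -n 5
```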
Comparison Matrix
Here's a comprehensive comparison of the three network policy options:
| Feature | Azure NPM | Calico | Cilium |
|---|---|---|---|
| Native AKS Support | ✅ Yes | ✅ Yes | ❌ No (manual install) |
| Ease of Setup | ⭐⭐⭐⭐⭐ Very Easy | ⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate |
| Standard K8s Policies | ✅ Yes | ✅ Yes | ✅ Yes |
| Layer 7 Policies | ❌ No | ⚠️ Limited | ✅ Full support |
| DNS/FQDN Policies | ❌ No | ✅ Yes | ✅ Yes |
| Global Policies | ❌ No | ✅ Yes | ✅ Yes |
| Performance | ⭐⭐⭐ Good | ⭐⭐⭐⭐ Very Good | ⭐⭐⭐⭐⭐ Excellent |
| Observability | ⭐⭐ Basic | ⭐⭐⭐⭐ Advanced | ⭐⭐⭐⭐⭐ Superior |
| Learning Curve | ⭐⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate | ⭐⭐ Steep |
| Enterprise Support | ✅ Microsoft | ✅ Tigera | ✅ Isovalent |
| Encryption | ❌ No | ✅ WireGuard | ✅ IPSec/WireGuard |
| Multi-cluster | ❌ Limited | ✅ Yes | ✅ Yes |
| Service Mesh Features | ❌ No | ⚠️ Limited | ✅ Yes |
| Resource Usage | ⭐⭐⭐⭐ Low | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate |
| Maturity in AKS | ⭐⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ High | ⭐⭐⭐ Growing |
| Community | Azure community | ⭐⭐⭐⭐⭐ Large | ⭐⭐⭐⭐ Growing |
| Cost | Free | Free (OSS) | Free (OSS) |
Implementation Best Practices
1. Start with Deny-All Policy
Regardless of which solution you choose, start with a default deny-all policy:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
This creates a secure-by-default posture where you explicitly allow only necessary traffic. Keep in mind that because the policy includes Egress, it also blocks DNS lookups until you add an explicit DNS allow rule (see practice 4 below).
2. Implement Least Privilege Access
Only allow the minimum necessary access:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only allow from frontend
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Only allow to database
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    # Allow DNS
    - to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
      ports:
        - protocol: UDP
          port: 53
```
3. Use Namespaces for Isolation
Organize workloads into namespaces and use namespace selectors:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
```
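The policy above only admits traffic from pods in the same namespace. To allow traffic from one specific other namespace instead, you can match the `kubernetes.io/metadata.name` label that Kubernetes (v1.22+) automatically sets on every namespace; here is a sketch assuming a `monitoring` namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-monitoring
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # The automatic namespace label avoids manual namespace labeling
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
```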
4. Always Allow Essential Traffic
Don't forget to allow essential cluster traffic:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Allow DNS to the kube-dns pods in kube-system; the namespace
    # and pod selectors are combined so both must match
    - to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
```
5. Label Resources Consistently
Use consistent labeling for effective policy management:
```yaml
# Good labeling strategy
metadata:
  labels:
    app: backend
    tier: api
    environment: production
    team: platform
```
6. Test Policies in Non-Production First
Always test network policies in development/staging before production:
```bash
# Create test namespace
kubectl create namespace policy-test

# Deploy test application
kubectl apply -f test-app.yaml -n policy-test

# Apply policy
kubectl apply -f test-policy.yaml -n policy-test

# Test connectivity
kubectl exec -n policy-test test-pod -- curl http://backend-service
```
7. Monitor and Audit Policies
Implement monitoring for policy violations:
```bash
# For Calico - inspect Felix configuration (e.g., flow log settings)
calicoctl get felixconfiguration default -o yaml

# For Cilium - observe dropped packets
hubble observe --verdict DROPPED

# Check pod inventory logs in Azure Monitor
az monitor log-analytics query \
  --workspace YOUR_WORKSPACE_ID \
  --analytics-query "KubePodInventory | where Namespace == 'production'"
```
8. Document Your Policies
Maintain documentation for your network policies:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: production
  annotations:
    description: "Allows frontend pods to communicate with backend API"
    owner: "platform-team@company.com"
    reviewed: "2024-01-15"
spec:
  # ... policy spec
```
Testing Network Policies
Basic Connectivity Testing
Create test pods to verify policy enforcement:
```yaml
# test-client.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-client
  namespace: production
  labels:
    app: test
spec:
  containers:
    - name: netshoot
      image: nicolaka/netshoot
      command: ["sleep", "3600"]
---
# test-server.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-server
  namespace: production
  labels:
    app: backend
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
```
Test connectivity:
```bash
# Deploy test pods
kubectl apply -f test-client.yaml
kubectl apply -f test-server.yaml

# Get server IP
SERVER_IP=$(kubectl get pod test-server -n production -o jsonpath='{.status.podIP}')

# Test before policy (should work)
kubectl exec -n production test-client -- curl -m 5 http://$SERVER_IP

# Apply restrictive policy
kubectl apply -f restrictive-policy.yaml

# Test after policy (should fail if policy blocks it)
kubectl exec -n production test-client -- curl -m 5 http://$SERVER_IP
```
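To make this check repeatable, you can wrap it in a small script that reports whether the policy actually blocked the request (pod and namespace names follow the example above):

```bash
#!/usr/bin/env bash
# PASS when the connection is blocked by policy, FAIL when it still succeeds
SERVER_IP=$(kubectl get pod test-server -n production -o jsonpath='{.status.podIP}')
if kubectl exec -n production test-client -- curl -s -m 5 "http://$SERVER_IP" > /dev/null; then
  echo "FAIL: traffic was allowed"
else
  echo "PASS: traffic was blocked (or the server is unreachable)"
fi
```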
Policy Validation Tools
Use policy validation tools:
```bash
# For Calico - policy validation
calicoctl get networkpolicy --all-namespaces
calicoctl get globalnetworkpolicy

# For Cilium - policy validation
cilium policy validate
cilium endpoint list

# Check policy effectiveness
kubectl get networkpolicy --all-namespaces
kubectl describe networkpolicy <policy-name> -n <namespace>
```
Troubleshooting Network Policies
Common Issues and Solutions
Issue 1: Policy Not Being Enforced
```bash
# Check if network policy is applied
kubectl get networkpolicy -n <namespace>

# Describe the policy
kubectl describe networkpolicy <policy-name> -n <namespace>

# For Calico - check Felix logs
kubectl logs -n kube-system -l k8s-app=calico-node

# For Cilium - check agent logs
kubectl logs -n kube-system -l k8s-app=cilium
```
Issue 2: Pods Cannot Connect
```bash
# Verify pod labels
kubectl get pods --show-labels -n <namespace>

# Check if pod matches policy selector
kubectl get networkpolicy <policy-name> -n <namespace> -o yaml

# Test DNS resolution
kubectl exec <pod-name> -n <namespace> -- nslookup kubernetes.default

# Check if DNS is allowed in policy
kubectl get networkpolicy -n <namespace> -o yaml | grep -A 10 egress
```
Issue 3: Performance Degradation
```bash
# For Calico - check Felix configuration and performance settings
kubectl get felixconfiguration default -o yaml

# For Cilium - check eBPF program performance
cilium metrics list

# Check node resource usage
kubectl top nodes
kubectl top pods -n kube-system
```
Debugging Tools
Network debugging pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: netdebug
spec:
  containers:
    - name: netshoot
      image: nicolaka/netshoot
      command: ["sleep", "3600"]
```
```bash
# Deploy and use debugging pod
kubectl apply -f netdebug.yaml

# Available tools in netshoot:
kubectl exec netdebug -- ping <ip>
kubectl exec netdebug -- traceroute <ip>
kubectl exec netdebug -- nslookup <hostname>
kubectl exec netdebug -- curl -v <url>
kubectl exec netdebug -- tcpdump -i any port 80
```
Migration Strategies
Migrating from No Policy to Network Policies
Step 1: Audit existing traffic
# Enable flow logs (if using Calico or Cilium)
# For Calico:
kubectl patch felixconfiguration default --type merge -p '{"spec":{"flowLogsEnableHostEndpoint":true}}'
# For Cilium with Hubble:
hubble observe --all
Step 2: Create allow-all policy
```yaml
# Baseline policy that allows everything (no disruption)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-temporary
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}
  egress:
    - {}
```
Step 3: Gradually restrict
```yaml
# Phase 1: Deny all, but allow same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
---
# Phase 2: Add specific policies for each service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  # ... specific rules
```
Migrating Between Policy Engines
From Azure NPM to Calico:
- Create a new cluster with Calico
- Migrate workloads using blue-green deployment
- Validate policies work correctly
- Switch traffic to new cluster
- Decommission old cluster
From Calico to Cilium:
- Back up existing Calico policies
- Install Cilium alongside Calico (if possible for testing)
- Convert policies to Cilium format (see the conversion sketch below)
- Test thoroughly in non-production environment
- Perform controlled migration
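Standard Kubernetes NetworkPolicy objects work unchanged under both engines, so conversion is only needed for Calico-specific resources. As a rough illustration of the conversion step, here is a simple Calico policy next to an approximately equivalent CiliumNetworkPolicy:

```yaml
# Calico (projectcalico.org/v3) policy ...
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-web-ingress
  namespace: production
spec:
  selector: app == 'web'
  types:
    - Ingress
  ingress:
    - action: Allow
      protocol: TCP
      source:
        selector: app == 'frontend'
      destination:
        ports:
          - 8080
---
# ... and a roughly equivalent Cilium policy
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-web-ingress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: web
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```

Features with no direct equivalent, such as Calico's explicit `action: Deny` rules and policy `order`, need to be redesigned rather than mechanically translated.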
Real-World Use Cases
Use Case 1: Multi-Tier Web Application
Architecture:
- Frontend (React app)
- Backend API (Node.js)
- Database (PostgreSQL)
- Redis cache
Network Policy Strategy:
```yaml
# Frontend can access backend API only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: frontend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              tier: backend
      ports:
        - protocol: TCP
          port: 3000
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
---
# Backend can access database and Redis only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:
        - podSelector:
            matchLabels:
              tier: database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              tier: cache
      ports:
        - protocol: TCP
          port: 6379
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
---
# Database accepts connections from backend only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: backend
      ports:
        - protocol: TCP
          port: 5432
```
Use Case 2: Microservices with Service Mesh
For microservices architectures with Cilium:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: order-service-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: order-service
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: api-gateway
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/orders/.*"
              - method: "POST"
                path: "/api/orders"
              - method: "PUT"
                path: "/api/orders/.*"
  egress:
    - toEndpoints:
        - matchLabels:
            app: inventory-service
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/inventory/.*"
    - toFQDNs:
        - matchPattern: "*.database.azure.com"
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
```
Use Case 3: Compliance and Regulatory Requirements
For environments requiring strict compliance:
```yaml
# PCI-DSS compliant policy for payment service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-policy
  namespace: pci-compliant
  annotations:
    compliance: "PCI-DSS v3.2.1"
    description: "Isolates payment processing workloads"
spec:
  podSelector:
    matchLabels:
      compliance: pci-dss
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only allow from authenticated API gateway
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
              auth: enabled
      ports:
        - protocol: TCP
          port: 8443
  egress:
    # Only allow to payment provider APIs
    - to:
        - podSelector:
            matchLabels:
              type: payment-provider
      ports:
        - protocol: TCP
          port: 443
    # Allow logging to compliance log collector
    - to:
        - podSelector:
            matchLabels:
              app: log-collector
              compliance: enabled
      ports:
        - protocol: TCP
          port: 9200
```
Conclusion
Choosing the right network policy solution for your AKS cluster is a critical decision that impacts security, performance, and operational complexity. Let's summarize the key considerations:
Quick Decision Guide
Choose Azure Network Policy Manager if:
- You're new to Kubernetes network policies
- You want simple, Microsoft-supported solution
- You need only basic network policy features
- You prefer minimal operational overhead
- Your team is already familiar with Azure networking
Choose Calico if:
- You need advanced network policy features
- You require DNS/FQDN-based policies
- You need detailed observability and logging
- You want a mature, proven solution
- You have compliance requirements needing extensive auditing
- You may need to migrate to other cloud providers or on-premises
Choose Cilium if:
- Performance is a critical requirement
- You need Layer 7 HTTP/gRPC policy enforcement
- You want modern, API-aware security
- You need advanced observability with Hubble
- You're building a microservices architecture
- You have expertise to manage a more complex system
- You want cutting-edge technology and features
Key Takeaways
Start Simple: Begin with basic policies and gradually increase complexity as you understand your traffic patterns
Test Thoroughly: Always test network policies in non-production environments before applying to production
Monitor Continuously: Implement comprehensive monitoring to detect policy violations and unexpected behavior
Document Everything: Maintain clear documentation of your network policies and their purposes
Plan for Scale: Consider how your policy solution will scale as your cluster grows
Security First: Implement a deny-by-default strategy and explicitly allow only necessary traffic
Regular Audits: Periodically review and update network policies to ensure they remain relevant and effective
Future Trends
The Kubernetes network policy ecosystem continues to evolve:
- eBPF adoption: More solutions leveraging eBPF for better performance
- Service mesh integration: Tighter integration between network policies and service meshes
- Zero trust networking: Enhanced support for zero-trust security models
- Multi-cluster policies: Better support for policies spanning multiple clusters
- AI/ML-based policies: Intelligent policies that adapt based on traffic patterns
Regardless of which solution you choose, implementing network policies is essential for securing your AKS workloads. By following the best practices outlined in this article and choosing the solution that best fits your requirements, you can build a secure, compliant, and performant Kubernetes environment.
Remember: network policies are just one layer of security. Combine them with other security measures like RBAC, Pod Security Standards, image scanning, and secret management for defense-in-depth security.