Enforcing Zero-Trust Egress in Kubernetes with NetworkPolicies

#security #devops #tutorial #kubernetes

Most teams invest heavily in locking down inbound traffic (ingress rules, service meshes, mutual TLS) while leaving outbound traffic largely uncontrolled. That oversight creates a significant attack surface: a compromised container can reach an adversary-controlled server, exfiltrate sensitive data, or retrieve a second-stage payload without triggering a single alert, because nothing restricts or monitors traffic in the outbound direction.

Zero-trust networking applies the principle of least privilege in both directions. The default answer to "can this pod initiate this connection?" is no, for ingress and egress alike. This guide implements that model for egress using native Kubernetes NetworkPolicy objects: deny all outbound traffic by default, then explicitly allow only what each workload legitimately requires. No service mesh and no additional tooling are needed, just declarative YAML you can apply to any cluster whose CNI enforces NetworkPolicy.

Prerequisite: CNI Enforcement

Before applying any NetworkPolicy manifest, verify that your CNI plugin actually enforces policy. This is the single most common source of confusion when getting started.

NetworkPolicy is a Kubernetes API abstraction, not an implementation. The API server will accept any well-formed policy object, but the policy has no effect unless the underlying CNI plugin enforces it. Many stock configurations, typically including a minikube cluster started without a --cni flag, do not enforce NetworkPolicy at all.

Use a policy-enforcing CNI — Calico and Cilium are the most widely deployed options. For a disposable test cluster:

minikube start --cni=calico

Confirm the CNI is operational before proceeding:

kubectl get pods -n kube-system | grep calico

If you apply the policies in this guide and observe no change in connectivity, a non-enforcing CNI is almost always the root cause.

Step 1: Create a Namespace and Test Workload

kubectl create namespace app
kubectl -n app run web --image=nginx --labels="app=web"

Use netshoot as an ephemeral debug pod to validate connectivity from within the namespace:

kubectl -n app run netshoot --rm -it --image=nicolaka/netshoot -- /bin/bash

From inside that shell, confirm the cluster is currently operating with no egress restrictions:

curl -m 5 https://example.com   # succeeds
nslookup kubernetes.default     # succeeds

At this point, any pod can reach any destination. The following steps will close that off systematically.

Step 2: Default-Deny All Egress

Apply a NetworkPolicy that selects all pods in the namespace (via an empty podSelector) and specifies Egress in policyTypes with no allow rules. This results in a deny-all for outbound traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Egress

Save the manifest as default-deny-egress.yaml, apply it, and start a fresh test pod:

kubectl apply -f default-deny-egress.yaml
kubectl -n app run netshoot --rm -it --image=nicolaka/netshoot -- /bin/bash

curl -m 5 https://example.com   # times out
nslookup kubernetes.default     # fails

Note that DNS resolution has also broken. This is expected and is addressed in the next step.

Step 3: Restore DNS Resolution

The moment you enforce a default-deny egress policy, pods lose the ability to reach kube-dns, which causes all hostname resolution to fail — including for destinations you intend to allow. You must explicitly permit egress to the cluster DNS service.

On most clusters the DNS pods (CoreDNS in current releases) run in kube-system and still carry the label k8s-app: kube-dns. The following policy opens egress from all pods in the namespace to that target on UDP and TCP port 53:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
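        # Assumes cluster DNS runs in kube-system; this label is set automatically on every namespace (Kubernetes 1.22+).
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system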
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

NetworkPolicy rules are additive — this policy adds a permitted path on top of the existing default-deny. After applying it, DNS resolution is restored, but arbitrary outbound connections remain blocked. That is the intended state: name resolution functions, but no traffic flows unless explicitly allowed.
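
To make that check repeatable, the manual netshoot session can be captured as a one-shot pod. The following is a minimal sketch, not part of the original walkthrough: the pod name egress-check is hypothetical, and it reuses the nicolaka/netshoot image and the two probes from Step 1.

```yaml
# One-shot connectivity probe; runs in the "app" namespace, so the
# default-deny and allow-dns policies apply to it like any other pod.
apiVersion: v1
kind: Pod
metadata:
  name: egress-check          # hypothetical name
  namespace: app
spec:
  restartPolicy: Never        # run once and stop
  containers:
    - name: check
      image: nicolaka/netshoot
      command: ["/bin/sh", "-c"]
      args:
        - |
          # DNS probe: should work once allow-dns is applied
          nslookup kubernetes.default > /dev/null 2>&1 \
            && echo "dns: ok" || echo "dns: blocked"
          # HTTPS probe: should stay blocked until an allow rule exists
          curl -sS -m 5 -o /dev/null https://example.com \
            && echo "https: ok" || echo "https: blocked"
```

After the pod completes, kubectl -n app logs egress-check shows the result of each probe. With the two policies above in place, the DNS probe should succeed and the HTTPS probe should time out.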

Step 4: Grant Per-Workload Egress Permissions

With the baseline in place, you can now issue narrow, workload-specific allow rules. Suppose a checkout service requires outbound connectivity to an external payments API over HTTPS, and nothing else. Scope the rule to that workload's label selector and the relevant destination CIDR:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-payments
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: checkout
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24
      ports:
        - protocol: TCP
          port: 443

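With this policy in place, pods labeled app: checkout can initiate outbound connections to that CIDR on TCP port 443, in addition to the DNS lookups permitted for every pod in the namespace. All other pods and all other destinations remain denied. The 203.0.113.0/24 range is a reserved documentation block used here as a placeholder, so substitute the ranges your provider actually publishes. You have moved from an implicit open-by-default posture to an explicit allow-list, the foundational principle of zero-trust egress.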
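
The same approach covers in-cluster dependencies, where label selectors replace the CIDR. The sketch below assumes the checkout workload also needs a PostgreSQL database running in another namespace; the policy name, the data namespace, and the app: postgres label are all hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-checkout-db     # hypothetical name
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: checkout
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in the SAME list item are ANDed:
        # only postgres pods inside the "data" namespace match.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: data   # hypothetical namespace
          podSelector:
            matchLabels:
              app: postgres                       # hypothetical label
      ports:
        - protocol: TCP
          port: 5432
```

Writing the two selectors as separate list items would OR them instead, allowing every pod in the data namespace plus any app: postgres pod in the app namespace, so the indentation matters. If the destination namespace enforces default-deny ingress, it needs a matching ingress rule as well.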

Production Considerations

Several operational realities become apparent once this pattern moves beyond a lab environment:

Hostname-based matching is not supported in vanilla Kubernetes. NetworkPolicy matches destinations by label selector or by IP/CIDR, never by FQDN. If a dependency resolves to a rotating IP pool, as most SaaS APIs do, an ipBlock rule becomes fragile and operationally expensive. Cilium's FQDN-based policy (the CiliumNetworkPolicy CRD, not a core NetworkPolicy) addresses this directly: add a toFQDNs rule with matchName: api.stripe.com and Cilium tracks the resolved IPs automatically, provided the pod's DNS lookups pass through Cilium's DNS proxy.
Pods not selected by any policy are unrestricted. Default-deny applies only to pods that a policy's podSelector actually matches. Regularly audit for workloads that have no applicable policy and would therefore bypass all egress controls.
Policies are namespace-scoped. A default-deny-egress policy in the app namespace has no effect on pods in payments or any other namespace. Apply the baseline deny policy to every namespace — ideally via a templated manifest managed in your GitOps repository, so no new namespace can be provisioned without it.
Denied traffic is not logged by default. The NetworkPolicy API defines no logging, so blocked connections are dropped silently without any log or event. Debugging failed connectivity then relies on inference from timeouts, which is slow and error-prone in production. Cilium (through Hubble) and Calico both offer flow-level visibility, though what is available depends on version and edition. Enable it before rolling this pattern out to any environment where you need operational observability.
Apply allow rules before deny rules. In production environments, apply the DNS and workload-specific allow rules first and validate that legitimate traffic continues to flow, then apply the default-deny policy last. Keep in mind that an egress allow policy already restricts the pods it selects to the destinations it lists, so validate each one as it lands. Reversing the order will cause an immediate outage while you reconstruct your dependency graph under pressure.
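
For the hostname-matching limitation noted at the top of this list, an FQDN rule might look like the following. This is a sketch that assumes Cilium is the cluster's CNI and that DNS runs as kube-dns in kube-system; the policy name is hypothetical and api.stripe.com is the example hostname from above.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-payments-fqdn   # hypothetical name
  namespace: app
spec:
  endpointSelector:
    matchLabels:
      app: checkout
  egress:
    # Rule 1: allow DNS and route it through Cilium's DNS proxy so the
    # agent can learn which IPs each hostname resolves to.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # Rule 2: allow HTTPS only to the IPs learned for this hostname.
    - toFQDNs:
        - matchName: api.stripe.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

The DNS rule is what makes the toFQDNs rule work: without it Cilium never observes the lookups and has no IPs to allow. Narrowing matchPattern from "*" to the hostnames the workload actually needs also limits which names it can resolve.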

Summary

Egress control is the half of zero-trust networking that is easiest to defer and most costly to neglect. With three focused manifests — a namespace-wide default-deny, a DNS allow rule, and per-workload egress permissions — you transform outbound traffic from an unmonitored open channel into an auditable, explicit allow-list using nothing beyond standard Kubernetes primitives and a CNI that enforces them.

The recommended rollout path: start in a non-production namespace, enable flow logging from day one, validate all required paths, then promote the pattern namespace by namespace with your GitOps tooling driving consistency.

The author is a Platform and DevSecOps engineer (CKA, CISSP) who publishes production-grounded guides on Kubernetes security, CI/CD pipelines, and cloud compliance.