devtocash

Posted on • Originally published at devtocash.com

Kubernetes Security Best Practices 2026: Complete Hardening Guide

Introduction

A single misconfigured Kubernetes cluster can expose your entire infrastructure in minutes. In 2025 alone, over 60% of organizations reported at least one Kubernetes security incident — and the majority traced back to preventable misconfigurations, not zero-day exploits.

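Kubernetes ships with permissive defaults. Pods can talk to each other freely because no network policy applies until you create one. Pods without an explicit service account run as the namespace's default service account, which has almost no permissions on a fresh cluster but is a frequent target of overly broad bindings. Secrets are stored in etcd base64-encoded, not encrypted, unless you configure encryption at rest. If you deploy a vanilla cluster and walk away, you are effectively running with the doors unlocked.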

This guide walks through 4 critical security layers you must implement in any production Kubernetes cluster. Each section includes real, copy-paste-ready YAML manifests and CLI commands. Whether you run EKS, GKE, AKS, or bare-metal kubeadm clusters, these practices apply universally.

Who this is for: DevOps engineers, SREs, platform engineers, and anyone responsible for production Kubernetes clusters. You should be comfortable with kubectl and basic YAML. No prior security specialization required.

What you will implement by the end of this guide:

  • Fine-grained RBAC with least-privilege principles
  • Pod Security Admission replacing deprecated PSPs
  • Zero-trust network policies with default-deny
  • Image vulnerability scanning with Trivy in CI/CD

Let's lock it down.

1. RBAC: Least Privilege from Day One

Role-Based Access Control (RBAC) is your first and most important line of defense. The principle is simple: every user, service account, and application should have exactly the permissions it needs — nothing more.

The Problem with Default RBAC

Out of the box, Kubernetes binds the built-in cluster-admin ClusterRole to the system:masters group, which kubeadm has traditionally used for its admin credentials. Several controller service accounts in the kube-system namespace are bound to system roles with elevated privileges. The default service account that exists automatically in every namespace has, unless you explicitly bind it, no permissions beyond the API discovery access granted to every authenticated identity. But many teams accidentally grant it broad access during development and forget to revoke it.

Here is the most common anti-pattern we see in production audits:

# BAD: cluster-admin bound to default service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dangerous-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: default

This gives every pod in the default namespace unrestricted control over the entire cluster. If any pod gets compromised, the attacker owns everything.

RBAC Best Practices

1. Use namespace-scoped Roles, not ClusterRoles, whenever possible.

A Role is bound to a single namespace. A ClusterRole is cluster-wide. Most applications only need access to resources in their own namespace.

# GOOD: namespace-scoped Role for a web application
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: webapp-role
  namespace: production
rules:
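- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch"]
# Secrets: allow "get" on named secrets only. "list" or "watch" would expose
# every secret in the namespace. The secret name below is a placeholder.
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["webapp-credentials"]
  verbs: ["get"]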
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update"]

2. Create a dedicated ServiceAccount for every application.

Never use the default service account. Always create a named ServiceAccount and bind it to a specific Role.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: webapp-sa
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: webapp-binding
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: webapp-role
subjects:
- kind: ServiceAccount
  name: webapp-sa
  namespace: production

Then reference it in your Deployment:

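# Pod template spec (spec.template.spec in a Deployment)
spec:
  serviceAccountName: webapp-sa
  # true is the default; set it to false if the app never calls the Kubernetes API
  automountServiceAccountToken: true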

3. Avoid wildcard verbs and resources.

Every * in your RBAC rules is a potential escalation path. Be specific:

# BAD
verbs: ["*"]
resources: ["*"]

# GOOD: explicit
verbs: ["get", "list", "watch"]
resources: ["pods", "services"]
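
Wildcards are easy to miss in review, so it can help to flag them mechanically. The following is a minimal sketch, assuming a Role or ClusterRole has already been parsed into a Python dict; the function name and the sample role are illustrative and not part of any Kubernetes tool.

```python
def wildcard_rules(role: dict) -> list:
    """Return the rules of a Role or ClusterRole that contain "*" in any field."""
    fields = ("apiGroups", "resources", "verbs")
    return [
        rule
        for rule in role.get("rules") or []
        if any("*" in rule.get(field, []) for field in fields)
    ]

# Hypothetical role, written as the dict you would get from parsing its YAML
role = {
    "kind": "Role",
    "metadata": {"name": "example-role", "namespace": "production"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods", "services"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["apps"], "resources": ["*"], "verbs": ["get"]},
        {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["*"]},
    ],
}

for rule in wildcard_rules(role):
    print("wildcard rule:", rule)
```

The check only looks for a bare "*" entry. Partial wildcards such as "pods/*" would need an extra pattern match.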

4. Use kubectl auth can-i to verify permissions.

Before deploying, test what a service account can actually do:

kubectl auth can-i create deployments \
  --as=system:serviceaccount:production:webapp-sa \
  --namespace=production

5. Separate human users from machine accounts.

Use OIDC (OpenID Connect) for human authentication. Service accounts are for pods and CI/CD pipelines. Map OIDC groups to Kubernetes roles — never give individual users cluster-admin. Tools like Dex, Keycloak, or cloud-provider IAM (aws-iam-authenticator for EKS, GCP IAM for GKE) handle this cleanly.

6. Audit your RBAC regularly.

Run this one-liner to find overly permissive bindings:

kubectl get clusterrolebindings -o json | \
  jq '.items[] | select(.roleRef.name=="cluster-admin") | .subjects'

You will be surprised how many cluster-admin bindings accumulate over time. Scanners such as kubescape can automate this check, and broader cluster-hygiene tools like kube-bench and popeye help catch related misconfigurations.
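
The jq one-liner prints the subjects but not the binding each one came from. As a rough sketch, assuming the output of kubectl get clusterrolebindings -o json has been loaded into a dict, the same audit can keep the binding name alongside each subject. The function name and the sample data are made up for illustration.

```python
import json

def cluster_admin_subjects(bindings: dict) -> list:
    """Return (binding, kind, subject) for every subject bound to cluster-admin.

    `bindings` is the parsed JSON from `kubectl get clusterrolebindings -o json`.
    """
    found = []
    for item in bindings.get("items", []):
        if item.get("roleRef", {}).get("name") != "cluster-admin":
            continue
        binding = item["metadata"]["name"]
        # `subjects` can be missing or null on a binding with no subjects
        for subject in item.get("subjects") or []:
            namespace = subject.get("namespace")
            name = f"{namespace}/{subject['name']}" if namespace else subject["name"]
            found.append((binding, subject["kind"], name))
    return found

# Hypothetical sample mirroring the shape of the kubectl JSON output
sample = json.loads("""
{"items": [
  {"metadata": {"name": "cluster-admin"},
   "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
   "subjects": [{"kind": "Group", "name": "system:masters"}]},
  {"metadata": {"name": "dangerous-binding"},
   "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
   "subjects": [{"kind": "ServiceAccount", "name": "default", "namespace": "default"}]},
  {"metadata": {"name": "view-binding"},
   "roleRef": {"kind": "ClusterRole", "name": "view"},
   "subjects": [{"kind": "User", "name": "alice"}]}
]}
""")

for binding, kind, name in cluster_admin_subjects(sample):
    print(f"{binding}: {kind} {name}")
```

Anything in the output other than the built-in system:masters binding deserves a second look.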

RBAC for CI/CD Pipelines

CI/CD systems (GitHub Actions, GitLab CI, ArgoCD) need API access to deploy. The pattern: create a ServiceAccount with the minimum permissions required for deployments, extract its token, and inject it into your pipeline secrets.

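kubectl create serviceaccount cicd-deployer -n production
# webapp-role is reused here for brevity. kubectl apply and kubectl set image
# send PATCH requests, so a real deploy role also needs the "patch" verb on deployments.
kubectl create rolebinding cicd-deployer-binding \
  --role=webapp-role \
  --serviceaccount=production:cicd-deployer \
  -n production

# Kubernetes 1.24+: request a time-bound token. The API server may cap the
# requested lifetime, so prefer short durations and rotate the token regularly.
kubectl create token cicd-deployer -n production --duration=8760h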

Store that token in your CI secrets manager — never in source code.
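
Since the API server may issue a token with a different lifetime than the one requested, it is worth checking what you actually received. Service account tokens are JWTs, so the expiry sits in the exp claim of the payload. The sketch below decodes that claim without verifying the signature, which is fine for inspection but not for authentication. The helper names and the demo token are hypothetical.

```python
import base64
import json
import time

def token_claims(jwt: str) -> dict:
    """Decode the payload of a JWT without verifying its signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding that JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))

def seconds_until_expiry(jwt: str, now=None) -> int:
    """Seconds left before the token's exp claim, relative to `now`."""
    now = time.time() if now is None else now
    return int(token_claims(jwt)["exp"] - now)

def _fake_token(claims: dict) -> str:
    """Build an unsigned stand-in token for the demo; real tokens come from kubectl."""
    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return ".".join([encode({"alg": "none"}), encode(claims), "signature"])

demo = _fake_token({
    "sub": "system:serviceaccount:production:cicd-deployer",
    "exp": 1_700_003_600,
})
print(seconds_until_expiry(demo, now=1_700_000_000))  # 3600
```

In a pipeline, a check like this can fail the job early when the stored token is close to expiring.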

2. Pod Security Standards & Pod Security Admission

Pod Security Policies (PSPs) were deprecated in Kubernetes 1.21 and removed in 1.25. The replacement is Pod Security Admission (PSA) — a built-in admission controller that enforces Pod Security Standards at the namespace level.

The Three Pod Security Standards

| Standard | Description | Key Restrictions |
|---|---|---|
| Privileged | Unrestricted. Equivalent to no policy. | None. Use only for system namespaces. |
| Baseline | Prevents known privilege escalations. Minimum for production. | No hostNetwork, hostPID, hostIPC, hostPorts, privileged containers, or hostPath volumes. |
| Restricted | Hardened following Pod hardening best practices. | Everything Baseline restricts, plus: must run as non-root, allowPrivilegeEscalation must be false, seccomp profile required (RuntimeDefault or Localhost), all capabilities dropped with only NET_BIND_SERVICE allowed back, volume types limited to a safe set. A read-only root filesystem is recommended but not required. |

Enforcing PSA with Namespace Labels

PSA uses namespace labels — no separate CRD needed. Apply a label to any namespace:

# Enforce the restricted policy on a namespace
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted

# Also set audit and warn modes for visibility
kubectl label namespace production \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted

PSA has three modes, each controlled by its own label:

  • enforce: reject pods that violate the policy
  • audit: allow but log violations to the audit log
  • warn: allow but show a warning to the user

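Gradual rollout strategy: Start with warn and audit modes for a week. Fix all warnings. Then switch to enforce. Never jump straight to enforce=restricted on existing production namespaces. Pods that are already running are not evicted, but any non-compliant pod will be rejected the next time it is created, so a routine restart or rollout can take a workload down. Running the label command with --dry-run=server first prints warnings for existing pods that would violate the new level.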

Writing a Restricted-Compliant Pod

Here is a pod that passes the restricted Pod Security Standard:

apiVersion: v1
kind: Pod
metadata:
  name: secure-nginx
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
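  - name: nginx
    # The official nginx image starts as root on port 80 and will not run as
    # UID 1000. An unprivileged build such as nginxinc/nginx-unprivileged
    # (listens on 8080) is the usual substitute; verify the tag before use.
    image: nginxinc/nginx-unprivileged:1.25-alpine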
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: nginx-cache
      mountPath: /var/cache/nginx
  volumes:
  - name: tmp
    emptyDir: {}
  - name: nginx-cache
    emptyDir: {}

Key points:

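  • runAsNonRoot: true means the container must not run as UID 0
  • seccompProfile: RuntimeDefault applies the container runtime's default syscall filter
  • capabilities.drop: ["ALL"] strips all Linux capabilities; only NET_BIND_SERVICE may be added back
  • readOnlyRootFilesystem: true prevents writes to the container's root filesystem (the emptyDir mounts stay writable); the restricted profile does not require it, but it is good hardening
  • allowPrivilegeEscalation: false stops a process from gaining more privileges than its parent, for example through setuid binaries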
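
To make the checks concrete, here is a simplified sketch of how a few of the restricted-profile rules could be evaluated against a pod spec held as a Python dict. It is not the admission controller's implementation: it covers only runAsNonRoot, seccompProfile, allowPrivilegeEscalation, and capabilities for regular containers, and the function name is an assumption made for this example.

```python
def restricted_violations(pod_spec: dict) -> list:
    """Check a pod spec against a subset of the restricted profile.

    The real admission controller checks more (volume types, host namespaces,
    init and ephemeral containers, and so on).
    """
    problems = []
    pod_ctx = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container["name"]
        ctx = container.get("securityContext") or {}

        # A container-level value overrides the pod-level one when both exist
        if ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")

        seccomp = ctx.get("seccompProfile") or pod_ctx.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile.type must be RuntimeDefault or Localhost")

        if ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")

        caps = ctx.get("capabilities") or {}
        if "ALL" not in caps.get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        extra = set(caps.get("add", [])) - {"NET_BIND_SERVICE"}
        if extra:
            problems.append(f"{name}: capabilities.add not allowed: {sorted(extra)}")
    return problems

compliant = {
    "securityContext": {"runAsNonRoot": True, "seccompProfile": {"type": "RuntimeDefault"}},
    "containers": [{
        "name": "nginx",
        "securityContext": {
            "allowPrivilegeEscalation": False,
            "capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE"]},
        },
    }],
}
bare = {"containers": [{"name": "app", "image": "nginx:latest"}]}

print(restricted_violations(compliant))  # []
for problem in restricted_violations(bare):
    print(problem)
```

A pod with no securityContext at all fails four of these checks, which is why warn and audit modes tend to be noisy on existing namespaces.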

Exemptions (When You Need Them)

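Some system workloads genuinely need privileged access, such as CNI plugins, CSI drivers, and monitoring agents. On managed services (EKS, GKE, AKS) the API server flags are typically not configurable, so label those namespaces with a less strict level instead. Where you control the API server, point it at an admission configuration file that defines cluster-wide defaults and exemptions: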

# In kube-apiserver.yaml
apiVersion: v1
kind: Pod
spec:
  containers:
  - command:
    - kube-apiserver
    - --admission-control-config-file=/etc/kubernetes/admission.yaml

And in the admission configuration:

apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "restricted"
    exemptions:
      namespaces: ["kube-system", "cert-manager", "ingress-nginx"]
      usernames: ["system:serviceaccount:kube-system:calico-node"]

3. Network Policies: Zero-Trust Inside the Cluster

Kubernetes networking is flat by default. Every pod can reach every other pod in the cluster, across all namespaces — with zero built-in filtering. If one pod gets compromised, it can scan your entire internal network, hit internal APIs, and pivot to databases.

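Network Policies are your internal firewall. They are Kubernetes-native resources that control traffic flow at Layers 3 and 4 (IP and port level). Think of them as security group rules for pods. They only take effect if your CNI plugin enforces them (Calico and Cilium do, for example); on a CNI without policy support the objects are accepted by the API server and silently ignored.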

Default-Deny: Lock Everything First

Start by denying all ingress and egress traffic. Then selectively open only what each application needs:

# Default deny all ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}          # selects all pods
  policyTypes:
  - Ingress
---
# Default deny all egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress

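With these two policies in place, no pod in the production namespace can receive or initiate traffic, including DNS lookups. Then layer on allow rules for specific flows. Because egress is denied too, each flow needs an egress rule on the source pod as well as an ingress rule on the destination. The ingress side of a frontend-to-backend flow looks like this: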

# Allow frontend → backend on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

Real-World Zero-Trust Policy Patterns

Pattern 1: Allow only from the ingress controller

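ingress:
- from:
  # One list item with both selectors: namespace AND pod must match.
  # Two separate items would be ORed and allow far more traffic.
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: ingress-nginx
    podSelector:
      matchLabels:
        app.kubernetes.io/name: ingress-nginx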
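
How selectors combine inside a from list is a common source of mistakes: selectors within one list item are ANDed, while separate list items are ORed. The sketch below models that evaluation for matchLabels selectors only. It is a simplified illustration with hypothetical function names, not how any CNI implements policy matching, and it ignores ipBlock peers and matchExpressions.

```python
from typing import Optional

def matches(selector: Optional[dict], labels: dict) -> bool:
    """matchLabels-only selector check; an empty selector ({}) matches everything."""
    wanted = (selector or {}).get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())

def peer_allows(peer: dict, pod_labels: dict, ns_labels: dict, same_namespace: bool) -> bool:
    """Evaluate one item of a `from` list; selectors inside the item are ANDed."""
    has_ns = "namespaceSelector" in peer
    has_pod = "podSelector" in peer
    if has_ns and not matches(peer["namespaceSelector"], ns_labels):
        return False
    if not has_ns and not same_namespace:
        return False  # a bare podSelector only covers the policy's own namespace
    if has_pod and not matches(peer["podSelector"], pod_labels):
        return False
    return has_ns or has_pod

def from_allows(peers: list, pod_labels: dict, ns_labels: dict, same_namespace: bool) -> bool:
    """Items of the `from` list are ORed."""
    return any(peer_allows(p, pod_labels, ns_labels, same_namespace) for p in peers)

ns_sel = {"matchLabels": {"kubernetes.io/metadata.name": "ingress-nginx"}}
pod_sel = {"matchLabels": {"app.kubernetes.io/name": "ingress-nginx"}}

two_items = [{"namespaceSelector": ns_sel}, {"podSelector": pod_sel}]  # OR
one_item = [{"namespaceSelector": ns_sel, "podSelector": pod_sel}]     # AND

ingress_ns = {"kubernetes.io/metadata.name": "ingress-nginx"}
debug_pod = {"app": "debug"}  # some other pod running in the ingress-nginx namespace

print(from_allows(two_items, debug_pod, ingress_ns, same_namespace=False))  # True
print(from_allows(one_item, debug_pod, ingress_ns, same_namespace=False))   # False
```

With two separate items, any pod in the ingress-nginx namespace gets through. Only the single combined item restricts access to the controller pods themselves.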

Pattern 2: Allow DNS egress only (block everything else)

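egress:
- to:
  # Single list item: the namespace AND the pod label must both match
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
    podSelector:
      matchLabels:
        k8s-app: kube-dns
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP        # DNS falls back to TCP for large responses
    port: 53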

Pattern 3: Allow egress only to specific IP ranges

egress:
- to:
  - ipBlock:
      cidr: 10.0.0.0/8    # internal VPC only
      except:
      - 10.0.0.0/28        # except management subnet
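
The cidr and except fields are easy to reason about with a small model: an address is allowed when it falls inside cidr and outside every except range. A minimal sketch using Python's standard ipaddress module follows; the function name is illustrative.

```python
import ipaddress

def ip_block_allows(ip: str, cidr: str, except_cidrs: list) -> bool:
    """Return True if `ip` is inside `cidr` and not inside any `except` range."""
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return not any(addr in ipaddress.ip_network(block) for block in except_cidrs)

cidr = "10.0.0.0/8"
excluded = ["10.0.0.0/28"]  # covers 10.0.0.0 through 10.0.0.15

print(ip_block_allows("10.1.2.3", cidr, excluded))   # True: inside the VPC range
print(ip_block_allows("10.0.0.5", cidr, excluded))   # False: inside the excluded subnet
print(ip_block_allows("10.0.0.16", cidr, excluded))  # True: first address after the /28
print(ip_block_allows("8.8.8.8", cidr, excluded))    # False: outside the cidr
```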

Cilium: Beyond Basic Network Policies

If your CNI is Cilium (increasingly common in 2026 for eBPF-based networking), you get Layer 7 policies — filtering by HTTP method, path, or DNS name:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: l7-policy
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/v1/.*"

Layer 7 policies let you declare: only GET requests to /api/v1/ endpoints are allowed — no POST, no DELETE, no /admin. A compromised frontend pod cannot abuse backend endpoints it should not access.
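
The matching logic behind such a rule can be approximated in a few lines. This sketch assumes that method and path are regular expressions that must match the whole value, and it uses Python's re module as a stand-in for the proxy's regex engine, so edge cases may differ from what Cilium actually does. The function name is hypothetical.

```python
import re

def l7_allows(method: str, path: str, rules: list) -> bool:
    """Return True if any rule matches both the HTTP method and the path.

    Assumption: each field is a regex that must match the entire value;
    a missing field matches anything.
    """
    for rule in rules:
        method_ok = re.fullmatch(rule.get("method", ".*"), method) is not None
        path_ok = re.fullmatch(rule.get("path", ".*"), path) is not None
        if method_ok and path_ok:
            return True
    return False

rules = [{"method": "GET", "path": "/api/v1/.*"}]

print(l7_allows("GET", "/api/v1/users", rules))   # True
print(l7_allows("POST", "/api/v1/users", rules))  # False: method not allowed
print(l7_allows("GET", "/admin", rules))          # False: path not allowed
print(l7_allows("GET", "/api/v1", rules))         # False: pattern requires the trailing slash
```

The last case is worth noting when you write path patterns: a request to the bare prefix is not covered unless the pattern allows for it.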

Testing Network Policies

Use netshoot (a network debugging container) to verify connectivity:

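# Run the test pod in the namespace and with the labels your policy selects on
kubectl run netshoot --rm -it --image=nicolaka/netshoot \
  -n production --labels=app=frontend -- /bin/bash

# From inside, test connectivity (should succeed only if a policy allows it):
curl -m 5 backend-service.production.svc.cluster.local:8080

# Test DNS:
nslookup kubernetes.default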

Always test both positive (traffic that should flow) and negative (traffic that should be blocked) cases. Network policies are easy to misconfigure — a single missing label selector can leave a hole wide open.

4. Image Security: Scan Every Container with Trivy

Running containers from untrusted or unverified images is a common entry point for supply-chain attacks. In 2024, a backdoor planted in xz-utils (CVE-2024-3094) was caught shortly before it reached stable Linux distributions and the container base images built from them. Malicious npm and PyPI packages are also found regularly, and any of them can end up inside an image through its dependency tree.

The rule: every image that enters your cluster must be scanned. Trivy, by Aqua Security, is a widely used open-source scanner that is fast, comprehensive, and CI/CD-friendly. It covers OS packages, language dependencies, and IaC misconfigurations with one tool.

Scanning Images

trivy image nginx:1.25-alpine

# Filter by severity — only flag HIGH and CRITICAL
trivy image --severity HIGH,CRITICAL nginx:1.25-alpine

# Output as JSON for pipeline integration
trivy image --format json --output trivy-report.json myapp:latest

# Scan filesystem for IaC misconfigurations
trivy config ./kubernetes/
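
The JSON report can also be post-processed when you need custom gating logic. The sketch below assumes the report layout that Trivy's JSON output uses for image scans, a top-level Results list whose entries may carry a Vulnerabilities list, and it uses an inline sample with placeholder CVE IDs instead of a real scan. The function name is an assumption for this example.

```python
BLOCKING = {"HIGH", "CRITICAL"}

def blocking_findings(report: dict) -> list:
    """Return (target, vulnerability id, severity) for HIGH and CRITICAL findings."""
    findings = []
    for result in report.get("Results") or []:
        # Targets without findings may have no Vulnerabilities key at all
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING:
                findings.append(
                    (result.get("Target", "?"), vuln["VulnerabilityID"], vuln["Severity"])
                )
    return findings

# Hypothetical report; the IDs and package names are placeholders
report = {
    "ArtifactName": "myapp:latest",
    "Results": [
        {
            "Target": "myapp:latest (alpine 3.19)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libexample", "Severity": "LOW"},
            ],
        },
        {"Target": "app/requirements.txt"},
    ],
}

for target, vuln_id, severity in blocking_findings(report):
    print(f"{severity} {vuln_id} in {target}")
```

In a pipeline you would load trivy-report.json with json.load and exit with a non-zero status when the returned list is not empty.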

Integrating Trivy into CI/CD

The most effective pattern: scan in your pipeline and block deployments for HIGH or CRITICAL findings.

GitHub Actions — Trivy scan step:

- name: Scan container image with Trivy
  # @master is a floating ref; pin a release tag or commit SHA in real pipelines
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myapp:${{ github.sha }}
    format: sarif
    output: trivy-results.sarif
    severity: HIGH,CRITICAL
    exit-code: 1

GitLab CI — Trivy scan job:

trivy-scan:
  stage: security
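  image:
    name: aquasec/trivy:latest   # pin a specific version in real pipelines
    entrypoint: [""]             # clear the image's trivy entrypoint so GitLab can run the script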
  script:
    - trivy image --severity HIGH,CRITICAL --exit-code 1 $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  allow_failure: false

Trivy Operator: Continuous Cluster Scanning

To continuously scan the images already deployed in your cluster, install the Trivy Operator:

helm repo add aqua https://aquasecurity.github.io/helm-charts/
helm install trivy-operator aqua/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  --set trivy.ignoreUnfixed=true

Query vulnerability reports across your namespaces:

kubectl get vulnerabilityreports -n production
kubectl describe vulnerabilityreport replicaset-myapp-7d4f8b9c6d-nginx -n production

Image Pinning and Digest-Based References

Never use floating tags like :latest in production. Pin to a content-addressable digest:

# BAD: floating tag, can change underneath you
image: nginx:latest

# GOOD: digest pinning, immutable reference
image: nginx@sha256:<digest>   # placeholder: the full 64-hex-character digest of the image you scanned

Extract the digest of any image:

docker inspect nginx:1.25-alpine | jq -r '.[0].RepoDigests[0]'

Combine digest pinning with automated PRs from Renovate or Dependabot to keep digests updated without manual toil.
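
A simple lint step can keep floating tags from slipping back in. This is a rough sketch, assuming pod specs are available as Python dicts; the function names are illustrative, and the digest in the sample is a dummy value rather than a real image digest.

```python
import re

# A pinned reference ends with @sha256: followed by exactly 64 hex characters
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

def is_digest_pinned(image: str) -> bool:
    """Return True if the image reference is pinned to a sha256 digest."""
    return DIGEST_RE.search(image) is not None

def unpinned_images(pod_spec: dict) -> list:
    """List images in containers and initContainers that are not digest-pinned."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [c["image"] for c in containers if not is_digest_pinned(c["image"])]

dummy_digest = "a" * 64  # stand-in value, not a real digest
spec = {
    "initContainers": [{"name": "init", "image": "busybox:1.36"}],
    "containers": [
        {"name": "web", "image": f"nginx@sha256:{dummy_digest}"},
        {"name": "sidecar", "image": "nginx:latest"},
    ],
}

print(unpinned_images(spec))  # ['nginx:latest', 'busybox:1.36']
```

Running a check like this in CI alongside the Trivy scan makes digest pinning enforceable instead of aspirational.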

Conclusion

Kubernetes security is a layered defense. Start with RBAC — if every pod runs as cluster-admin, nothing else matters. Then enforce Pod Security Standards to prevent containers from escaping their sandbox. Lock down the network with default-deny policies so a compromised pod can't pivot. Finally, scan every image with Trivy to catch vulnerabilities before they reach production.

Each of these four layers independently raises the bar. Together, they turn a wide-open cluster into a hardened environment where an attacker needs to defeat multiple defenses to cause real damage. Treat the YAML in this guide as a starting point: copy it, adapt it to your workloads, and test it in a staging cluster before you apply it to production.

