DEV Community

cloud-sky-ops

Post 1/10 — Multi-Tenancy & Security Baseline with Namespaces, Quotas, NetworkPolicies, and Pod Security Admission

Author: A senior DevOps engineer who’s broken (and fixed) too many clusters so you don’t have to.


Executive Summary

  • Carve tenants with Namespaces to isolate RBAC, policies, and quotas per team.
  • Enforce fair sharing with ResourceQuota + LimitRange so noisy neighbors can’t starve the cluster.
  • Lock down traffic with NetworkPolicy: start with default-deny, then allow the minimum (DNS, app→DB).
  • Harden workloads with Pod Security Admission (PSA) to block privileged/unsafe pod specs at the namespace boundary.
  • Ship a repeatable baseline: two team namespaces, quotas/limits, default-deny + specific allows, PSA restricted.

Prereqs

  • A Kubernetes cluster (≥ v1.28 recommended) and kubectl configured.
  • Cluster has CNI that supports NetworkPolicy (Calico, Cilium, Antrea, etc.).
  • Helm optional (not required here).
  • You have cluster-admin for initial setup.
kubectl version   # --short was removed in recent kubectl releases
kubectl get nodes -o wide
kubectl get pods -n kube-system

Concepts

1) Namespaces for isolation

Definition: Namespaces slice a cluster into logical tenants with independent RBAC, quotas, and policies. They prevent accidental cross-team impact, scope access, and make policy application simple (label a namespace once and everything keyed to that label follows).
Best practices:

  • One namespace per team/app tier rather than one per environment; track environment with labels (env=prod).
  • Name predictably: team-a, team-b, platform, etc.
  • Attach PSA labels, default network policies, and quotas at namespace creation.

Commands:

kubectl create ns team-a
kubectl create ns team-b
kubectl label ns team-a env=dev --overwrite
kubectl get ns --show-labels

Before → After:

  • Before: All pods in default, broad access, hard to apply policies.
  • After: team-a/, team-b/ with their own quotas, PSA, and network baselines.

When to use: Always—namespaces are table stakes for multi-tenancy.
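If you manage namespaces declaratively (e.g., via GitOps), the same baseline can be attached at creation time. A sketch of such a manifest; the name and labels are illustrative:

```yaml
# ns-team-a.yaml — namespace with env + PSA labels applied at creation
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    env: dev
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Apply with kubectl apply -f ns-team-a.yaml; this avoids the create-then-label window in which unlabeled pods could slip in.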


2) ResourceQuota & LimitRange

Definition: ResourceQuota caps aggregate usage per namespace; LimitRange sets per-pod/container defaults and maxima. Together they stop runaway resource grabs and ensure every pod has sensible requests/limits for scheduling and stability.

Best practices:

  • Pair them: RQ for team-level ceilings; LR for sane per-workload defaults.
  • Set CPU/memory requests+limits and object counts (pods, PVCs, services).
  • Include ephemeral-storage where supported; keep room for rollouts (e.g., 20–30% headroom).
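As a quick sanity check on the headroom rule, you can derive quota ceilings from your steady-state requests. A sketch with illustrative numbers (25% headroom picked from the 20–30% range above):

```shell
#!/bin/sh
# Sketch: size ResourceQuota ceilings from steady-state requests
# with ~25% rollout headroom. Input numbers are illustrative.
steady_cpu_m=6000     # 6 cores requested at steady state (millicores)
steady_mem_mi=12288   # 12Gi requested at steady state (Mi)

quota_cpu_m=$(( steady_cpu_m * 125 / 100 ))
quota_mem_mi=$(( steady_mem_mi * 125 / 100 ))

echo "requests.cpu: ${quota_cpu_m}m"      # 7500m
echo "requests.memory: ${quota_mem_mi}Mi" # 15360Mi
```

Rounding up to clean numbers (8 CPU, 16Gi) gives the quota used later in the lab.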

Commands:

kubectl apply -n team-a -f quota-team-a.yaml
kubectl get resourcequota -n team-a
kubectl describe limitrange -n team-a

Before → After:

  • Before: Pods without limits; one job consumes all CPU → others starve.
  • After: Each container gets defaults; namespace can’t exceed its slice.

When to use: Whenever multiple teams share nodes or costs matter (i.e., always).


3) NetworkPolicy

Definition: NetworkPolicy declares which pods may talk to which, for ingress and egress. It prevents lateral movement, accidental traffic between unrelated services, and data exfiltration.

Best practices:

  • Start with default-deny for both ingress and egress.
  • Then allow only what you need (DNS 53, app→DB 5432, etc.).
  • Use labels consistently; avoid relying on IPs. Add namespaceSelector + podSelector for system services like CoreDNS.

Commands:

kubectl apply -n team-a -f np-default-deny.yaml
kubectl apply -n team-a -f np-allow-dns.yaml
kubectl apply -n team-a -f np-allow-app-to-db.yaml

Before → After:

  • Before: Any pod can connect to any pod/Internet.
  • After: Only DNS + app→db allowed; everything else dropped.

When to use: In any regulated or multi-tenant cluster; also in prod by default.


4) Pod Security Admission (PSA)

Definition: PSA enforces Kubernetes security profiles (privileged, baseline, restricted) via namespace labels. It blocks dangerous specs (privileged mode, hostPID/hostIPC, hostPath mounts, capability escalation) before pods land on nodes.

Best practices:

  • Default restricted enforce on app namespaces; use baseline temporarily while migrating.
  • Apply enforce, warn, and audit labels during rollout to see breaks before enforcing.
  • Keep images running as non-root; avoid broad capabilities and host volumes.

Commands:

kubectl label ns team-a \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/warn-version=latest \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/audit-version=latest --overwrite

Before → After:

  • Before: Developers could deploy privileged pods or mount /var/run/docker.sock.
  • After: Such pods are rejected at admission with a clear error.

When to use: Immediately after namespace creation; dev/prod alike (stricter in prod).
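For reference, a container spec that passes the restricted profile typically needs the fields below. This is a sketch; the uid and image are illustrative, so use whatever non-root user your image actually supports:

```yaml
# Fragment: the fields the restricted profile checks, per pod/container
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001            # any non-root uid your image supports (illustrative)
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: example/app:1.0      # illustrative image
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```

Pods missing any of these (non-root, seccomp, no privilege escalation, all capabilities dropped) are rejected at admission in a restricted namespace.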


Diagram 1 — Namespace/Tenant Layout



Mini-Lab (≈25 min): Two tenants, quotas/limits, default-deny, app→db allow, PSA restricted

You can paste these as-is; tweak CPU/memory as needed.

Create namespaces + PSA labels

kubectl create ns team-a
kubectl create ns team-b

for ns in team-a team-b; do
  kubectl label ns $ns \
    pod-security.kubernetes.io/enforce=restricted \
    pod-security.kubernetes.io/enforce-version=latest \
    pod-security.kubernetes.io/warn=restricted \
    pod-security.kubernetes.io/warn-version=latest \
    pod-security.kubernetes.io/audit=restricted \
    pod-security.kubernetes.io/audit-version=latest --overwrite
done

Apply ResourceQuota + LimitRange

quota-team-a.yaml

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
spec:
  hard:
    pods: "30"
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    persistentvolumeclaims: "10"
    services: "10"
    services.loadbalancers: "2"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: "200m"
      memory: "256Mi"
    default:
      cpu: "500m"
      memory: "512Mi"
    max:
      cpu: "2"
      memory: "2Gi"

Apply to both namespaces:

kubectl apply -n team-a -f quota-team-a.yaml
kubectl apply -n team-b -f quota-team-a.yaml
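Note that reusing quota-team-a.yaml in team-b keeps the team-a-* object names. That's harmless (names are namespace-scoped), but if you want matching names, a quick rename works; this sketch assumes the lab's quota-team-a.yaml is in the current directory:

```shell
# Derive a team-b quota manifest with matching object names.
sed 's/team-a/team-b/g' quota-team-a.yaml > quota-team-b.yaml
```

Then apply it with kubectl apply -n team-b -f quota-team-b.yaml.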

Deploy a tiny app and a “db” in team-a

app-db.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: db
  labels: { app: db }
spec:
  replicas: 1
  selector: { matchLabels: { app: db } }
  template:
    metadata: { labels: { app: db } }
    spec:
      # Required by PSA restricted: non-root + default seccomp profile
      securityContext:
        runAsNonRoot: true
        runAsUser: 70          # postgres uid in the alpine image
        seccompProfile: { type: RuntimeDefault }
      containers:
      - name: db
        image: postgres:16-alpine
        securityContext:
          allowPrivilegeEscalation: false
          capabilities: { drop: ["ALL"] }
        env:
        - { name: POSTGRES_PASSWORD, value: dev }
        ports:
        - { containerPort: 5432, name: pg }
---
apiVersion: v1
kind: Service
metadata:
  name: db
  labels: { app: db }
spec:
  selector: { app: db }
  ports:
  - { port: 5432, targetPort: pg, protocol: TCP }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels: { app: web }
spec:
  replicas: 1
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 100         # curl_user uid in curlimages/curl
        seccompProfile: { type: RuntimeDefault }
      containers:
      - name: web
        image: curlimages/curl:8.10.1
        command: ["sleep","infinity"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities: { drop: ["ALL"] }
kubectl apply -n team-a -f app-db.yaml
kubectl wait -n team-a --for=condition=available deploy/web deploy/db --timeout=90s

Smoke test (pre-policy, should connect):

POD=$(kubectl -n team-a get pod -l app=web -o jsonpath='{.items[0].metadata.name}')
kubectl -n team-a exec -it "$POD" -- sh -lc 'curl -m2 db:5432 || true'
# Expect: a fast protocol-level error from curl (Postgres isn't HTTP) — the TCP connection itself succeeds; nothing is blocked

Enforce default-deny (ingress+egress) in team-a

np-default-deny.yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}        # select all pods
  policyTypes:
  - Ingress
  - Egress
kubectl apply -n team-a -f np-default-deny.yaml

Test (should now fail):

kubectl -n team-a exec -it "$POD" -- sh -lc 'curl -m2 db:5432 || echo BLOCKED'
# Expect: BLOCKED / timeout

Allow only DNS + app→db

np-allow-dns.yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: { }      # all pods may resolve DNS
  policyTypes: [ Egress ]
  egress:
  - to:
    - namespaceSelector:
        matchLabels: { kubernetes.io/metadata.name: kube-system }
      podSelector:
        matchLabels: { k8s-app: kube-dns }
    ports:
    - { protocol: UDP, port: 53 }
    - { protocol: TCP, port: 53 }

np-db-ingress-from-web.yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-from-web
spec:
  podSelector:
    matchLabels: { app: db }
  policyTypes: [ Ingress ]
  ingress:
  - from:
    - podSelector:
        matchLabels: { app: web }
    ports:
    - { protocol: TCP, port: 5432 }

np-web-egress-to-db.yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-to-db
spec:
  podSelector:
    matchLabels: { app: web }
  policyTypes: [ Egress ]
  egress:
  - to:
    - podSelector:
        matchLabels: { app: db }
    ports:
    - { protocol: TCP, port: 5432 }
kubectl apply -n team-a -f np-allow-dns.yaml -f np-db-ingress-from-web.yaml -f np-web-egress-to-db.yaml
kubectl -n team-a exec -it "$POD" -- sh -lc 'curl -m2 db:5432 || true'
# Expect: connects again (curl's protocol error is fine); only web→db plus DNS egress are allowed

Repeat the same baseline (quotas/limits, default-deny, PSA labels) for team-b.


Diagram 2 — NetworkPolicy Traffic Matrix



YAML & Bash Reference

PSA labels (namespace):

kubectl label ns team-a pod-security.kubernetes.io/enforce=restricted --overwrite
kubectl label ns team-a pod-security.kubernetes.io/warn=restricted pod-security.kubernetes.io/audit=restricted --overwrite
kubectl label ns team-a pod-security.kubernetes.io/{enforce,warn,audit}-version=latest --overwrite  # bash brace expansion; no spaces inside the braces

ResourceQuota & LimitRange:

apiVersion: v1
kind: ResourceQuota
metadata: { name: <ns>-quota }
spec:
  hard:
    pods: "30"
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
---
apiVersion: v1
kind: LimitRange
metadata: { name: <ns>-defaults }
spec:
  limits:
  - type: Container
    defaultRequest: { cpu: "200m", memory: "256Mi" }
    default:        { cpu: "500m", memory: "512Mi" }
    max:            { cpu: "2",    memory: "2Gi" }

Default-deny + allows (template): see np-default-deny.yaml, np-allow-dns.yaml, np-db-ingress-from-web.yaml, np-web-egress-to-db.yaml above.

Namespace-scoped context (handy):

kubectl config set-context --current --namespace=team-a
# or create a named context once (look up the current cluster and user; they are not the context name):
CTX=$(kubectl config current-context)
CLUSTER=$(kubectl config view -o jsonpath="{.contexts[?(@.name==\"$CTX\")].context.cluster}")
USER_NAME=$(kubectl config view -o jsonpath="{.contexts[?(@.name==\"$CTX\")].context.user}")
kubectl config set-context team-a --cluster="$CLUSTER" --user="$USER_NAME" --namespace=team-a
kubectl config use-context team-a

Cheatsheet Table

  • Create namespace: kubectl create ns <name>. Add labels (env=prod, PSA) immediately.
  • Label PSA restricted: kubectl label ns <name> pod-security.kubernetes.io/enforce=restricted --overwrite. Add warn/audit to preview breaks.
  • Apply quotas/limits: kubectl apply -n <ns> -f quota-<ns>.yaml. Pair ResourceQuota with LimitRange.
  • Check quota usage: kubectl describe resourcequota -n <ns>. Watch Used vs Hard.
  • Default-deny policy: kubectl apply -n <ns> -f np-default-deny.yaml. Denies both Ingress and Egress.
  • Allow DNS: kubectl apply -n <ns> -f np-allow-dns.yaml. Needed for service discovery.
  • Allow app→db: kubectl apply -n <ns> -f np-allow-app-to-db.yaml. Pair ingress on db with egress on app.
  • Switch namespace: kubectl config set-context --current --namespace=<ns>. Keeps commands short and safe.
  • Dry-run a spec: kubectl apply -f x.yaml --dry-run=server. Admission & schema checks without deploying.

Pitfalls & Recovery

  • “Policies don’t seem to apply” (order confusion).
    Symptom: You created allows but traffic still blocked.
    Why: Policies are additive; any default-deny remains in effect unless an allow matches both sides (ingress on target + egress on source if you deny both).
    Fix: Ensure you have egress allow on source and ingress allow on destination.

  • DNS broke after default-deny egress.
    Symptom: curl db hangs; apps can’t resolve service names.
    Fix: Add np-allow-dns.yaml allowing egress to kube-system/k8s-app=kube-dns on TCP/UDP 53.

  • Pods rejected after enabling PSA restricted.
    Symptom: Error from server (Forbidden) with fields like runAsNonRoot.
    Fix: Adjust workload security context (non-root, drop caps, no hostPath). Temporarily set enforce=baseline and warn=restricted to migrate, then flip back.

  • Quota exceeded during rollout.
    Symptom: New ReplicaSet can’t scale (exceeded quota).
    Fix: Increase pods/requests.cpu/memory quota, or lower replicas. Keep rollout headroom (20–30%).

  • No limits → eviction during pressure.
    Symptom: Pods OOMKilled or preempted under load.
    Fix: Set defaults via LimitRange; ensure requests approximate real usage (watch kubectl top pod).

  • East-west traffic across namespaces unexpectedly blocked.
    Symptom: Cross-ns calls time out.
    Fix: Add namespaceSelector + podSelector in allow rules, or route via ingresses with clear policies.
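For the last pitfall, a cross-namespace allow might look like the sketch below (labels are illustrative). Note that namespaceSelector and podSelector sit in the same `from` entry, so both must match the same peer:

```yaml
# Apply in team-a: allow ingress to db only from team-b pods labeled app=web
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-from-team-b-web
spec:
  podSelector:
    matchLabels: { app: db }
  policyTypes: [ Ingress ]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels: { kubernetes.io/metadata.name: team-b }
      podSelector:
        matchLabels: { app: web }
    ports:
    - { protocol: TCP, port: 5432 }
```

If team-b also denies egress by default, its web pods additionally need a matching egress allow toward team-a.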


Wrap-Up (What this unlocks for reliability—Post 2 teaser)

With Namespaces, Quotas/LimitRanges, NetworkPolicies, and PSA in place, you’ve built a security and fairness floor: teams can’t trample each other, pods can’t talk without permission, and unsafe specs are stopped at the door.

In Post 2, we’ll layer observability SLOs, PodDisruptionBudgets, health/readiness gates, and autoscaling on top of this baseline to keep releases smooth and reliability measurable.


Appendix: “Before → After” Quick Contrasts

  • Networking:
    Before: web can reach everything → lateral risk.
    After: Only web → db + DNS; all else denied.

  • Resources:
    Before: No limits; one job spikes → node thrash.
    After: Default requests/limits; quotas cap team usage.

  • Security:
    Before: Privileged pods slip through.
    After: PSA restricted blocks them with actionable errors.


Got questions or want a tailored baseline for your stack? Drop them in, and I’ll fold them into a separate post.
