Jyothi Kumar

Kubernetes in Production: Deployments, Scaling, and Troubleshooting the Right Way

So you've got Kubernetes running locally. Maybe you've even deployed a few services to a staging cluster. But production is a different beast — and most tutorials stop right before things get real.

This article covers what actually matters when running Kubernetes in production: reliable deployments, smart scaling, and debugging when things go wrong (because they will).


1. Deployments: Ship Safely Every Time

Use Rolling Updates with Sensible Defaults

Kubernetes uses a rolling update strategy by default, but the default parameters (25% max surge, 25% max unavailable) aren't always production-safe. Always set these explicitly:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

maxUnavailable: 0 ensures no pod is terminated before a healthy replacement is running. This is the single most impactful change you can make to reduce deployment-related downtime.
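
It also pays to watch a rollout while it happens and to know the rollback path before you need it. A minimal sketch using standard kubectl rollout commands (my-app is an illustrative Deployment name):

# Watch the rollout until it completes or fails
kubectl rollout status deployment/my-app -n <namespace>

# Roll back to the previous revision if the new version misbehaves
kubectl rollout undo deployment/my-app -n <namespace>

# Inspect the revision history
kubectl rollout history deployment/my-app -n <namespace>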

Set Readiness and Liveness Probes

Without probes, Kubernetes assumes a pod is ready the moment it starts. That's almost never true.

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  • Readiness probe: controls when traffic is sent to the pod
  • Liveness probe: restarts the pod if it's stuck or deadlocked

If you only implement one thing from this article, make it readiness probes.
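
One caveat: for slow-starting apps, a liveness probe can kill the pod before it ever finishes booting. Kubernetes also supports a startup probe for exactly this case; a minimal sketch (the thresholds are illustrative, size them to your app's worst-case boot time):

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  # Allow up to 30 * 5 = 150 seconds for startup
  failureThreshold: 30
  periodSeconds: 5

Until the startup probe succeeds, liveness and readiness checks are held off, so a slow boot doesn't turn into a restart loop.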

Always Set Resource Requests and Limits

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Without requests, the scheduler can't make good placement decisions. Without limits, a single misbehaving pod can starve its neighbors. Both will cause you pain in production.
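
Requests and limits also determine the pod's QoS class, which decides eviction order under node pressure: BestEffort pods (nothing set) are evicted first, Guaranteed pods (requests equal to limits) last. You can check the class Kubernetes assigned (my-app-xyz is an illustrative pod name):

kubectl get pod my-app-xyz -n <namespace> -o jsonpath='{.status.qosClass}'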


2. Scaling: Handle Traffic Without Drama

Horizontal Pod Autoscaler (HPA)

HPA scales your pods based on CPU, memory, or custom metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

A few rules of thumb:

  • Never set minReplicas: 1 for production workloads — you lose high availability
  • Target 60–70% CPU utilization, not 80%+. You want headroom before the next scale event kicks in
  • Give HPA time to stabilize — avoid tuning it based on a single traffic spike (see the behavior sketch after this list)
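
The autoscaling/v2 API lets you encode that stabilization directly in the HPA via the behavior field. A minimal sketch, added under the spec of the HPA above (the window and rate are illustrative starting points, not universal recommendations):

  behavior:
    scaleDown:
      # Wait for 5 minutes of sustained low load before removing pods
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60

This waits five minutes before acting on a lower metric value and then removes at most half the current replicas per minute.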

Cluster Autoscaler

HPA scales pods; Cluster Autoscaler scales nodes. Use both together.

When HPA adds pods and there's no room on existing nodes, Cluster Autoscaler provisions new nodes automatically. When load drops, it removes underutilized nodes to cut costs.

Key config tip: tune --scale-down-utilization-threshold (the default is 0.5). Lowering it makes the autoscaler less eager to remove nodes, which helps avoid aggressive scale-downs that can disrupt workloads.
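
If certain pods should never be displaced by a scale-down (for example, pods holding local state), the standard Cluster Autoscaler honors an opt-out annotation; a minimal sketch on the pod template:

metadata:
  annotations:
    # Cluster Autoscaler will not remove a node running this pod
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

Use it sparingly, since every annotated pod pins its node and works against the cost savings.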

Pod Disruption Budgets (PDBs)

PDBs protect your app during node maintenance or autoscaling events:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

This tells Kubernetes: "Never voluntarily evict a pod if that would leave fewer than 2 running." Without a PDB, rolling node upgrades can silently take down your entire service.
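
You can confirm the budget is live and see how much disruption headroom exists right now; assuming the PDB above is applied:

# ALLOWED DISRUPTIONS shows how many pods may be voluntarily evicted
kubectl get pdb my-app-pdb -n <namespace>

If ALLOWED DISRUPTIONS reads 0, drains and upgrades will block until more replicas become ready.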


3. Troubleshooting: Debug Like a Pro

Here's a systematic approach when something breaks in production.

Step 1 — Check Pod Status

kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

Look at the Events section at the bottom of describe output first. It tells you exactly what Kubernetes tried to do and where it failed.

Common states and what they mean:

Status           | Likely Cause
-----------------|-------------
CrashLoopBackOff | App is crashing on startup — check logs
Pending          | No node can schedule the pod — check resource requests or taints
OOMKilled        | Memory limit too low — increase limits or fix a memory leak
ImagePullBackOff | Wrong image name/tag or missing registry credentials
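
OOMKilled is recorded in the container's last terminated state, so even if the pod has already restarted you can still recover it (the jsonpath assumes a single-container pod):

# Prints "OOMKilled" if the last container exit was an OOM kill
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'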

Step 2 — Read the Logs

# Current logs
kubectl logs <pod-name> -n <namespace>

# Previous container instance (if crashing)
kubectl logs <pod-name> -n <namespace> --previous

# Follow live logs
kubectl logs -f <pod-name> -n <namespace>

The --previous flag is critical for CrashLoopBackOff — it shows you logs from the crashed container, not the restarted one.
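
A few more standard kubectl logs flags that help in practice:

# Logs from one container in a multi-container pod
kubectl logs <pod-name> -c <container-name> -n <namespace>

# Limit the output: last 100 lines from the past hour
kubectl logs <pod-name> -n <namespace> --tail=100 --since=1h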

Step 3 — Exec Into the Pod

When logs aren't enough:

kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

From inside the pod you can test DNS resolution, check environment variables, curl internal services, and verify file mounts — all in the actual runtime environment.
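
A few checks worth running from inside, assuming the image ships a shell and basic tools (minimal and distroless images often don't):

# Verify the environment variables the app actually sees
env | sort

# Test cluster DNS (my-service is an illustrative Service name)
nslookup my-service

# Hit an internal service from the pod's own network namespace
wget -qO- http://my-service

If the image has no shell at all, kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name> attaches an ephemeral debug container with its own tooling alongside the app.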

Step 4 — Check Events Cluster-Wide

kubectl get events -n <namespace> --sort-by='.lastTimestamp'

This is often overlooked but invaluable. Node pressure, failed mounts, scheduler failures — all show up here.
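
Filtering to warnings across every namespace cuts through the noise; these are standard kubectl flags:

kubectl get events -A --field-selector type=Warning --sort-by='.lastTimestamp'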

Step 5 — Inspect Resource Pressure

kubectl top nodes
kubectl top pods -n <namespace>

If nodes are under memory or CPU pressure, they'll start evicting pods. This can look like random pod restarts when the real problem is a noisy neighbor.
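
Note that kubectl top depends on metrics-server being installed in the cluster. To dig into a specific hot node, describe it:

# Look for MemoryPressure/DiskPressure under Conditions and the
# Allocated resources section near the bottom of the output
kubectl describe node <node-name>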


Quick Reference Checklist

Before any production deployment, verify:

  • [ ] Readiness and liveness probes are configured
  • [ ] Resource requests and limits are set
  • [ ] maxUnavailable: 0 in rolling update strategy
  • [ ] HPA is configured with minReplicas >= 2
  • [ ] Pod Disruption Budget exists for critical services
  • [ ] Image tags are pinned (never use :latest in production)

Final Thought

Most Kubernetes outages aren't caused by Kubernetes itself — they're caused by missing probes, absent resource limits, or no disruption budgets. The cluster is doing exactly what it's configured to do. Production-readiness is about closing those gaps before traffic finds them for you.

Got questions or war stories from your own clusters? Drop them in the comments.
