# Kubernetes in Production: Deployments, Scaling, and Troubleshooting the Right Way
So you've got Kubernetes running locally. Maybe you've even deployed a few services to a staging cluster. But production is a different beast — and most tutorials stop right before things get real.
This article covers what actually matters when running Kubernetes in production: reliable deployments, smart scaling, and debugging when things go wrong (because they will).
## 1. Deployments: Ship Safely Every Time

### Use Rolling Updates with Sensible Defaults

Kubernetes rolls out updates by default, but the defaults aren't always production-safe. Always set these explicitly:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```
`maxUnavailable: 0` ensures no pod is terminated before a healthy replacement is running. This is the single most impactful change you can make to reduce deployment-related downtime.
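With that strategy in place, `kubectl rollout` gives you a safe way to watch a release land and back it out if it fails (the deployment name `my-app` matches the one used later in this article):

```bash
# Block until the rollout completes (or report why it didn't)
kubectl rollout status deployment/my-app -n <namespace>

# Something went wrong? Roll back to the previous ReplicaSet
kubectl rollout undo deployment/my-app -n <namespace>
```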
### Set Readiness and Liveness Probes

Without probes, Kubernetes assumes a pod is ready the moment it starts. That's almost never true.

```yaml
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
```
- Readiness probe: controls when traffic is sent to the pod
- Liveness probe: restarts the pod if it's stuck or deadlocked
If you only implement one thing from this article, make it readiness probes.
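One caveat: if your app has a slow cold start, a liveness probe can kill it before it ever becomes healthy. Kubernetes has a third probe type, `startupProbe`, that holds off the other two until the app has started. A minimal sketch, reusing the `/healthz` endpoint from above (thresholds are illustrative — tune them to your app's real startup time):

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30   # allow up to 30 * 5s = 150s to start
  periodSeconds: 5
```

Once the startup probe succeeds, the readiness and liveness probes take over as usual.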
### Always Set Resource Requests and Limits

```yaml
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```
Without requests, the scheduler can't make good placement decisions. Without limits, a single misbehaving pod can starve its neighbors. Both will cause you pain in production.
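These values also determine the pod's QoS class, which controls eviction order under node pressure. Setting requests equal to limits yields the `Guaranteed` class, which Kubernetes evicts last. A sketch (values are illustrative):

```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"      # equal to requests → Guaranteed QoS
    memory: "512Mi"
```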
## 2. Scaling: Handle Traffic Without Drama

### Horizontal Pod Autoscaler (HPA)

HPA scales your pods based on CPU, memory, or custom metrics.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```
A few rules of thumb:

- Never set `minReplicas: 1` for production workloads — you lose high availability
- Target 60–70% CPU utilization, not 80%+. You want headroom before the next scale event kicks in
- Give HPA time to stabilize — avoid tuning it based on a single traffic spike
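On that last point, the `autoscaling/v2` API exposes a `behavior` field for tuning stabilization directly. A sketch that slows scale-down (window and policy values are illustrative starting points, not recommendations for every workload):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # require 5 min of low load before scaling down
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60            # remove at most one pod per minute
```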
### Cluster Autoscaler
HPA scales pods; Cluster Autoscaler scales nodes. Use both together.
When HPA adds pods and there's no room on existing nodes, Cluster Autoscaler provisions new nodes automatically. When load drops, it removes underutilized nodes to cut costs.
Key config tip: set `--scale-down-utilization-threshold=0.5` to avoid aggressive scale-downs that can disrupt workloads.
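Where that flag goes depends on how you installed the autoscaler. For a manifest-based install it's an argument on the `cluster-autoscaler` container in its Deployment (the image tag and extra flag below are illustrative):

```yaml
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0  # illustrative tag
    command:
      - ./cluster-autoscaler
      - --scale-down-utilization-threshold=0.5
      - --scale-down-unneeded-time=10m  # how long a node must be underutilized before removal
```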
### Pod Disruption Budgets (PDBs)

PDBs protect your app during node maintenance or autoscaling events:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```
This tells Kubernetes: "Never voluntarily evict pods from this app if fewer than 2 would remain running." Without a PDB, rolling node upgrades can silently take down your entire service.
## 3. Troubleshooting: Debug Like a Pro
Here's a systematic approach when something breaks in production.
### Step 1 — Check Pod Status

```bash
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
```
Look at the Events section at the bottom of the `describe` output first. It tells you exactly what Kubernetes tried to do and where it failed.
Common states and what they mean:

| Status | Likely Cause |
|---|---|
| `CrashLoopBackOff` | App is crashing on startup — check logs |
| `Pending` | No node can schedule the pod — check resource requests or taints |
| `OOMKilled` | Memory limit too low — increase limits or fix a memory leak |
| `ImagePullBackOff` | Wrong image name/tag or missing registry credentials |
### Step 2 — Read the Logs

```bash
# Current logs
kubectl logs <pod-name> -n <namespace>

# Previous container instance (if crashing)
kubectl logs <pod-name> -n <namespace> --previous

# Follow live logs
kubectl logs -f <pod-name> -n <namespace>
```
The `--previous` flag is critical for `CrashLoopBackOff` — it shows you logs from the crashed container, not the restarted one.
### Step 3 — Exec Into the Pod

When logs aren't enough:

```bash
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
```
From inside the pod you can test DNS resolution, check environment variables, curl internal services, and verify file mounts — all in the actual runtime environment.
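A few checks that cover most of those cases once you're inside (the service name is a placeholder, and minimal images may ship `wget` rather than `curl`):

```sh
# Is cluster DNS working? Every cluster exposes the API server as 'kubernetes.default'
nslookup kubernetes.default

# Did the pod get the environment/config you expect?
env | sort

# Can the pod reach a dependency? (replace host/port with a real service)
wget -qO- http://my-backend:8080/healthz
```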
### Step 4 — Check Events Cluster-Wide

```bash
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```
This is often overlooked but invaluable. Node pressure, failed mounts, scheduler failures — all show up here.
### Step 5 — Inspect Resource Pressure

```bash
kubectl top nodes
kubectl top pods -n <namespace>
```
If nodes are under memory or CPU pressure, they'll start evicting pods. This can look like random pod restarts when the real problem is a noisy neighbor.
## Quick Reference Checklist
Before any production deployment, verify:
- [ ] Readiness and liveness probes are configured
- [ ] Resource requests and limits are set
- [ ] `maxUnavailable: 0` in rolling update strategy
- [ ] HPA is configured with `minReplicas >= 2`
- [ ] Pod Disruption Budget exists for critical services
- [ ] Image tags are pinned (never use `:latest` in production)
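On the last point, pinning means referencing an immutable tag — or better, a digest — in the pod spec. The registry and names below are illustrative:

```yaml
containers:
  - name: my-app
    # Good: immutable version tag
    image: registry.example.com/my-app:v1.4.2
    # Better: digest-pinned — immune to a tag being re-pushed
    # image: registry.example.com/my-app@sha256:<digest>
```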
## Final Thought
Most Kubernetes outages aren't caused by Kubernetes itself — they're caused by missing probes, absent resource limits, or no disruption budgets. The cluster is doing exactly what it's configured to do. Production-readiness is about closing those gaps before traffic finds them for you.
Got questions or war stories from your own clusters? Drop them in the comments.