Mumtaz Jahan

Kubernetes Rolling Update Failed — Here's Exactly What to Do

One of the most common DevOps interview scenario questions:

"Your deployment rollout failed in Kubernetes. What will you do?"

Most beginners panic at this question. Senior engineers don't — because they have a clear mental framework for it.

Here is that exact framework.


Practical Answer

The priority order is everything here:

First, ensure service stability. Then analyze why the rollout failed.

Never do it the other way around. Production availability comes before investigation — always.


Step-by-Step Debug Process

Step 1: Check Rollout Status

First thing — understand exactly where the rollout stopped:

```shell
kubectl rollout status deployment/<name>
```

This tells you whether the rollout is still progressing, stuck, or has failed completely.
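Kubernetes decides a rollout has failed once it makes no progress for longer than the deployment's `progressDeadlineSeconds` (600 seconds by default). A minimal sketch of tightening that deadline so stuck rollouts surface faster — the deployment name `web` and image are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                        # hypothetical deployment name
spec:
  progressDeadlineSeconds: 120     # mark the rollout as failed after 2 minutes without progress
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.2.3   # hypothetical image
```

With a tighter deadline, `kubectl rollout status` reports the failure sooner instead of waiting ten minutes.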


Step 2: Check Events

```shell
kubectl describe deployment <name>
```

Scroll to the Events section at the bottom. This is where Kubernetes tells you exactly what went wrong.

Look for:

  • 🔴 Probe failures — liveness or readiness probe not passing
  • 🔴 Image errors — wrong image tag, image pull failure, registry issue
  • 🔴 Crash issues — container starting and immediately crashing
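
A common culprit is a probe pointing at the wrong path or port. A sketch of what correctly wired probes look like, assuming the app serves a health endpoint at `/healthz` on port 8080 (both assumptions — match them to your app):

```yaml
containers:
  - name: web                      # hypothetical container name
    image: registry.example.com/web:1.2.3
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz             # must be a path the app actually serves
        port: 8080                 # must match containerPort, not the Service port
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15     # give the app time to start before liveness kicks in
      periodSeconds: 20
```

If the readiness probe never passes, new pods never become Ready and the rollout stalls; if the liveness probe fails, Kubernetes kills the container in a loop.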

Step 3: Check New Pods

```shell
kubectl get pods

kubectl logs <new-pod-name>
```

The new pods created during the rolling update are where the failure lives. Check their status and read their logs — the error will almost always be visible here. If a container is crash-looping, `kubectl logs --previous <new-pod-name>` shows the logs from the crashed attempt rather than the current restart.


Step 4: Immediate Fix — VERY IMPORTANT

If production is affected — roll back first. Investigate later.

```shell
kubectl rollout undo deployment/<name>
```

This instantly restores the previous stable version and brings your service back up. Users stop seeing errors. Now you have time to debug safely without pressure. If you need to go back further than the immediately previous version, `kubectl rollout history deployment/<name>` lists the recorded revisions, and `kubectl rollout undo deployment/<name> --to-revision=<n>` targets a specific one.

```shell
# Verify rollback completed successfully
kubectl rollout status deployment/<name>
```

Step 5: Find the Root Cause

Now that service is restored, investigate calmly. The most common root causes are:

Liveness/Readiness Probe Wrong — The probe is hitting the wrong path or port, causing Kubernetes to think the pod is unhealthy and kill it during rollout.

New Image Bug — The new Docker image has a startup bug or crash that only appears at runtime, not during build.

Config Issue — Wrong environment variable, missing secret, or incorrect ConfigMap value that the new version depends on but the old version didn't.
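
For the config case, verify that every Secret and ConfigMap the new version references actually exists in the target namespace — a missing reference leaves the new pods stuck in `CreateContainerConfigError`. A sketch of the spots to check, with hypothetical names `web-secrets` and `web-config`:

```yaml
containers:
  - name: web
    image: registry.example.com/web:1.2.4   # hypothetical new image
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: web-secrets    # Secret must exist in the same namespace
            key: database-url    # key must exist inside that Secret
    envFrom:
      - configMapRef:
          name: web-config       # container creation fails if this ConfigMap is missing
```

This class of failure is easy to miss because the old pods keep running happily — only the new ReplicaSet's pods depend on the new reference.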


Step 6: Fix and Redeploy

Once root cause is identified — fix it, test it in staging, then redeploy:

```shell
# After fixing the issue
kubectl set image deployment/<name> <container>=<new-fixed-image>

# Watch the new rollout
kubectl rollout status deployment/<name>
```
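
When redeploying, the rolling update strategy controls your blast radius: `maxUnavailable` caps how many old pods can be down at once, and `maxSurge` caps how many extra new pods are created. A conservative sketch for a service that cannot afford lost capacity:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove a serving pod before its replacement is Ready
      maxSurge: 1         # bring up one new pod at a time
```

Combined with a working readiness probe, this means a bad image can stall the rollout but never reduce the number of healthy pods serving traffic.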

Strong Interview Line

Say this in your interview and the interviewer will remember you:

"I always rollback first to maintain availability, then debug the failed rollout."

This one sentence shows you understand that service availability is non-negotiable — and that investigation can wait until users are no longer affected.

That is the mindset of a senior engineer.


Quick Reference Checklist

| Step | Command | Purpose |
| --- | --- | --- |
| 1. Check rollout | `kubectl rollout status deployment/<name>` | See where it failed |
| 2. Check events | `kubectl describe deployment <name>` | Read failure reason |
| 3. Check pods | `kubectl get pods` | Find failing new pods |
| 4. Read logs | `kubectl logs <new-pod>` | See exact error |
| 5. Rollback | `kubectl rollout undo deployment/<name>` | Restore service NOW |
| 6. Fix & redeploy | `kubectl set image ...` | After root cause found |

Most Common Root Causes

| Root Cause | What to Check |
| --- | --- |
| Probe failure | Liveness/readiness path and port |
| Image error | Image tag exists in registry |
| Config issue | Env vars, Secrets, ConfigMaps |

Final Thought

A failed rolling update is not a disaster — it is a process.

Rollback first. Service is restored. Now you have all the time you need to debug properly, find the root cause, and redeploy with confidence.

The engineers who stay calm during rollout failures are the ones who have this process memorized.


*Have you ever had a rolling update fail in production? What was the root cause? Drop it in the comments!*
