Tags: devops kubernetes cicd career
Scenario: Your Deployment Failed in Production. What Steps Will You Take?
This is one of the most common real-world scenario questions asked in DevOps interviews. Interviewers don't want textbook answers — they want to know how you think under pressure.
Here's the complete answer framework.
Answer: Step-by-Step Approach
1. Check CI/CD Pipeline Logs
First thing — don't guess, read the logs.
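# For Jenkins: open the failed build's Console Output in the web UI.
# The controller's own log (path varies by install) covers Jenkins itself, not the build:
tail -n 200 /var/log/jenkins/jenkins.log
# For GitHub Actions: check the Actions tab in your repo
# For GitLab CI: open the failed job's log from the pipeline page,
# or, if the glab CLI is installed, stream the job log with:
glab ci trace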
The pipeline log tells you exactly where it broke.
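Long logs tend to bury the real failure under later noise, so it usually pays to jump to the first error rather than the last line. Here's a minimal sketch, assuming you've saved the job output to a local file. The function name `first_error`, the file name `build.log`, and the keyword list are illustrative assumptions, not something your CI tool provides.

```shell
# first_error FILE [CONTEXT]
# Print the first line that matches a common failure keyword, prefixed with
# its line number, followed by CONTEXT lines (default 5) of what came next.
# Assumption: the keywords below match what your pipeline actually prints.
first_error() {
  file=$1
  context=${2:-5}
  # -n: line numbers, -i: ignore case, -E: extended regex,
  # -m 1: stop after the first match, -A: lines of trailing context
  grep -n -i -E -m 1 -A "$context" 'error|failed|fatal' "$file"
}

# Usage (hypothetical file name):
#   first_error build.log 10
```

Keyword matching is crude: a summary line such as "0 failed" will also match, so tighten the pattern to fit your own log format.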
2. Identify the Failed Stage (Build / Test / Deploy)
Every pipeline has stages. Narrow it down:
- Build failed? → Dependency issue, Dockerfile error, compilation error
- Test failed? → A test caught a regression before it hit production
- Deploy failed? → Kubernetes issue, wrong image tag, resource limits, misconfigured secrets
Knowing the stage tells you which class of causes to investigate and which to ignore.
3. Verify Configuration Changes
Check what changed before the failure:
# Check recent git commits
git log --oneline -10
# Inspect the Deployment's current spec, image, and recent events
kubectl describe deployment my-app
# Check if secrets/configmaps were updated
kubectl get configmap my-app-config -o yaml
Many production failures trace back to a config change someone forgot to mention.
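To see exactly what changed between two rollouts, you can diff the pod templates Kubernetes recorded for each revision. This is a rough sketch, assuming revision numbers taken from `kubectl rollout history deployment/my-app`; the helper name `diff_revisions` is made up for illustration.

```shell
# diff_revisions DEPLOYMENT OLD_REV NEW_REV
# Dump the recorded pod template of two revisions and show a unified diff.
# Returns diff's exit code: 0 if identical, 1 if they differ.
diff_revisions() {
  deploy=$1
  old_rev=$2
  new_rev=$3
  old_file=$(mktemp)
  new_file=$(mktemp)
  kubectl rollout history "deployment/$deploy" --revision="$old_rev" > "$old_file"
  kubectl rollout history "deployment/$deploy" --revision="$new_rev" > "$new_file"
  diff -u "$old_file" "$new_file"
  status=$?
  rm -f "$old_file" "$new_file"
  return "$status"
}

# Usage (revision numbers are examples):
#   diff_revisions my-app 4 5
```

The first line of each dump includes the revision number, so expect at least that line to differ. Image tags, environment variables, and resource limits are the lines worth reading.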
4. Roll Back to the Previous Stable Version
Don't try to fix forward when production is down. Roll back first, fix later.
# Kubernetes rollback
kubectl rollout undo deployment/my-app
# Verify rollback status
kubectl rollout status deployment/my-app
# Check rollout history
kubectl rollout history deployment/my-app
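This restores the previous pod template quickly, so service recovers while you investigate the root cause safely. Keep in mind that it only reverts the Deployment itself: ConfigMaps, Secrets, and database migrations changed alongside it stay as they are.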
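If several bad deploys went out in a row, the "previous" revision may not be the last stable one. Here's a small sketch of a rollback helper that accepts an explicit revision and won't hang forever waiting on pods. The function name `rollback` and the 120-second timeout are assumptions; pick values that fit your app's startup time.

```shell
# rollback DEPLOYMENT [REVISION]
# Roll back to REVISION if given, otherwise to the previous revision,
# then wait for the rollout to finish.
rollback() {
  deploy=$1
  revision=${2:-}
  if [ -n "$revision" ]; then
    kubectl rollout undo "deployment/$deploy" --to-revision="$revision" || return 1
  else
    kubectl rollout undo "deployment/$deploy" || return 1
  fi
  # --timeout makes this fail instead of blocking if pods never become ready
  kubectl rollout status "deployment/$deploy" --timeout=120s
}

# Usage (revision number is an example taken from rollout history):
#   rollback my-app 3
```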
5. Fix the Issue and Redeploy
Once production is stable:
- Reproduce the issue in staging
- Apply the fix
- Test thoroughly
- Redeploy with the corrected version
# Point the my-app container at the fixed image (format: container=image:tag)
kubectl set image deployment/my-app my-app=my-image:v2.1-fixed
kubectl rollout status deployment/my-app
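A successful rollout only tells you the pods came up, not that they came up with the image you meant to ship. As a quick sanity check, you can compare the image in the Deployment spec against the one you expected. This is a sketch, and the helper name `verify_image` is an assumption.

```shell
# verify_image DEPLOYMENT EXPECTED_IMAGE
# Succeeds if any container in the Deployment's pod template uses
# EXPECTED_IMAGE; otherwise prints what was found and fails.
verify_image() {
  # jsonpath prints the container images separated by spaces
  actual=$(kubectl get deployment "$1" \
    -o jsonpath='{.spec.template.spec.containers[*].image}') || return 1
  case " $actual " in
    *" $2 "*) echo "OK: $1 references $2" ;;
    *) echo "MISMATCH: expected $2, found: $actual" >&2
       return 1 ;;
  esac
}

# Usage:
#   verify_image my-app my-image:v2.1-fixed
```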
Pro Tip
Always maintain versioned Docker images — never use latest in production.
# Bad
image: my-app:latest
# Good
image: my-app:v2.0.1
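With a mutable tag like latest, a rollback can't reliably restore the previous build, because the old and new revisions both reference the same tag. Tag every release.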
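You can enforce this in the pipeline rather than relying on memory. Below is a minimal sketch of a check that rejects `latest` and untagged images before deploy. The function name `is_pinned` and the registry host in the usage line are hypothetical.

```shell
# is_pinned IMAGE
# Succeeds only if IMAGE has an explicit tag other than "latest",
# or is pinned by digest (name@sha256:...).
is_pinned() {
  image=$1
  case "$image" in
    *@sha256:*) return 0 ;;     # pinned by digest
  esac
  # Drop the registry/path part, which may itself contain ":port"
  name_and_tag=${image##*/}
  case "$name_and_tag" in
    *:latest) return 1 ;;       # explicit "latest"
    *:?*)     return 0 ;;       # some other non-empty tag
    *)        return 1 ;;       # no tag at all resolves to "latest"
  esac
}

# Usage in a deploy script (hypothetical registry):
#   is_pinned registry.example.com:5000/my-app:v2.0.1 || exit 1
```

This only checks the shape of the tag. It can't tell whether someone pushed a new build over an existing version tag, which is why some teams pin by digest instead.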
Bonus: What Interviewers Are Really Looking For
They want to see that you: don't panic, prioritize restoring service over finding blame, think in structured steps, and know the actual commands — not just theory.
Preparing for a DevOps interview? Drop your toughest scenario question in the comments.