Mumtaz Jahan

DevOps Scenario Interview Question: Deployment Failed in Production

Tags: devops kubernetes cicd career


Scenario: Your Deployment Failed in Production. What Steps Will You Take?

This is one of the most common real-world scenario questions asked in DevOps interviews. Interviewers don't want textbook answers — they want to know how you think under pressure.

Here's the complete answer framework.


Answer: Step-by-Step Approach

1. Check CI/CD Pipeline Logs

First thing — don't guess, read the logs.

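# For Jenkins: read the build's console output (the "Console Output" page),
# or the same log on the controller (typical path for a non-folder job).
# /var/log/jenkins/jenkins.log only covers the Jenkins controller itself.
cat "$JENKINS_HOME/jobs/JOB_NAME/builds/BUILD_NUMBER/log"

# For GitHub Actions: check the Actions tab in your repo, or use the GitHub CLI
gh run view RUN_ID --log-failed

# For GitLab CI: open the failed job from the pipeline page, or use the glab CLI
glab ci trace JOB_ID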

The pipeline log usually shows which stage and which command failed, along with the error message.
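When a pipeline log runs to thousands of lines, it can help to jump straight to the first failure. Here is a minimal sketch, assuming the log has been saved to a local file and that failures contain one of a few common keywords. first_error and the keyword list are illustrative assumptions, not part of any CI tool:

```shell
# first_error LOGFILE [CONTEXT]
# Print the first line that matches a common failure keyword, with its line
# number, plus CONTEXT lines after it (default 5).
# The keyword list is an assumption; adjust it for your build tools.
first_error() {
  logfile="$1"
  context="${2:-5}"
  grep -n -i -m 1 -A "$context" -E 'error|failed|fatal|exception' "$logfile"
}
```

For example, first_error build.log 10 (build.log being a hypothetical file name) prints the first matching line and the ten lines after it. The first match can be a harmless message that merely contains the word "error", so treat it as a starting point.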


2. Identify the Failed Stage (Build / Test / Deploy)

Every pipeline has stages. Narrow it down:

  • Build failed? → Dependency issue, Dockerfile error, compilation error
  • Test failed? → A test caught a regression before it hit production
  • Deploy failed? → Kubernetes issue, wrong image tag, resource limits, misconfigured secrets

Knowing the stage narrows the search to one part of the system, which saves a lot of debugging time.
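For a failed deploy stage on Kubernetes, the STATUS column of kubectl get pods often points at one of the causes listed above. A rough sketch of that mapping follows. explain_pod_status is a hypothetical helper, and the list is a starting point rather than an exhaustive reference:

```shell
# explain_pod_status STATUS
# Map a value from the STATUS column of `kubectl get pods` to a likely cause.
explain_pod_status() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "image: wrong image name or tag, or missing registry credentials" ;;
    CreateContainerConfigError)
      echo "config: a referenced ConfigMap or Secret (or one of its keys) is missing" ;;
    OOMKilled)
      echo "resources: the container exceeded its memory limit" ;;
    Pending)
      echo "scheduling: often not enough cluster resources for the pod's requests" ;;
    CrashLoopBackOff)
      echo "runtime: the container keeps exiting; check kubectl logs --previous" ;;
    *)
      echo "unknown: inspect the pod with kubectl describe pod" ;;
  esac
}
```

Calling explain_pod_status ImagePullBackOff, for instance, prints the image-related hint. Whatever the status, kubectl describe pod and the pod's events give the authoritative detail.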


3. Verify Configuration Changes

Check what changed before the failure:

# Check recent git commits
git log --oneline -10

# Inspect the Deployment's current spec, conditions, and recent events
kubectl describe deployment my-app

# Inspect the live ConfigMap and compare it with the version in git
kubectl get configmap my-app-config -o yaml

Many production failures trace back to a config change someone forgot to mention.
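To see what actually changed in the Deployment itself, one option is to compare the pod templates that Kubernetes recorded for two revisions. The sketch below assumes both revisions are still in the rollout history. diff_revisions is an illustrative helper name:

```shell
# diff_revisions DEPLOYMENT OLD_REVISION NEW_REVISION
# Show a unified diff of the pod templates recorded for two revisions.
# Returns diff's exit status: 0 if identical, 1 if they differ.
diff_revisions() {
  deploy="$1"
  old_file=$(mktemp)
  new_file=$(mktemp)
  # With --revision, rollout history prints that revision's pod template
  kubectl rollout history "deployment/$deploy" --revision="$2" > "$old_file"
  kubectl rollout history "deployment/$deploy" --revision="$3" > "$new_file"
  status=0
  diff -u "$old_file" "$new_file" || status=$?
  rm -f "$old_file" "$new_file"
  return "$status"
}
```

Running diff_revisions my-app 3 4 would show changed image tags, environment variables, or resource limits between revisions 3 and 4 (example numbers). It does not cover changes made inside ConfigMaps or Secrets, since those are not part of the pod template.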


4. Roll Back to the Previous Stable Version

Don't try to fix forward when production is down. Roll back first, fix later.

# Kubernetes rollback
kubectl rollout undo deployment/my-app

# Verify rollback status
kubectl rollout status deployment/my-app

# Check rollout history
kubectl rollout history deployment/my-app

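This restores service as soon as the previous version's pods pass their readiness checks, so you can investigate the root cause without production being down. Keep in mind that kubectl rollout undo only reverts the Deployment's pod template; changes to ConfigMaps, Secrets, or database schemas have to be reverted separately.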
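The undo and the verification can be wrapped together so the rollback is never left unchecked. This is a minimal sketch. rollback_and_verify is a hypothetical helper, and the 120-second timeout is an arbitrary assumption to tune for your application:

```shell
# rollback_and_verify DEPLOYMENT [REVISION]
# Roll back to REVISION (0, the default, means the previous revision),
# then wait for the rollout to complete or time out.
rollback_and_verify() {
  deploy="$1"
  revision="${2:-0}"
  kubectl rollout undo "deployment/$deploy" --to-revision="$revision" || return 1
  # Exits non-zero if the rollout has not finished within the timeout
  kubectl rollout status "deployment/$deploy" --timeout=120s
}
```

Use rollback_and_verify my-app for the previous revision, or pass a revision number taken from kubectl rollout history when the previous revision is not the last known-good one.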


5. Fix the Issue and Redeploy

Once production is stable:

  1. Reproduce the issue in staging
  2. Apply the fix
  3. Test thoroughly
  4. Redeploy with the corrected version

# Point the my-app container at the fixed image, then watch the rollout
kubectl set image deployment/my-app my-app=my-image:v2.1-fixed
kubectl rollout status deployment/my-app
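It can also help to record why each redeploy happened, so the next person reading the rollout history does not have to guess. The sketch below assumes the kubernetes.io/change-cause annotation is used for that. redeploy is an illustrative helper and the 180-second timeout is an arbitrary choice:

```shell
# redeploy DEPLOYMENT CONTAINER IMAGE REASON
# Update one container's image, record the reason, and wait for the rollout.
redeploy() {
  deploy="$1"
  container="$2"
  image="$3"
  reason="$4"
  kubectl set image "deployment/$deploy" "$container=$image" || return 1
  # Shown in the CHANGE-CAUSE column of `kubectl rollout history`
  kubectl annotate "deployment/$deploy" "kubernetes.io/change-cause=$reason" --overwrite || return 1
  kubectl rollout status "deployment/$deploy" --timeout=180s
}
```

For example: redeploy my-app my-app my-image:v2.1-fixed "fix config regression".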

Pro Tip

Always maintain versioned Docker images — never use latest in production.

# Bad
image: my-app:latest

# Good
image: my-app:v2.0.1

Without versioned images, you can't roll back reliably: the previous revision still points at latest, which may now resolve to the broken build. Tag every release.
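A rule like this is easy to enforce in the pipeline before anything reaches the cluster. Here is a small sketch of such a guard. check_image_tag is a hypothetical helper, and it only inspects the image reference string, not the registry:

```shell
# check_image_tag IMAGE
# Succeed only if IMAGE has an explicit tag other than "latest",
# or is pinned by digest. Prints the reason to stderr on rejection.
check_image_tag() {
  image="$1"
  # Drop the registry/path so a registry port (host:5000/...) is not read as a tag
  name="${image##*/}"
  case "$name" in
    *@sha256:*) return 0 ;;
    *:latest)   echo "rejected: $image uses the latest tag" >&2; return 1 ;;
    *:*)        return 0 ;;
    *)          echo "rejected: $image has no tag, which defaults to latest" >&2; return 1 ;;
  esac
}
```

In a deploy script, check_image_tag "$IMAGE" || exit 1 (with IMAGE being whatever variable holds the image reference) stops the job before an untagged or latest image is rolled out.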


Bonus: What Interviewers Are Really Looking For

They want to see that you: don't panic, prioritize restoring service over finding blame, think in structured steps, and know the actual commands — not just theory.


Preparing for a DevOps interview? Drop your toughest scenario question in the comments.

