<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mumtaz Jahan</title>
    <description>The latest articles on DEV Community by Mumtaz Jahan (@mumtaz2029).</description>
    <link>https://dev.to/mumtaz2029</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3885280%2F719c10f5-af77-4250-9216-5dadfef8503a.jpeg</url>
      <title>DEV Community: Mumtaz Jahan</title>
      <link>https://dev.to/mumtaz2029</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mumtaz2029"/>
    <language>en</language>
    <item>
      <title>DevOps Scenario Interview Question: Deployment Failed in Production</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Sat, 18 Apr 2026 14:11:02 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/devops-scenario-interview-question-deployment-failed-in-production-2bii</link>
      <guid>https://dev.to/mumtaz2029/devops-scenario-interview-question-deployment-failed-in-production-2bii</guid>
      <description>&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;devops&lt;/code&gt; &lt;code&gt;kubernetes&lt;/code&gt; &lt;code&gt;cicd&lt;/code&gt; &lt;code&gt;career&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Scenario: Your Deployment Failed in Production. What Steps Will You Take?
&lt;/h2&gt;

&lt;p&gt;This is one of the most common &lt;strong&gt;real-world scenario questions&lt;/strong&gt; asked in DevOps interviews. Interviewers don't want textbook answers — they want to know how you think under pressure.&lt;/p&gt;

&lt;p&gt;Here's the complete answer framework.&lt;/p&gt;




&lt;h2&gt;
  
  
  Answer: Step-by-Step Approach
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Check CI/CD Pipeline Logs
&lt;/h3&gt;

&lt;p&gt;First thing — don't guess, &lt;strong&gt;read the logs&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For Jenkins&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /var/log/jenkins/jenkins.log

&lt;span class="c"&gt;# For GitHub Actions — check the Actions tab in your repo&lt;/span&gt;

&lt;span class="c"&gt;# For GitLab CI&lt;/span&gt;
gitlab-ci logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pipeline log tells you exactly &lt;strong&gt;where&lt;/strong&gt; it broke.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Identify the Failed Stage (Build / Test / Deploy)
&lt;/h3&gt;

&lt;p&gt;Every pipeline has stages. Narrow it down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build failed?&lt;/strong&gt; → Dependency issue, Dockerfile error, compilation error&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test failed?&lt;/strong&gt; → A test caught a regression before it hit production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy failed?&lt;/strong&gt; → Kubernetes issue, wrong image tag, resource limits, misconfigured secrets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Knowing the stage cuts your debugging time in half.&lt;/p&gt;
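&lt;p&gt;For a failed &lt;strong&gt;deploy&lt;/strong&gt; stage on Kubernetes, recent cluster events usually name the exact problem. A minimal sketch (&lt;code&gt;my-app&lt;/code&gt; and its label are placeholders for your own deployment):&lt;/p&gt;

```shell
# Recent events in the namespace, oldest first: look for FailedScheduling,
# ImagePullBackOff, or probe failures around the time of the deploy
kubectl get events --sort-by=.metadata.creationTimestamp

# Pod-level detail for the app that just rolled out
kubectl describe pod -l app=my-app
```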




&lt;h3&gt;
  
  
  3. Verify Configuration Changes
&lt;/h3&gt;

&lt;p&gt;Check what changed before the failure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check recent git commits&lt;/span&gt;
git log &lt;span class="nt"&gt;--oneline&lt;/span&gt; &lt;span class="nt"&gt;-10&lt;/span&gt;

&lt;span class="c"&gt;# Check Kubernetes config changes&lt;/span&gt;
kubectl describe deployment my-app

&lt;span class="c"&gt;# Check if secrets/configmaps were updated&lt;/span&gt;
kubectl get configmap my-app-config &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most production failures trace back to a &lt;strong&gt;config change&lt;/strong&gt; someone forgot to mention.&lt;/p&gt;
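&lt;p&gt;Two quick ways to spot that drift, as a sketch (this assumes your manifests live under &lt;code&gt;k8s/&lt;/code&gt; in the repo and releases are tagged):&lt;/p&gt;

```shell
# What changed in the manifests since the last known-good release tag?
git diff v2.0.0..HEAD -- k8s/

# How does the live object differ from the manifest you think is deployed?
kubectl diff -f k8s/deployment.yaml
```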




&lt;h3&gt;
  
  
  4. Rollback to Previous Stable Version
&lt;/h3&gt;

&lt;p&gt;Don't try to fix forward when production is down. &lt;strong&gt;Rollback first, fix later.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Kubernetes rollback&lt;/span&gt;
kubectl rollout undo deployment/my-app

&lt;span class="c"&gt;# Verify rollback status&lt;/span&gt;
kubectl rollout status deployment/my-app

&lt;span class="c"&gt;# Check rollout history&lt;/span&gt;
kubectl rollout &lt;span class="nb"&gt;history &lt;/span&gt;deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This restores service immediately while you investigate the root cause safely.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Fix the Issue and Redeploy
&lt;/h3&gt;

&lt;p&gt;Once production is stable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reproduce the issue in staging&lt;/li&gt;
&lt;li&gt;Apply the fix&lt;/li&gt;
&lt;li&gt;Test thoroughly&lt;/li&gt;
&lt;li&gt;Redeploy with the corrected version
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;set &lt;/span&gt;image deployment/my-app my-app&lt;span class="o"&gt;=&lt;/span&gt;my-image:v2.1-fixed
kubectl rollout status deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Pro Tip
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Always maintain versioned Docker images&lt;/strong&gt; — never use &lt;code&gt;latest&lt;/code&gt; in production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bad&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app:latest&lt;/span&gt;

&lt;span class="c1"&gt;# Good&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app:v2.0.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without versioned images, you can't roll back. Tag every release.&lt;/p&gt;
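&lt;p&gt;A minimal tagging flow, as a sketch (the registry host and version here are placeholders):&lt;/p&gt;

```shell
# Build and push an immutable, versioned tag
docker build -t registry.example.com/my-app:v2.0.1 .
docker push registry.example.com/my-app:v2.0.1

# Deployments reference the exact tag, so rollback always has a target
kubectl set image deployment/my-app my-app=registry.example.com/my-app:v2.0.1
```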




&lt;h2&gt;
  
  
  Bonus: What Interviewers Are Really Looking For
&lt;/h2&gt;

&lt;p&gt;They want to see that you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't panic&lt;/li&gt;
&lt;li&gt;Prioritize restoring service over assigning blame&lt;/li&gt;
&lt;li&gt;Think in structured steps&lt;/li&gt;
&lt;li&gt;Know the actual commands — not just theory&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Preparing for a DevOps interview? Drop your toughest scenario question in the comments.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>career</category>
      <category>cicd</category>
      <category>devops</category>
      <category>interview</category>
    </item>
    <item>
      <title>How I Fixed a Kubernetes CrashLoopBackOff in Production</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Fri, 17 Apr 2026 23:55:42 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/how-to-fixed-a-kubernetes-crashloopbackoff-in-production-232f</link>
      <guid>https://dev.to/mumtaz2029/how-to-fixed-a-kubernetes-crashloopbackoff-in-production-232f</guid>
      <description>&lt;p&gt;&lt;em&gt;Tags:&lt;/em&gt; &lt;code&gt;kubernetes&lt;/code&gt; &lt;code&gt;devops&lt;/code&gt; &lt;code&gt;debugging&lt;/code&gt; &lt;code&gt;cloud&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem: Application Was DOWN in Kubernetes
&lt;/h2&gt;

&lt;p&gt;One of the most stressful moments in DevOps — you check your monitoring dashboard and your application is completely &lt;strong&gt;DOWN&lt;/strong&gt; in Kubernetes. No graceful degradation. Just... down.&lt;/p&gt;

&lt;p&gt;Here's exactly how I diagnosed and fixed it in under an hour.&lt;/p&gt;




&lt;h2&gt;
  
  
  Issue Found: Pod Was in CrashLoopBackOff
&lt;/h2&gt;

&lt;p&gt;Running &lt;code&gt;kubectl get pods&lt;/code&gt; revealed the culprit immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;NAME                        READY   STATUS             RESTARTS   AGE
my-app-7d9f8b6c4-xk2pq     0/1     CrashLoopBackOff   8          20m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;CrashLoopBackOff&lt;/code&gt; means Kubernetes is repeatedly trying to start your container, it crashes, and Kubernetes backs off with increasing wait times before retrying. Something inside the container was causing it to exit immediately on startup.&lt;/p&gt;
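&lt;p&gt;You can confirm how the last attempt died without waiting for the next crash. A sketch, using the pod name from the listing above:&lt;/p&gt;

```shell
# Reason and exit code of the previously crashed container
# (e.g. "Error 1" for an app failure, "OOMKilled 137" for a memory kill)
kubectl get pod my-app-7d9f8b6c4-xk2pq -o jsonpath='{range .status.containerStatuses[*]}{.lastState.terminated.reason}{" "}{.lastState.terminated.exitCode}{"\n"}{end}'
```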




&lt;h2&gt;
  
  
  Debug Steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Checked Logs (&lt;code&gt;kubectl logs&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs my-app-7d9f8b6c4-xk2pq &lt;span class="nt"&gt;--previous&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--previous&lt;/code&gt; flag is crucial here — it lets you see logs from the &lt;em&gt;crashed&lt;/em&gt; container, not the current (possibly empty) one. The logs showed repeated connection errors on startup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Checked Config
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod my-app-7d9f8b6c4-xk2pq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I inspected the environment variables and ConfigMaps attached to the pod. The &lt;code&gt;describe&lt;/code&gt; command is a goldmine — it shows events, resource limits, volume mounts, and more.&lt;/p&gt;
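&lt;p&gt;When the &lt;code&gt;describe&lt;/code&gt; output gets long, you can also pull just this pod's events. A sketch:&lt;/p&gt;

```shell
# Only the events that involve this pod, oldest first
kubectl get events --field-selector involvedObject.name=my-app-7d9f8b6c4-xk2pq \
  --sort-by=.metadata.creationTimestamp
```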

&lt;h3&gt;
  
  
  Step 3: Found DB Connection Issue
&lt;/h3&gt;

&lt;p&gt;The logs made it clear: the app was trying to connect to the database using an &lt;strong&gt;incorrect connection string&lt;/strong&gt;. The host value in the environment variable was pointing to a stale endpoint. The app would crash immediately on boot since it couldn't reach the DB.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fix Applied
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Corrected Environment Variables
&lt;/h3&gt;

&lt;p&gt;Updated the Kubernetes secret/configmap with the correct database host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl edit secret my-app-db-secret
&lt;span class="c"&gt;# or&lt;/span&gt;
kubectl &lt;span class="nb"&gt;set env &lt;/span&gt;deployment/my-app &lt;span class="nv"&gt;DB_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;correct-db-host.internal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Restarted the Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout restart deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then watched the rollout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout status deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🎉 Result
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; Application UP
 Issue resolved
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pods came up healthy, readiness probes passed, and traffic started flowing again.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Always check logs with &lt;code&gt;--previous&lt;/code&gt;&lt;/strong&gt; — the live container may have no logs if it crashes before writing any.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;kubectl describe pod&lt;/code&gt;&lt;/strong&gt; is your best friend for seeing the full picture: events, env vars, resource pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CrashLoopBackOff is almost always one of:&lt;/strong&gt; bad env vars/secrets, missing config, OOM kill, or a bug triggered at startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;kubectl rollout restart&lt;/code&gt;&lt;/strong&gt; is safer than deleting pods manually — it does a rolling restart with zero downtime.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Hit a similar issue? Drop your debugging story in the comments.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>sre</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
