<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mumtaz Jahan</title>
    <description>The latest articles on DEV Community by Mumtaz Jahan (@mumtaz2029).</description>
    <link>https://dev.to/mumtaz2029</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3885280%2F719c10f5-af77-4250-9216-5dadfef8503a.jpeg</url>
      <title>DEV Community: Mumtaz Jahan</title>
      <link>https://dev.to/mumtaz2029</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mumtaz2029"/>
    <language>en</language>
    <item>
      <title>Kubernetes Rolling Update Failed — Here's Exactly What to Do</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Sun, 03 May 2026 00:54:53 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/kubernetes-rolling-update-failed-heres-exactly-what-to-do-3p04</link>
      <guid>https://dev.to/mumtaz2029/kubernetes-rolling-update-failed-heres-exactly-what-to-do-3p04</guid>
      <description>&lt;h2&gt;
  
  
  Kubernetes Rolling Update Failed — Here's Exactly What to Do
&lt;/h2&gt;

&lt;p&gt;One of the most common &lt;strong&gt;DevOps interview scenario questions:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Your deployment rollout failed in Kubernetes. What will you do?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most beginners panic at this question. Senior engineers don't — because they have a clear mental framework for it.&lt;/p&gt;

&lt;p&gt;Here is that exact framework.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Answer
&lt;/h2&gt;

&lt;p&gt;The priority order is everything here:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;First, ensure service stability. Then analyze why the rollout failed.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Never do it the other way around. Production availability comes before investigation — always.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step-by-Step Debug Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Check Rollout Status
&lt;/h3&gt;

&lt;p&gt;First thing — understand exactly where the rollout stopped:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout status deployment/&amp;lt;name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells you whether the rollout is still progressing, stuck, or has failed completely.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 2: Check Events
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe deployment &amp;lt;name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scroll to the &lt;strong&gt;Events&lt;/strong&gt; section at the bottom. This is where Kubernetes tells you exactly what went wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Look for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔴 Probe failures — liveness or readiness probe not passing&lt;/li&gt;
&lt;li&gt;🔴 Image errors — wrong image tag, image pull failure, registry issue&lt;/li&gt;
&lt;li&gt;🔴 Crash issues — container starting and immediately crashing&lt;/li&gt;
&lt;/ul&gt;
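&lt;p&gt;If the Events section is noisy, you can also list recent events directly, sorted so the newest appear last:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List recent events, newest last&lt;/span&gt;
kubectl get events --sort-by='.metadata.creationTimestamp'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;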




&lt;h3&gt;
  
  
  Step 3: Check New Pods
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods

kubectl logs &amp;lt;new-pod-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The new pods created during the rolling update are where the failure lives. Check their status and read their logs — the error will almost always be visible here.&lt;/p&gt;
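&lt;p&gt;One caveat: if the container is crash-looping, the current logs may be empty. In that case, these two commands usually surface the real error:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Logs from the previous (crashed) container instance&lt;/span&gt;
kubectl logs &amp;lt;new-pod-name&amp;gt; --previous

&lt;span class="c"&gt;# Full pod detail, including container state and restart reasons&lt;/span&gt;
kubectl describe pod &amp;lt;new-pod-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;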




&lt;h3&gt;
  
  
  Step 4: Immediate Fix — VERY IMPORTANT
&lt;/h3&gt;

&lt;p&gt;If production is affected — &lt;strong&gt;rollback first. Investigate later.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout undo deployment/&amp;lt;name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This instantly restores the previous stable version and brings your service back up. Users stop seeing errors. Now you have time to debug safely without pressure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify rollback completed successfully&lt;/span&gt;
kubectl rollout status deployment/&amp;lt;name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
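&lt;p&gt;If you need to go back further than the immediately previous version, rollout history lets you pick a specific revision:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List past revisions, then roll back to a specific one&lt;/span&gt;
kubectl rollout history deployment/&amp;lt;name&amp;gt;
kubectl rollout undo deployment/&amp;lt;name&amp;gt; --to-revision=&amp;lt;revision-number&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;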






&lt;h3&gt;
  
  
  Step 5: Find the Root Cause
&lt;/h3&gt;

&lt;p&gt;Now that service is restored, investigate calmly. The most common root causes are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Liveness/Readiness Probe Wrong&lt;/strong&gt; — The probe is hitting the wrong path or port, causing Kubernetes to think the pod is unhealthy and kill it during rollout.&lt;/p&gt;
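&lt;p&gt;A quick way to see which probe configuration actually shipped (a simple grep sketch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Show the probe settings on the live deployment&lt;/span&gt;
kubectl get deployment &amp;lt;name&amp;gt; -o yaml | grep -A5 -E 'livenessProbe|readinessProbe'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;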

&lt;p&gt;&lt;strong&gt;New Image Bug&lt;/strong&gt; — The new Docker image has a startup bug or crash that only appears at runtime, not during build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Config Issue&lt;/strong&gt; — Wrong environment variable, missing secret, or incorrect ConfigMap value that the new version depends on but the old version didn't.&lt;/p&gt;
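&lt;p&gt;To verify the config side quickly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Compare what the pod actually sees against what the new version expects&lt;/span&gt;
kubectl exec &amp;lt;pod-name&amp;gt; -- env
kubectl get secret &amp;lt;secret-name&amp;gt;
kubectl get configmap &amp;lt;configmap-name&amp;gt; -o yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;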




&lt;h3&gt;
  
  
  Step 6: Fix and Redeploy
&lt;/h3&gt;

&lt;p&gt;Once root cause is identified — fix it, test it in staging, then redeploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# After fixing the issue&lt;/span&gt;
kubectl &lt;span class="nb"&gt;set &lt;/span&gt;image deployment/&amp;lt;name&amp;gt; &amp;lt;container&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;new-fixed-image&amp;gt;

&lt;span class="c"&gt;# Watch the new rollout&lt;/span&gt;
kubectl rollout status deployment/&amp;lt;name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Strong Interview Line
&lt;/h2&gt;

&lt;p&gt;Say this in your interview and the interviewer will remember you:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"I always rollback first to maintain availability, then debug the failed rollout."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This one sentence shows you understand that &lt;strong&gt;service availability is non-negotiable&lt;/strong&gt; — and that investigation can wait until users are no longer affected.&lt;/p&gt;

&lt;p&gt;That is the mindset of a senior engineer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Check rollout&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl rollout status deployment/&amp;lt;name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;See where it failed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Check events&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl describe deployment &amp;lt;name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read failure reason&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Check pods&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl get pods&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Find failing new pods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Read logs&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl logs &amp;lt;new-pod&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;See exact error&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Rollback&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl rollout undo deployment/&amp;lt;name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Restore service NOW&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6. Fix &amp;amp; redeploy&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl set image ...&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After root cause found&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Most Common Root Causes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;What to Check&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Probe failure&lt;/td&gt;
&lt;td&gt;Liveness/readiness path and port&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image error&lt;/td&gt;
&lt;td&gt;Image tag exists in registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Config issue&lt;/td&gt;
&lt;td&gt;Env vars, Secrets, ConfigMaps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;A failed rolling update is not a disaster — it is a process.&lt;/p&gt;

&lt;p&gt;Rollback first. Service is restored. Now you have all the time you need to debug properly, find the root cause, and redeploy with confidence.&lt;/p&gt;

&lt;p&gt;The engineers who stay calm during rollout failures are the ones who have this process memorized.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you ever had a rolling update fail in production? What was the root cause? Drop it in the comments!&lt;/em&gt;&lt;/p&gt;




</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>beginners</category>
      <category>career</category>
    </item>
    <item>
      <title>What Are Quality Gates in CI/CD? (And Why "Nobody Reads" Is Not a Gate)</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Wed, 29 Apr 2026 19:52:48 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/what-are-quality-gates-in-cicd-and-why-nobody-reads-is-not-a-gate-4a51</link>
      <guid>https://dev.to/mumtaz2029/what-are-quality-gates-in-cicd-and-why-nobody-reads-is-not-a-gate-4a51</guid>
      <description>&lt;h3&gt;
  
  
  What Are Quality Gates in CI/CD?
&lt;/h3&gt;

&lt;p&gt;A quality gate is a rule that must pass for the pipeline to move to the next stage.&lt;br&gt;
Simple definition. Powerful concept.&lt;br&gt;
If the gate fails — the pipeline fails. No exceptions. No "we'll fix it later." That discipline is exactly what keeps bugs out of production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Quality Gates
&lt;/h3&gt;

&lt;p&gt;Here are the most widely used gates in real DevOps pipelines:&lt;br&gt;
✅ Unit test pass rate — 100%&lt;br&gt;
✅ Code coverage — at least 70%&lt;br&gt;
✅ Static analysis — 0 critical issues&lt;br&gt;
✅ Security scan — no high severity CVEs&lt;br&gt;
✅ Smoke test — all must pass&lt;br&gt;
✅ Performance — response time must be under target (p99 threshold)&lt;br&gt;
Each of these is a hard stop. The pipeline does not move forward until every gate passes.&lt;/p&gt;
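&lt;p&gt;In practice, a hard stop can be as small as a shell step that exits non-zero. Here is a minimal sketch, assuming your test tooling exports the measured coverage into a &lt;code&gt;COVERAGE&lt;/code&gt; variable:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimal coverage gate: a non-zero exit blocks the pipeline stage&lt;/span&gt;
&lt;span class="c"&gt;# COVERAGE is assumed to be set by your coverage tooling, e.g. "73.4"&lt;/span&gt;
if [ "${COVERAGE%.*}" -lt 70 ]; then
  echo "Quality gate failed: coverage ${COVERAGE}% is below 70%"
  exit 1
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;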

&lt;h2&gt;
  
  
  The Rule to Remember in Interviews
&lt;/h2&gt;

&lt;p&gt;A warning nobody reads is not a gate.&lt;/p&gt;

&lt;p&gt;This is the most important thing to say when asked about quality gates in an interview. If your pipeline warns but still deploys — that is not a gate. That is noise.&lt;br&gt;
A real gate blocks the pipeline. It forces the team to fix the issue before moving forward.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Project Example You Can Use in Interviews
&lt;/h3&gt;

&lt;p&gt;Here is a real scenario worth sharing:&lt;br&gt;
Our pipeline had a 70% code coverage gate. The dev team pushed to drop it to 60% to move faster.&lt;br&gt;
Before agreeing, I pulled quarterly bug data. The finding was clear — low coverage modules had 3x more bugs.&lt;br&gt;
The data made the decision. The gate stayed at 70%.&lt;br&gt;
This is a perfect interview answer because it shows you don't just follow rules blindly — you back decisions with data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Close Your Interview Answer With This Line
&lt;/h3&gt;

&lt;p&gt;Interviewers remember candidates who say this:&lt;/p&gt;

&lt;p&gt;"Gates should enforce standards that the team agreed on — not personal preferences."&lt;/p&gt;

&lt;p&gt;That one sentence shows maturity, team thinking, and real engineering judgment.&lt;/p&gt;

&lt;h1&gt;
  
  
  Real World Gate Stack
&lt;/h1&gt;

&lt;p&gt;In my last project we used:&lt;/p&gt;

&lt;p&gt;SonarQube — static analysis + code coverage gate&lt;br&gt;
OWASP Dependency Check — security vulnerability gate&lt;/p&gt;

&lt;p&gt;Any one of them failing blocked the merge entirely.&lt;br&gt;
That discipline before production is exactly why we caught bugs early instead of firefighting at 2AM.&lt;/p&gt;
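&lt;p&gt;If you are wiring this up with SonarQube's scanner CLI, one common pattern (assuming a recent scanner version) is to make the analysis step itself wait for the quality gate verdict and fail when it comes back red:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fail this pipeline step if SonarQube's quality gate comes back red&lt;/span&gt;
sonar-scanner -Dsonar.qualitygate.wait=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;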

&lt;h2&gt;
  
  
  Quick Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gate Type&lt;/th&gt;
&lt;th&gt;Example Threshold&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unit Tests&lt;/td&gt;
&lt;td&gt;100% pass rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code Coverage&lt;/td&gt;
&lt;td&gt;≥ 70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Static Analysis&lt;/td&gt;
&lt;td&gt;0 critical issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security Scan&lt;/td&gt;
&lt;td&gt;No high CVEs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smoke Tests&lt;/td&gt;
&lt;td&gt;All passing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Under p99 target&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Final Thought
&lt;/h3&gt;

&lt;p&gt;Quality gates are not bureaucracy. They are the team's agreed standards made automatic.&lt;br&gt;
Without gates, standards are just suggestions. With gates, they are enforced every single time — whether it's 10AM on a Monday or 2AM before a release.&lt;br&gt;
Set the gates. Trust the gates. Let the data defend the gates.&lt;/p&gt;

&lt;p&gt;What quality gates does your team use? Drop them in the comments 👇&lt;/p&gt;

</description>
      <category>devops</category>
      <category>cicd</category>
      <category>beginners</category>
      <category>career</category>
    </item>
    <item>
      <title>Pipeline Success but Application Broken — Here's Why</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Tue, 28 Apr 2026 03:10:16 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/pipeline-success-but-application-broken-heres-why-h8o</link>
      <guid>https://dev.to/mumtaz2029/pipeline-success-but-application-broken-heres-why-h8o</guid>
      <description>&lt;h4&gt;
  
  
  Pipeline Success but Application Broken — Here's Why
&lt;/h4&gt;

&lt;p&gt;One of the most confusing moments in DevOps:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;CI/CD Pipeline — &lt;strong&gt;Passed&lt;/strong&gt;&lt;br&gt;
Application — &lt;strong&gt;Not Working&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;How can the pipeline be green but the app still be broken? This is actually one of the most common &lt;strong&gt;real interview scenario questions&lt;/strong&gt; asked in DevOps — and most beginners get it wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Answer
&lt;/h2&gt;

&lt;p&gt;Here is the key concept that most people miss:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;CI/CD success only means the build and deploy succeeded — not that the application is healthy.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think of it this way — your pipeline's job is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build the Docker image &lt;/li&gt;
&lt;li&gt;Push it to the registry &lt;/li&gt;
&lt;li&gt;Deploy it to Kubernetes &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But none of these steps check whether your &lt;strong&gt;application actually started correctly&lt;/strong&gt;, connected to the database, loaded the right config, or is responding to requests.&lt;/p&gt;

&lt;p&gt;The pipeline says &lt;strong&gt;"I delivered the package"&lt;/strong&gt; — not &lt;strong&gt;"the package works."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Steps to Debug This
&lt;/h2&gt;

&lt;p&gt;When your pipeline is green but the app is broken, follow these steps in order:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Check Pods
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look at the STATUS column. You want &lt;code&gt;Running&lt;/code&gt; — anything else like &lt;code&gt;Error&lt;/code&gt;, &lt;code&gt;Pending&lt;/code&gt;, or &lt;code&gt;OOMKilled&lt;/code&gt; tells you something went wrong after deployment.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 2: Check Logs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &amp;lt;pod-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the real story is. The app may have deployed successfully but crashed immediately on startup. The logs will tell you exactly why.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 3: Verify Environment Variables, Secrets &amp;amp; ConfigMaps
&lt;/h3&gt;

&lt;p&gt;This is the most common culprit. The pipeline deployed the right image — but the app couldn't connect to the database because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong DB host in environment variables&lt;/li&gt;
&lt;li&gt;Secret not mounted correctly&lt;/li&gt;
&lt;li&gt;ConfigMap pointing to staging instead of production
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check environment variables on the pod&lt;/span&gt;
kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;env&lt;/span&gt;

&lt;span class="c"&gt;# Check if secret exists&lt;/span&gt;
kubectl get secret &amp;lt;secret-name&amp;gt;

&lt;span class="c"&gt;# Check ConfigMap values&lt;/span&gt;
kubectl get configmap &amp;lt;configmap-name&amp;gt; &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 4: Test the Endpoint Manually
&lt;/h3&gt;

&lt;p&gt;Don't assume the app is working — verify it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Port forward and test locally&lt;/span&gt;
kubectl port-forward &amp;lt;pod-name&amp;gt; 8080:8080

&lt;span class="c"&gt;# Then in another terminal&lt;/span&gt;
curl http://localhost:8080/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the health check fails, you have confirmed the app is broken despite the green pipeline.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 5: If Needed — Rollback
&lt;/h3&gt;

&lt;p&gt;If you can't fix it quickly and production is affected — rollback immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout undo deployment/&amp;lt;deployment-name&amp;gt;

&lt;span class="c"&gt;# Verify rollback&lt;/span&gt;
kubectl rollout status deployment/&amp;lt;deployment-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restore service first. Investigate later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Insight
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Most real production issues after a green pipeline = config mismatch&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not a code bug. Not a broken image. Just a wrong environment variable, a missing secret, or a ConfigMap pointing to the wrong place.&lt;/p&gt;

&lt;p&gt;This is why experienced DevOps engineers &lt;strong&gt;always check config first&lt;/strong&gt; when the pipeline passes but the app doesn't behave.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Debug Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What to Look For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Check pods&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl get pods&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;STATUS = Running&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check logs&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl logs &amp;lt;pod&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Startup errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check env vars&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl exec &amp;lt;pod&amp;gt; -- env&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Correct DB/API values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check secrets&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl get secret&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Secret exists and mounted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test endpoint&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl localhost/health&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;200 OK response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rollback&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl rollout undo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;If nothing else works&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;A green pipeline gives you &lt;strong&gt;confidence in your delivery process&lt;/strong&gt; — not a guarantee that your application is healthy. These are two very different things.&lt;/p&gt;

&lt;p&gt;Add &lt;strong&gt;smoke tests&lt;/strong&gt; and &lt;strong&gt;health checks&lt;/strong&gt; at the end of your pipeline to bridge that gap. Make your pipeline not just check if the deploy succeeded — but if the app actually &lt;strong&gt;came up healthy.&lt;/strong&gt;&lt;/p&gt;
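&lt;p&gt;That check can be as small as a curl loop at the end of the pipeline. A minimal sketch, assuming your app exposes a &lt;code&gt;/health&lt;/code&gt; endpoint (the URL below is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Retry the health endpoint for up to ~50 seconds before declaring failure&lt;/span&gt;
for i in $(seq 1 10); do
  if curl -fsS https://your-app.example.com/health; then
    echo "App is healthy"
    exit 0
  fi
  sleep 5
done
echo "Smoke test failed: app did not come up healthy"
exit 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;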




&lt;p&gt;&lt;em&gt;Have you ever been caught off guard by a green pipeline and a broken app? Drop your story in the comments 👇&lt;/em&gt;&lt;/p&gt;




</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>cicd</category>
      <category>beginners</category>
    </item>
    <item>
      <title>What broke your CI/CD pipeline and how did you fix it?</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Sat, 25 Apr 2026 15:41:35 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/what-broke-your-cicd-pipeline-and-how-did-you-fix-it-2nka</link>
      <guid>https://dev.to/mumtaz2029/what-broke-your-cicd-pipeline-and-how-did-you-fix-it-2nka</guid>
      <description>&lt;p&gt;We all have that one story. 😅&lt;br&gt;
That moment where:&lt;/p&gt;

&lt;p&gt;Pipeline was green ✅&lt;br&gt;
You pushed one small change&lt;br&gt;
Everything exploded 💥&lt;/p&gt;

&lt;p&gt;For me it was forgetting to add environment variables in Jenkins — pipeline ran perfectly locally, failed completely in production. Classic.&lt;br&gt;
I want to hear your stories:&lt;br&gt;
🔴 What was the stupidest thing that broke your pipeline?&lt;br&gt;
🟡 How long did it take you to find the bug?&lt;br&gt;
🟢 How did you finally fix it?&lt;br&gt;
Drop your war stories below 👇 — the more painful the better! 😄&lt;/p&gt;

</description>
      <category>explainlikeimfive</category>
      <category>devops</category>
      <category>jenkins</category>
      <category>discuss</category>
    </item>
    <item>
      <title>CI/CD Pipeline Best Practices That Nobody Teaches You When You're Starting Out</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Sat, 25 Apr 2026 15:25:13 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/cicd-pipeline-best-practices-that-nobody-teaches-you-when-youre-starting-out-5fi3</link>
      <guid>https://dev.to/mumtaz2029/cicd-pipeline-best-practices-that-nobody-teaches-you-when-youre-starting-out-5fi3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;When I first started building CI/CD pipelines, I thought it was just about automating deployments. I was wrong.&lt;/p&gt;

&lt;p&gt;After working with Jenkins, GitHub Actions, and GitLab CI — here are the real best practices I wish someone told me earlier.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  1.  Never Store Secrets in Your Pipeline Code
&lt;/h2&gt;

&lt;p&gt;The biggest mistake beginners make:&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#  WRONG — never do this&lt;/span&gt;
docker login &lt;span class="nt"&gt;-u&lt;/span&gt; admin &lt;span class="nt"&gt;-p&lt;/span&gt; mypassword123

&lt;span class="c"&gt;#  RIGHT — use environment variables&lt;/span&gt;
docker login &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nv"&gt;$DOCKER_USER&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$DOCKER_PASSWORD&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Use your CI tool's &lt;strong&gt;secret manager&lt;/strong&gt; — Jenkins Credentials, GitHub Secrets, GitLab Variables. Always.&lt;/p&gt;


&lt;h2&gt;
  
  
  2.  Fail Fast — Put Quick Checks First
&lt;/h2&gt;

&lt;p&gt;Order your pipeline stages like this:&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Lint → Unit Tests → Build → Integration Tests → Deploy
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Why? If your linting fails, there's no point building. Catch errors early, save time.&lt;/p&gt;


&lt;h2&gt;
  
  
  3.  Always Build Immutable Artifacts
&lt;/h2&gt;

&lt;p&gt;Never deploy code directly. Always build a Docker image or artifact first:&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Tag with commit SHA — not just "latest"&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; myapp:&lt;span class="nv"&gt;$GIT_COMMIT_SHA&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
docker push myapp:&lt;span class="nv"&gt;$GIT_COMMIT_SHA&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Using &lt;code&gt;latest&lt;/code&gt; tag is a trap — you lose traceability.&lt;/p&gt;
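&lt;p&gt;That traceability pays off at deploy time as well. You can point the cluster at the exact artifact the pipeline built (names here are illustrative):&lt;/p&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deploy the exact image this pipeline run built and pushed&lt;/span&gt;
kubectl set image deployment/myapp myapp=myapp:$GIT_COMMIT_SHA
&lt;/code&gt;&lt;/pre&gt;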


&lt;h2&gt;
  
  
  4.  One Pipeline Per Branch Strategy
&lt;/h2&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;feature/* → lint + unit tests only
develop   → lint + tests + build + deploy to staging
main      → full pipeline + deploy to production
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Don't run full heavy pipelines on every feature branch — wastes time and resources.&lt;/p&gt;


&lt;h2&gt;
  
  
  5.  Notifications Matter More Than You Think
&lt;/h2&gt;

&lt;p&gt;Add Slack or email alerts for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Pipeline failure&lt;/li&gt;
&lt;li&gt; Successful production deploy&lt;/li&gt;
&lt;li&gt; Test coverage dropping below threshold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Silent pipelines = hidden problems.&lt;/p&gt;


&lt;h2&gt;
  
  
  6.  Keep Your Pipeline as Code
&lt;/h2&gt;

&lt;p&gt;Always use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Jenkinsfile&lt;/code&gt; for Jenkins&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.github/workflows/*.yml&lt;/code&gt; for GitHub Actions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.gitlab-ci.yml&lt;/code&gt; for GitLab&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Never configure pipelines through UI only — it can't be versioned or reviewed.&lt;/p&gt;


&lt;h2&gt;
  
  
  7. 📊 Track These Pipeline Metrics
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build duration&lt;/td&gt;
&lt;td&gt;Spot slowdowns early&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test pass rate&lt;/td&gt;
&lt;td&gt;Catch flaky tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy frequency&lt;/td&gt;
&lt;td&gt;Measure team velocity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean time to recovery&lt;/td&gt;
&lt;td&gt;How fast you fix failures&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  8.  Never Skip Tests to Speed Up Pipeline
&lt;/h2&gt;

&lt;p&gt;Skipping tests to go faster is like removing smoke detectors to save battery. You'll regret it in production.&lt;/p&gt;


&lt;h2&gt;
  
  
  What's Your Biggest CI/CD Struggle?
&lt;/h2&gt;

&lt;p&gt;Drop it in the comments — I read every one! &lt;/p&gt;



&lt;p&gt;&lt;em&gt;💬 P.S. I run a free Telegram community called &lt;strong&gt;DevOps Materials &amp;amp; Learning Hub&lt;/strong&gt; where we share CI/CD scripts, Jenkinsfiles, pipeline templates and more. Join us here → &lt;a href="https://t.me/+YHQcSaCPd9EzMmQ1" rel="noopener noreferrer"&gt;https://t.me/+YHQcSaCPd9EzMmQ1&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




</description>
      <category>devops</category>
      <category>cicd</category>
      <category>jenkins</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Why Your AWS Bill Doubled Overnight (And How to Plug the Leaks)</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Fri, 24 Apr 2026 01:43:54 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/why-your-aws-bill-doubled-overnight-and-how-to-plug-the-leaks-48c7</link>
      <guid>https://dev.to/mumtaz2029/why-your-aws-bill-doubled-overnight-and-how-to-plug-the-leaks-48c7</guid>
      <description>&lt;h2&gt;
  
  
  Why Your AWS Bill Doubled Overnight (And How to Plug the Leaks)
&lt;/h2&gt;

&lt;p&gt;We've all been there.&lt;/p&gt;

&lt;p&gt;You open the AWS Billing Dashboard, expecting the usual $50–$100, only to see a vertical spike that looks like a mountain range. The immediate reaction is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We must have massive traffic!"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But let's be real — traffic rarely doubles overnight. Your misconfigurations, however, certainly can.&lt;/p&gt;

&lt;p&gt;If you're staring down a bill that's spiraling out of control, here is your emergency checklist to find the &lt;strong&gt;invisible drains&lt;/strong&gt; on your budget.&lt;/p&gt;




&lt;h2&gt;
  
  
  1.  The NAT Gateway "Processing" Trap
&lt;/h2&gt;

&lt;p&gt;NAT Gateways are the &lt;strong&gt;silent killers&lt;/strong&gt; of AWS budgets. You aren't just paying for the uptime — you're paying for every gigabyte that passes through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Leak:&lt;/strong&gt;&lt;br&gt;
Sending high-bandwidth internal traffic (like S3 uploads) through a NAT Gateway instead of using a VPC Endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
Use &lt;strong&gt;VPC Endpoints&lt;/strong&gt; for S3 and DynamoDB to keep that traffic off the expensive NAT "highway."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check your NAT Gateway data transfer costs&lt;/span&gt;
aws ec2 describe-nat-gateways &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'NatGateways[*].{ID:NatGatewayId,State:State}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single misconfigured service pushing gigabytes through NAT can silently add hundreds of dollars to your bill.&lt;/p&gt;
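&lt;p&gt;Creating a gateway endpoint for S3 is a one-liner; the IDs below are placeholders, and the region in the service name should match your own:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Route S3 traffic through a free gateway endpoint instead of NAT&lt;/span&gt;
aws ec2 create-vpc-endpoint \
  --vpc-id &amp;lt;vpc-id&amp;gt; \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids &amp;lt;route-table-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;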




&lt;h2&gt;
  
  
  2.  Cross-AZ Data Transfer — The Invisible Tax
&lt;/h2&gt;

&lt;p&gt;High availability is great, but cross-Availability Zone (AZ) traffic comes with a literal &lt;strong&gt;invisible tax.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Leak:&lt;/strong&gt;&lt;br&gt;
Your app server in &lt;code&gt;us-east-1a&lt;/code&gt; is constantly chatting with a database in &lt;code&gt;us-east-1b&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
Keep your "chatty" services within the &lt;strong&gt;same AZ&lt;/strong&gt; where possible, or use Service Discovery to prioritize local traffic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check which AZ your instances are running in&lt;/span&gt;
aws ec2 describe-instances &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Reservations[*].Instances[*].{ID:InstanceId,AZ:Placement.AvailabilityZone}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3.  Ghost EBS Volumes
&lt;/h2&gt;

&lt;p&gt;When you terminate an EC2 instance, the Elastic Block Store (EBS) volume doesn't always go away with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Leak:&lt;/strong&gt;&lt;br&gt;
"Unattached" volumes sitting in your console, doing absolutely nothing except &lt;strong&gt;costing you monthly rent.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
Go to EC2 Console → Volumes → Filter by &lt;code&gt;State = Available&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find all unattached EBS volumes via CLI&lt;/span&gt;
aws ec2 describe-volumes &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;status,Values&lt;span class="o"&gt;=&lt;/span&gt;available &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Volumes[*].{ID:VolumeId,Size:Size,State:State}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it's not &lt;code&gt;In-use&lt;/code&gt; — delete it or snapshot it and move on.&lt;/p&gt;
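&lt;p&gt;If you want a safety net before deleting, snapshot first (the volume ID is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Snapshot the volume, then delete it&lt;/span&gt;
aws ec2 create-snapshot --volume-id &amp;lt;volume-id&amp;gt; --description "pre-cleanup backup"
aws ec2 delete-volume --volume-id &amp;lt;volume-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;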




&lt;h2&gt;
  
  
  4.  Broken Auto Scaling
&lt;/h2&gt;

&lt;p&gt;Auto Scaling is designed to &lt;strong&gt;save&lt;/strong&gt; you money, but it only works if it knows how to breathe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Leak:&lt;/strong&gt;&lt;br&gt;
Your "Scale Up" policy works perfectly during peak hours, but your "Scale Down" policy is either missing or blocked by a single stuck process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
Audit your CloudWatch alarms. Ensure your cooldown periods aren't too long and that your termination policies are actually firing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List your Auto Scaling groups and their activities&lt;/span&gt;
aws autoscaling describe-scaling-activities &lt;span class="nt"&gt;--auto-scaling-group-name&lt;/span&gt; your-group-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. The CloudWatch Ingestion Spike
&lt;/h2&gt;

&lt;p&gt;Logs are vital — until they cost more than the app they're monitoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Leak:&lt;/strong&gt;&lt;br&gt;
You left a service in &lt;strong&gt;Debug mode&lt;/strong&gt;, and now you're paying for terabytes of CloudWatch log ingestion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
Set a retention policy. Don't keep logs for "Forever" by default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set a 30-day retention policy on a log group&lt;/span&gt;
aws logs put-retention-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--log-group-name&lt;/span&gt; /your/log/group &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--retention-in-days&lt;/span&gt; 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;14 to 30 days is usually plenty for dev environments.&lt;/p&gt;




&lt;h2&gt;
  
  
  6.  S3 Without a Lifecycle Policy
&lt;/h2&gt;

&lt;p&gt;Storage is cheap — but it's not free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Leak:&lt;/strong&gt;&lt;br&gt;
Storing every version of every file in Standard Storage for years with no cleanup plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
Implement &lt;strong&gt;S3 Lifecycle Policies&lt;/strong&gt; to move old data automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Enabled"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Transitions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"StorageClass"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"STANDARD_IA"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"StorageClass"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GLACIER"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Move to Infrequent Access after 30 days. Move to Glacier after 90. Your future self will thank you.&lt;/p&gt;
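&lt;p&gt;To apply a policy like the one above from the CLI (the bucket name is a placeholder; note that current API versions may also require a &lt;code&gt;Filter&lt;/code&gt; element in each rule):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Apply the lifecycle rules saved in lifecycle.json&lt;/span&gt;
aws s3api put-bucket-lifecycle-configuration \
  --bucket &amp;lt;bucket-name&amp;gt; \
  --lifecycle-configuration file://lifecycle.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;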




&lt;h2&gt;
  
  
  7. Idle Load Balancers
&lt;/h2&gt;

&lt;p&gt;An ALB (Application Load Balancer) costs roughly &lt;strong&gt;$16–$20/month&lt;/strong&gt; just to exist — even if nothing is using it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Leak:&lt;/strong&gt;&lt;br&gt;
Leftover load balancers from a project or staging environment you forgot to tear down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find load balancers with no targets&lt;/span&gt;
aws elbv2 describe-load-balancers &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'LoadBalancers[*].{Name:LoadBalancerName,DNS:DNSName}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it has &lt;strong&gt;zero targets&lt;/strong&gt; and &lt;strong&gt;zero requests&lt;/strong&gt; — delete it immediately.&lt;/p&gt;
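&lt;p&gt;To confirm a load balancer really has nothing behind it, check its target groups (the ARN is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# An empty or all-unhealthy result means nothing is being served&lt;/span&gt;
aws elbv2 describe-target-health --target-group-arn &amp;lt;target-group-arn&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;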




&lt;h2&gt;
  
  
  8.  Snapshot Hoarding
&lt;/h2&gt;

&lt;p&gt;Backups are important — but do you really need a snapshot of a test server from 2022?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Leak:&lt;/strong&gt;&lt;br&gt;
Automated backups that &lt;strong&gt;never expire.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
Use &lt;strong&gt;AWS Backup&lt;/strong&gt; to centralize management and set hard expiration dates on snapshots.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all your snapshots and their ages&lt;/span&gt;
aws ec2 describe-snapshots &lt;span class="nt"&gt;--owner-ids&lt;/span&gt; self &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Snapshots[*].{ID:SnapshotId,Size:VolumeSize,Date:StartTime}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Quick Emergency Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Leak&lt;/th&gt;
&lt;th&gt;Quick Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;NAT Gateway traffic&lt;/td&gt;
&lt;td&gt;Use VPC Endpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Cross-AZ traffic&lt;/td&gt;
&lt;td&gt;Keep services in same AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Ghost EBS volumes&lt;/td&gt;
&lt;td&gt;Filter Available → Delete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Broken Auto Scaling&lt;/td&gt;
&lt;td&gt;Audit CloudWatch alarms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;CloudWatch debug logs&lt;/td&gt;
&lt;td&gt;Set 14-30 day retention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;S3 no lifecycle&lt;/td&gt;
&lt;td&gt;Add Lifecycle Policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Idle Load Balancers&lt;/td&gt;
&lt;td&gt;Zero targets → Delete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Old snapshots&lt;/td&gt;
&lt;td&gt;Set expiration in AWS Backup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;AWS is a &lt;strong&gt;"pay-for-what-you-use"&lt;/strong&gt; model — but if you aren't careful, you're also &lt;strong&gt;"paying-for-what-you-forgot-to-turn-off."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run through this checklist every month. Set a calendar reminder. Your AWS bill will thank you. &lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's the biggest hidden cost you've ever found in your AWS bill? Drop it in the comments!&lt;/em&gt;&lt;/p&gt;




</description>
      <category>aws</category>
      <category>devops</category>
      <category>beginners</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>How I Fixed Jenkins Built-In Node Offline Issue on EC2</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Mon, 20 Apr 2026 11:08:28 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/how-i-fixed-jenkins-built-in-node-offline-issue-on-ec2-m79</link>
      <guid>https://dev.to/mumtaz2029/how-i-fixed-jenkins-built-in-node-offline-issue-on-ec2-m79</guid>
      <description>&lt;h2&gt;
  
  
  Problem: Jenkins Built-In Node Showing Offline on EC2
&lt;/h2&gt;

&lt;p&gt;If you have ever set up Jenkins on an AWS EC2 instance and seen your Built-In Node showing &lt;strong&gt;offline&lt;/strong&gt; with builds not running — this post is for you!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffdlpf34pskcxsvcipejg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffdlpf34pskcxsvcipejg.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is exactly what happened, why it happened, and how I fixed it step by step.&lt;/p&gt;




&lt;h2&gt;
  
  
  Understanding the Warning
&lt;/h2&gt;

&lt;p&gt;When I clicked on the Built-In Node I saw this error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Disk space is below threshold of 1.00 GiB. Only 951.90 MiB out of 956.65 MiB left on /tmp.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf9t7i4grmvve8zqo8pb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf9t7i4grmvve8zqo8pb.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Jenkins monitors your server's system resources constantly. It requires:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Minimum Required&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free Disk Space&lt;/td&gt;
&lt;td&gt;≥ 1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free Temp Space &lt;code&gt;/tmp&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;≥ 1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free Swap Space&lt;/td&gt;
&lt;td&gt;&amp;gt; 0 B&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In my case the checks showed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Free Disk Space: 22.26 GiB — perfectly fine&lt;/li&gt;
&lt;li&gt; Free Temp Space &lt;code&gt;/tmp&lt;/code&gt;: 951.90 MiB — &lt;strong&gt;below Jenkins threshold&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Free Swap Space: 0 B — &lt;strong&gt;no swap at all&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;/tmp&lt;/code&gt; partition was only &lt;strong&gt;956 MiB total&lt;/strong&gt; — just under Jenkins' 1 GB requirement. So Jenkins automatically took the Built-In Node &lt;strong&gt;offline&lt;/strong&gt; and refused to run any builds.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fix Option 1 — Increase &lt;code&gt;/tmp&lt;/code&gt; Size (Best for EC2)
&lt;/h2&gt;

&lt;p&gt;This is the permanent fix. We increase the &lt;code&gt;/tmp&lt;/code&gt; mount size by editing the filesystem configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Open the fstab file&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/fstab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Add this line at the bottom&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tmpfs /tmp tmpfs defaults,size&lt;span class="o"&gt;=&lt;/span&gt;2G 0 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save and exit (&lt;code&gt;Ctrl+X&lt;/code&gt; → &lt;code&gt;Y&lt;/code&gt; → &lt;code&gt;Enter&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Remount /tmp without rebooting&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;mount &lt;span class="nt"&gt;-o&lt;/span&gt; remount /tmp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Verify the new size&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt; /tmp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should now see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Filesystem   Size   Used   Avail   Use%   Mounted on
tmpfs        2.0G   4.8M   2.0G    1%     /tmp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4zoaepnkb3qft9ytfbq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4zoaepnkb3qft9ytfbq.png" alt=" " width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/tmp&lt;/code&gt; is now &lt;strong&gt;2 GB&lt;/strong&gt; — well above Jenkins' 1 GB threshold! ✅&lt;/p&gt;




&lt;h2&gt;
  
  
  Fix Option 2 — Quick Jenkins Restart (Temporary Fix)
&lt;/h2&gt;

&lt;p&gt;If you just need Jenkins back online quickly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /tmp/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart jenkins
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This clears temp files and forces Jenkins to recheck the threshold. Sometimes this is enough to bring the node back online temporarily.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bonus Fix — Add Swap Space (Important for t2.micro)
&lt;/h2&gt;

&lt;p&gt;On &lt;strong&gt;t2.micro&lt;/strong&gt; EC2 instances, swap is 0 B by default. Jenkins warns about this too. Here is how to create a 1 GB swap file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a 1GB swap file&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;fallocate &lt;span class="nt"&gt;-l&lt;/span&gt; 1G /swapfile

&lt;span class="c"&gt;# Set correct permissions&lt;/span&gt;
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;600 /swapfile

&lt;span class="c"&gt;# Set up swap space&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;mkswap /swapfile

&lt;span class="c"&gt;# Enable the swap&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;swapon /swapfile

&lt;span class="c"&gt;# Verify&lt;/span&gt;
free &lt;span class="nt"&gt;-h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
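&lt;p&gt;One note: a swap file created this way disappears on reboot unless you also register it in &lt;code&gt;/etc/fstab&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Make the swap file survive reboots&lt;/span&gt;
echo '/swapfile swap swap defaults 0 0' | sudo tee -a /etc/fstab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;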






&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;After applying Fix Option 1 and adding swap space the node came back online immediately!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1me8y5g6c5hftzqxxl7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1me8y5g6c5hftzqxxl7m.png" alt=" " width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-In Node came back &lt;strong&gt;Online&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;Jenkins disk space warning disappeared &lt;/li&gt;
&lt;li&gt;Builds started running immediately &lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;t2.micro has very limited resources&lt;/strong&gt; — always check &lt;code&gt;/tmp&lt;/code&gt; size when setting up Jenkins on a free tier EC2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Never ignore Jenkins resource warnings&lt;/strong&gt; — they directly affect whether your node stays online.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix Option 1 is permanent&lt;/strong&gt;, Fix Option 2 is temporary. Always go with Option 1 for a stable setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always add swap on t2.micro&lt;/strong&gt; — it prevents a lot of memory-related Jenkins issues down the line.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Setting up Jenkins on EC2? Drop your questions in the comments!&lt;/em&gt;&lt;/p&gt;




</description>
      <category>jenkins</category>
      <category>devops</category>
      <category>aws</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Git Commands Every DevOps Engineer Must Know</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Mon, 20 Apr 2026 03:23:56 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/git-commands-every-devops-engineer-must-know-5c09</link>
      <guid>https://dev.to/mumtaz2029/git-commands-every-devops-engineer-must-know-5c09</guid>
      <description>&lt;h2&gt;
  
  
  Git Commands Every DevOps Engineer Must Know
&lt;/h2&gt;

&lt;p&gt;Git is not just a version control tool — it's your &lt;strong&gt;daily survival kit&lt;/strong&gt; as a DevOps engineer. Whether you're managing pipelines, fixing production issues, or collaborating with teams, these commands will save you every single day.&lt;/p&gt;

&lt;p&gt;Let's break it down section by section.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Initial Setup — Configure Git &amp;amp; Start Your Project
&lt;/h2&gt;

&lt;p&gt;Before anything else, set up Git properly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set your global username&lt;/span&gt;
git config &lt;span class="nt"&gt;--global&lt;/span&gt; user.name &lt;span class="s2"&gt;"Your Name"&lt;/span&gt;

&lt;span class="c"&gt;# Set your global email&lt;/span&gt;
git config &lt;span class="nt"&gt;--global&lt;/span&gt; user.email &lt;span class="s2"&gt;"your@email.com"&lt;/span&gt;

&lt;span class="c"&gt;# Initialize a new Git repository&lt;/span&gt;
git init

&lt;span class="c"&gt;# Clone an existing repository&lt;/span&gt;
git clone &amp;lt;repo-url&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Start every project the right way — proper config avoids identity issues in commits later.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Daily Git Flow — Your Everyday DevOps Cycle
&lt;/h2&gt;

&lt;p&gt;This is the cycle you'll repeat every single day:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check the status of your changes&lt;/span&gt;
git status

&lt;span class="c"&gt;# Stage all changes to be committed&lt;/span&gt;
git add &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Commit your changes with a message&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"your message here"&lt;/span&gt;

&lt;span class="c"&gt;# Push changes to remote repository&lt;/span&gt;
git push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Practice it. Automate it. Master it.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Branching Strategy — Work Smarter in CI/CD &amp;amp; Teamwork
&lt;/h2&gt;

&lt;p&gt;Branching is essential for clean CI/CD pipelines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all branches in your repository&lt;/span&gt;
git branch

&lt;span class="c"&gt;# Create a new branch and switch to it&lt;/span&gt;
git checkout &lt;span class="nt"&gt;-b&lt;/span&gt; feature-branch

&lt;span class="c"&gt;# Switch to an existing branch&lt;/span&gt;
git switch branch-name

&lt;span class="c"&gt;# Merge changes from another branch&lt;/span&gt;
git merge branch-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Always keep your &lt;code&gt;main&lt;/code&gt; branch &lt;strong&gt;clean and stable&lt;/strong&gt;. Never push untested code directly to main.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Sync With Remote — Keep Your Local Repo Up to Date
&lt;/h2&gt;

&lt;p&gt;Always sync before you push:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fetch and merge changes from remote to local&lt;/span&gt;
git pull

&lt;span class="c"&gt;# Fetch changes from remote (without merging)&lt;/span&gt;
git fetch

&lt;span class="c"&gt;# Show remote repositories and their URLs&lt;/span&gt;
git remote &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Always &lt;code&gt;git pull&lt;/code&gt; before you &lt;code&gt;git push&lt;/code&gt; — it helps you avoid conflicts and ensures successful deployments.&lt;/p&gt;
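&lt;p&gt;If you prefer a linear history, a common variant is to rebase your local commits on top of the remote branch instead of merging:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Replay your local commits on top of the fetched remote branch&lt;/span&gt;
git pull --rebase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;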




&lt;h2&gt;
  
  
  5. Debug Like a Pro — Inspect History, Changes &amp;amp; Code
&lt;/h2&gt;

&lt;p&gt;When something breaks in production, these are your best friends:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Show commit history&lt;/span&gt;
git log

&lt;span class="c"&gt;# View changes between working tree, staging or commits&lt;/span&gt;
git diff

&lt;span class="c"&gt;# Show details of a specific commit&lt;/span&gt;
git show &amp;lt;commit-id&amp;gt;

&lt;span class="c"&gt;# See who changed each line and why&lt;/span&gt;
git blame &amp;lt;file&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; &lt;code&gt;git blame&lt;/code&gt; helps you find who broke production 👀 — use it wisely!&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Undo &amp;amp; Fix — Life Saver Commands
&lt;/h2&gt;

&lt;p&gt;Mistakes happen. Fix them like a pro:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Undo last commit, keep changes staged&lt;/span&gt;
git reset &lt;span class="nt"&gt;--soft&lt;/span&gt; HEAD~1

&lt;span class="c"&gt;# Undo last commit and discard all changes (Use with caution!)&lt;/span&gt;
git reset &lt;span class="nt"&gt;--hard&lt;/span&gt; HEAD~1

&lt;span class="c"&gt;# Safely undo changes by creating a new commit (great for shared repos)&lt;/span&gt;
git revert &amp;lt;commit-id&amp;gt;

&lt;span class="c"&gt;# Temporarily save changes and clean your working directory&lt;/span&gt;
git stash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
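&lt;p&gt;And when you're ready to pick the stashed work back up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Re-apply the most recent stash and remove it from the stash list&lt;/span&gt;
git stash pop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;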



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Use &lt;code&gt;git revert&lt;/code&gt; over &lt;code&gt;git reset --hard&lt;/code&gt; in shared/production repos — it's safer and keeps history clean.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Bonus — Extra Power Commands
&lt;/h2&gt;

&lt;p&gt;Level up your Git game:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a tag for a specific version&lt;/span&gt;
git tag &lt;span class="nt"&gt;-a&lt;/span&gt; v1.0 &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"version 1.0"&lt;/span&gt;

&lt;span class="c"&gt;# Push a specific tag to remote&lt;/span&gt;
git push origin v1.0

&lt;span class="c"&gt;# Apply a specific commit from another branch&lt;/span&gt;
git cherry-pick &amp;lt;commit-id&amp;gt;

&lt;span class="c"&gt;# Check repository integrity and find issues&lt;/span&gt;
git fsck &lt;span class="nt"&gt;--full&lt;/span&gt;

&lt;span class="c"&gt;# Remove untracked files and directories&lt;/span&gt;
git clean &lt;span class="nt"&gt;-fd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Master these commands and you're ready to tackle any DevOps workflow with confidence!&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check current changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git pull&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sync from remote&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git stash&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Save work temporarily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git revert&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Safely undo in shared repos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git blame&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Find who changed what&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git cherry-pick&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Apply specific commits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git tag&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Version your releases&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Git is the backbone of every DevOps workflow. The engineers who know these commands deeply don't just push code — they &lt;strong&gt;ship with confidence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Save this post for your next deployment! &lt;/p&gt;




&lt;p&gt;&lt;em&gt;Which Git command has saved you the most in production? Drop it in the comments!&lt;/em&gt;&lt;/p&gt;




</description>
      <category>git</category>
      <category>productivity</category>
      <category>devops</category>
      <category>beginners</category>
    </item>
    <item>
      <title>DevOps Scenario Interview Question: Deployment Failed in Production</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Sat, 18 Apr 2026 14:11:02 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/devops-scenario-interview-question-deployment-failed-in-production-2bii</link>
      <guid>https://dev.to/mumtaz2029/devops-scenario-interview-question-deployment-failed-in-production-2bii</guid>
      <description>&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;devops&lt;/code&gt; &lt;code&gt;kubernetes&lt;/code&gt; &lt;code&gt;cicd&lt;/code&gt; &lt;code&gt;career&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Scenario: Your Deployment Failed in Production. What Steps Will You Take?
&lt;/h2&gt;

&lt;p&gt;This is one of the most common &lt;strong&gt;real-world scenario questions&lt;/strong&gt; asked in DevOps interviews. Interviewers don't want textbook answers — they want to know how you think under pressure.&lt;/p&gt;

&lt;p&gt;Here's the complete answer framework.&lt;/p&gt;




&lt;h2&gt;
  
  
  Answer: Step-by-Step Approach
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Check CI/CD Pipeline Logs
&lt;/h3&gt;

&lt;p&gt;First thing — don't guess, &lt;strong&gt;read the logs&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For Jenkins&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /var/log/jenkins/jenkins.log

&lt;span class="c"&gt;# For GitHub Actions — check the Actions tab in your repo&lt;/span&gt;

&lt;span class="c"&gt;# For GitLab CI&lt;/span&gt;
gitlab-ci logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pipeline log tells you exactly &lt;strong&gt;where&lt;/strong&gt; it broke.&lt;/p&gt;
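&lt;p&gt;If the pipeline runs on GitHub Actions, the &lt;code&gt;gh&lt;/code&gt; CLI can pull those logs from the terminal; a sketch (the run id is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List recent workflow runs for this repo&lt;/span&gt;
gh run list --limit 5

&lt;span class="c"&gt;# Print only the logs of the failed steps for one run&lt;/span&gt;
gh run view &amp;lt;run-id&amp;gt; --log-failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;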




&lt;h3&gt;
  
  
  2. Identify the Failed Stage (Build / Test / Deploy)
&lt;/h3&gt;

&lt;p&gt;Every pipeline has stages. Narrow it down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build failed?&lt;/strong&gt; → Dependency issue, Dockerfile error, compilation error&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test failed?&lt;/strong&gt; → A test caught a regression before it hit production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy failed?&lt;/strong&gt; → Kubernetes issue, wrong image tag, resource limits, misconfigured secrets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Knowing the stage cuts your debugging time in half.&lt;/p&gt;
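&lt;p&gt;For a failed deploy stage, a first sweep might look like this (the label selector assumes the pods are labeled &lt;code&gt;app=my-app&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Recent cluster events, newest last&lt;/span&gt;
kubectl get events --sort-by=.metadata.creationTimestamp | tail -20

&lt;span class="c"&gt;# State of the pods the deploy touched&lt;/span&gt;
kubectl get pods -l app=my-app

&lt;span class="c"&gt;# Why a specific pod isn't starting&lt;/span&gt;
kubectl describe pod &amp;lt;pod-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;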




&lt;h3&gt;
  
  
  3. Verify Configuration Changes
&lt;/h3&gt;

&lt;p&gt;Check what changed before the failure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check recent git commits&lt;/span&gt;
git log &lt;span class="nt"&gt;--oneline&lt;/span&gt; &lt;span class="nt"&gt;-10&lt;/span&gt;

&lt;span class="c"&gt;# Check Kubernetes config changes&lt;/span&gt;
kubectl describe deployment my-app

&lt;span class="c"&gt;# Check if secrets/configmaps were updated&lt;/span&gt;
kubectl get configmap my-app-config &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most production failures trace back to a &lt;strong&gt;config change&lt;/strong&gt; someone forgot to mention.&lt;/p&gt;
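&lt;p&gt;One way to surface an unannounced change, sketched under the assumption that your manifests live in a &lt;code&gt;k8s/&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Diff a local manifest against what's live in the cluster&lt;/span&gt;
kubectl diff -f k8s/deployment.yaml

&lt;span class="c"&gt;# Inspect what an earlier rollout revision looked like&lt;/span&gt;
kubectl rollout history deployment/my-app --revision=2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;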




&lt;h3&gt;
  
  
  4. Rollback to Previous Stable Version
&lt;/h3&gt;

&lt;p&gt;Don't try to fix forward when production is down. &lt;strong&gt;Rollback first, fix later.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Kubernetes rollback&lt;/span&gt;
kubectl rollout undo deployment/my-app

&lt;span class="c"&gt;# Verify rollback status&lt;/span&gt;
kubectl rollout status deployment/my-app

&lt;span class="c"&gt;# Check rollout history&lt;/span&gt;
kubectl rollout &lt;span class="nb"&gt;history &lt;/span&gt;deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This restores service immediately while you investigate the root cause safely.&lt;/p&gt;
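&lt;p&gt;If the immediately previous revision was also bad, you can target an older one explicitly (the revision number here is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List the known revisions first&lt;/span&gt;
kubectl rollout history deployment/my-app

&lt;span class="c"&gt;# Roll back to a specific revision instead of just the last one&lt;/span&gt;
kubectl rollout undo deployment/my-app --to-revision=3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;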




&lt;h3&gt;
  
  
  5. Fix the Issue and Redeploy
&lt;/h3&gt;

&lt;p&gt;Once production is stable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reproduce the issue in staging&lt;/li&gt;
&lt;li&gt;Apply the fix&lt;/li&gt;
&lt;li&gt;Test thoroughly&lt;/li&gt;
&lt;li&gt;Redeploy with the corrected version
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;set &lt;/span&gt;image deployment/my-app my-app&lt;span class="o"&gt;=&lt;/span&gt;my-image:v2.1-fixed
kubectl rollout status deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Pro Tip
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Always maintain versioned Docker images&lt;/strong&gt; — never use &lt;code&gt;latest&lt;/code&gt; in production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bad&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app:latest&lt;/span&gt;

&lt;span class="c1"&gt;# Good&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app:v2.0.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without versioned images, you can't roll back. Tag every release.&lt;/p&gt;
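&lt;p&gt;Tagging is cheap; a minimal sketch (the registry host and version are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build and tag with an explicit version&lt;/span&gt;
docker build -t registry.example.com/my-app:v2.0.1 .

&lt;span class="c"&gt;# Push the versioned tag, not :latest&lt;/span&gt;
docker push registry.example.com/my-app:v2.0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;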




&lt;h2&gt;
  
  
  Bonus: What Interviewers Are Really Looking For
&lt;/h2&gt;

&lt;p&gt;They want to see that you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't panic&lt;/li&gt;
&lt;li&gt;Prioritize restoring service over finding blame&lt;/li&gt;
&lt;li&gt;Think in structured steps&lt;/li&gt;
&lt;li&gt;Know the actual commands, not just the theory&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Preparing for a DevOps interview? Drop your toughest scenario question in the comments!&lt;/em&gt;&lt;/p&gt;




</description>
      <category>career</category>
      <category>cicd</category>
      <category>devops</category>
      <category>interview</category>
    </item>
    <item>
      <title>How I Fixed a Kubernetes CrashLoopBackOff in Production</title>
      <dc:creator>Mumtaz Jahan</dc:creator>
      <pubDate>Fri, 17 Apr 2026 23:55:42 +0000</pubDate>
      <link>https://dev.to/mumtaz2029/how-to-fixed-a-kubernetes-crashloopbackoff-in-production-232f</link>
      <guid>https://dev.to/mumtaz2029/how-to-fixed-a-kubernetes-crashloopbackoff-in-production-232f</guid>
      <description>&lt;p&gt;&lt;em&gt;Tags:&lt;/em&gt; &lt;code&gt;kubernetes&lt;/code&gt; &lt;code&gt;devops&lt;/code&gt; &lt;code&gt;debugging&lt;/code&gt; &lt;code&gt;cloud&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem: Application Was DOWN in Kubernetes
&lt;/h2&gt;

&lt;p&gt;One of the most stressful moments in DevOps — you check your monitoring dashboard and your application is completely &lt;strong&gt;DOWN&lt;/strong&gt; in Kubernetes. No graceful degradation. Just... down.&lt;/p&gt;

&lt;p&gt;Here's exactly how I diagnosed and fixed it in under an hour.&lt;/p&gt;




&lt;h2&gt;
  
  
  Issue Found: Pod Was in CrashLoopBackOff
&lt;/h2&gt;

&lt;p&gt;Running &lt;code&gt;kubectl get pods&lt;/code&gt; revealed the culprit immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;NAME                        READY   STATUS             RESTARTS   AGE
my-app-7d9f8b6c4-xk2pq     0/1     CrashLoopBackOff   8          20m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;CrashLoopBackOff&lt;/code&gt; means Kubernetes is repeatedly trying to start your container, it crashes, and Kubernetes backs off with increasing wait times before retrying. Something inside the container was causing it to exit immediately on startup.&lt;/p&gt;
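&lt;p&gt;A useful next question is &lt;em&gt;how&lt;/em&gt; the container is dying; its last exit code is recorded on the pod. A sketch, assuming a single-container pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1 usually means an application error; 137 means SIGKILL (often an OOM kill)&lt;/span&gt;
kubectl get pod my-app-7d9f8b6c4-xk2pq -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;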




&lt;h2&gt;
  
  
  Debug Steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Checked Logs (&lt;code&gt;kubectl logs&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs my-app-7d9f8b6c4-xk2pq &lt;span class="nt"&gt;--previous&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--previous&lt;/code&gt; flag is crucial here — it lets you see logs from the &lt;em&gt;crashed&lt;/em&gt; container, not the current (possibly empty) one. The logs showed repeated connection errors on startup.&lt;/p&gt;
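&lt;p&gt;A couple of variations that often help when the pod name changes on every restart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Limit output to the last lines of the crashed container&lt;/span&gt;
kubectl logs my-app-7d9f8b6c4-xk2pq --previous --tail=50

&lt;span class="c"&gt;# Log by deployment instead of by (ever-changing) pod name&lt;/span&gt;
kubectl logs deployment/my-app --all-containers=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;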

&lt;h3&gt;
  
  
  Step 2: Checked Config
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod my-app-7d9f8b6c4-xk2pq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I inspected the environment variables and ConfigMaps attached to the pod. The &lt;code&gt;describe&lt;/code&gt; command is a goldmine — it shows events, resource limits, volume mounts, and more.&lt;/p&gt;
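&lt;p&gt;To zero in on configuration alone, you can query the env vars and events directly (the jsonpath assumes a single container):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dump the container's environment variables as JSON&lt;/span&gt;
kubectl get pod my-app-7d9f8b6c4-xk2pq -o jsonpath='{.spec.containers[0].env}'

&lt;span class="c"&gt;# Events scoped to this one pod&lt;/span&gt;
kubectl get events --field-selector involvedObject.name=my-app-7d9f8b6c4-xk2pq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;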

&lt;h3&gt;
  
  
  Step 3: Found DB Connection Issue
&lt;/h3&gt;

&lt;p&gt;The logs made it clear: the app was trying to connect to the database using an &lt;strong&gt;incorrect connection string&lt;/strong&gt;. The host value in the environment variable was pointing to a stale endpoint. The app would crash immediately on boot since it couldn't reach the DB.&lt;/p&gt;
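&lt;p&gt;A crash-looping container is hard to exec into, so a throwaway pod is a handy way to confirm a bad endpoint; a sketch that assumes the DB listens on 5432:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Spin up a temporary busybox pod and test reachability of the DB host&lt;/span&gt;
kubectl run net-test --rm -it --image=busybox --restart=Never -- nc -zv correct-db-host.internal 5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;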




&lt;h2&gt;
  
  
  Fix Applied
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Corrected Environment Variables
&lt;/h3&gt;

&lt;p&gt;Updated the Kubernetes secret/configmap with the correct database host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl edit secret my-app-db-secret
&lt;span class="c"&gt;# or&lt;/span&gt;
kubectl &lt;span class="nb"&gt;set env &lt;/span&gt;deployment/my-app &lt;span class="nv"&gt;DB_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;correct-db-host.internal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Restarted the Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout restart deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then watched the rollout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout status deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🎉 Result
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; Application UP
 Issue resolved
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pods came up healthy, readiness probes passed, and traffic started flowing again.&lt;/p&gt;
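&lt;p&gt;A quick way to confirm recovery end to end (the label and Service name are assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pods ready?&lt;/span&gt;
kubectl get pods -l app=my-app

&lt;span class="c"&gt;# Service routing? Endpoints appear only once readiness probes pass&lt;/span&gt;
kubectl get endpoints my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;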




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Always check logs with &lt;code&gt;--previous&lt;/code&gt;&lt;/strong&gt; — the live container may have no logs if it crashes before writing any.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;kubectl describe pod&lt;/code&gt;&lt;/strong&gt; is your best friend for seeing the full picture: events, env vars, resource pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CrashLoopBackOff is almost always one of:&lt;/strong&gt; bad env vars/secrets, missing config, OOM kill, or a bug triggered at startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;kubectl rollout restart&lt;/code&gt;&lt;/strong&gt; is safer than deleting pods manually: it performs a rolling restart, so the service stays available as long as you run multiple replicas with working readiness probes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Hit a similar issue? Drop your debugging story in the comments!&lt;/em&gt;&lt;/p&gt;




</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>sre</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
