<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: IT Defined</title>
    <description>The latest articles on DEV Community by IT Defined (@it_defined_9fa44164c67442).</description>
    <link>https://dev.to/it_defined_9fa44164c67442</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3905931%2F084c6ea8-3136-4128-bc2e-66f4cf4503f2.png</url>
      <title>DEV Community: IT Defined</title>
      <link>https://dev.to/it_defined_9fa44164c67442</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/it_defined_9fa44164c67442"/>
    <language>en</language>
    <item>
      <title>How I Stopped Chasing Raises and Started Chasing Domain Shifts (DevOps in 2026)</title>
      <dc:creator>IT Defined</dc:creator>
      <pubDate>Mon, 11 May 2026 09:52:26 +0000</pubDate>
      <link>https://dev.to/it_defined_9fa44164c67442/how-i-stopped-chasing-raises-and-started-chasing-domain-shifts-devops-in-2026-1kd7</link>
      <guid>https://dev.to/it_defined_9fa44164c67442/how-i-stopped-chasing-raises-and-started-chasing-domain-shifts-devops-in-2026-1kd7</guid>
      <description>&lt;p&gt;Doubling your salary sounds like clickbait.&lt;/p&gt;

&lt;p&gt;I get it. I'd scroll past that headline too.&lt;/p&gt;

&lt;p&gt;But hear me out — because in DevOps in 2026, it's actually one of the more realistic career moves you can make in tech. And I want to break down &lt;em&gt;why&lt;/em&gt; without the usual hype.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Reason Most IT Salaries Plateau
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1554224155-6726b3ff858f%3Fw%3D1200%26auto%3Dformat%26fit%3Dcrop%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1554224155-6726b3ff858f%3Fw%3D1200%26auto%3Dformat%26fit%3Dcrop%26q%3D80" alt="Salary Plateau" width="1200" height="689"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo by &lt;a href="https://unsplash.com" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's not that your work isn't valuable.&lt;/p&gt;

&lt;p&gt;It's supply and demand — and in most traditional IT roles, supply is high. When a lot of people can do what you do, salaries flatten. Simple as that.&lt;/p&gt;

&lt;p&gt;I've seen brilliant support engineers, manual testers, and sysadmins stuck at the same band for 3+ years — not because they weren't good, but because they were competing in an overcrowded pool.&lt;/p&gt;

&lt;p&gt;The harsh reality? Being good at your job isn't enough if hundreds of others are equally good at the same job.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why DevOps Is Still a Different Story in 2026
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1451187580459-43490279c0fa%3Fw%3D1200%26auto%3Dformat%26fit%3Dcrop%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1451187580459-43490279c0fa%3Fw%3D1200%26auto%3Dformat%26fit%3Dcrop%26q%3D80" alt="Cloud Infrastructure" width="1200" height="798"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo by &lt;a href="https://unsplash.com" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The demand for engineers who can genuinely bridge dev and ops — own infrastructure, automate deployments, manage cloud costs, &lt;em&gt;and&lt;/em&gt; speak the language of developers — is still ahead of supply.&lt;/p&gt;

&lt;p&gt;Companies aren't just hiring for this. They're &lt;strong&gt;struggling&lt;/strong&gt; to hire for this.&lt;/p&gt;

&lt;p&gt;That gap is where the salary jump lives.&lt;/p&gt;

&lt;p&gt;Here's what companies are desperate for right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engineers who understand &lt;strong&gt;CI/CD end to end&lt;/strong&gt; — not just one tool&lt;/li&gt;
&lt;li&gt;People who can write &lt;strong&gt;infrastructure as code&lt;/strong&gt;, not just click through cloud consoles&lt;/li&gt;
&lt;li&gt;Folks who understand &lt;strong&gt;cloud cost ownership&lt;/strong&gt; — FinOps is becoming a real skill gap&lt;/li&gt;
&lt;li&gt;Engineers who can &lt;strong&gt;improve developer experience&lt;/strong&gt; at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Find someone who does all four confidently? Companies will pay well above market to keep them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Works (From People Who Made the Shift)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1516321318423-f06f85e504b3%3Fw%3D1200%26auto%3Dformat%26fit%3Dcrop%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1516321318423-f06f85e504b3%3Fw%3D1200%26auto%3Dformat%26fit%3Dcrop%26q%3D80" alt="Learning and Growth" width="1200" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo by &lt;a href="https://unsplash.com" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I've watched manual testers, IT ops folks, and support engineers successfully transition into DevOps. It took most of them &lt;strong&gt;12 to 24 months&lt;/strong&gt; of focused effort — not full-time bootcamps, just consistent learning alongside their current jobs.&lt;/p&gt;

&lt;p&gt;What moved the needle wasn't certifications alone. It was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building something real&lt;/strong&gt; — even a personal project on AWS or a home lab with Docker and Kubernetes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Being able to talk about it&lt;/strong&gt; confidently in interviews&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Owning the narrative&lt;/strong&gt; — not &lt;em&gt;"I'm learning DevOps"&lt;/em&gt; but &lt;em&gt;"I built and deployed X using Y"&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That shift in how you present yourself changes everything.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I didn't become 2x smarter. I just became relevant to a different set of problems."&lt;/em&gt;&lt;br&gt;
— A friend who went from ₹6 LPA to ₹14 LPA in 18 months&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Roadmap That Actually Works in 2026
&lt;/h2&gt;

&lt;p&gt;Here's a rough learning path based on what's actually getting people hired:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Month 1–3: Foundations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux fundamentals (if not already solid)&lt;/li&gt;
&lt;li&gt;Git and version control workflows&lt;/li&gt;
&lt;li&gt;Basic networking concepts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Month 4–6: Cloud + Containers&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS or Azure fundamentals (get certified)&lt;/li&gt;
&lt;li&gt;Docker — build, run, push real containers&lt;/li&gt;
&lt;li&gt;Write your first Dockerfile for a real project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Month 7–12: The Real Stuff&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes basics — deploy something real&lt;/li&gt;
&lt;li&gt;CI/CD pipelines — GitHub Actions or GitLab CI&lt;/li&gt;
&lt;li&gt;Infrastructure as Code — Terraform basics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Month 13–18: Differentiate&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a portfolio project end to end&lt;/li&gt;
&lt;li&gt;Contribute to open source (even small fixes count)&lt;/li&gt;
&lt;li&gt;Start talking about it — blog posts, LinkedIn, Quora&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Honest Part
&lt;/h2&gt;

&lt;p&gt;The salary jump happens because you stop competing in a crowded pool and start competing in a smaller, more specialized one.&lt;/p&gt;

&lt;p&gt;That's the whole strategy. No secret sauce.&lt;/p&gt;

&lt;p&gt;2026 is still a good window — the market hasn't fully normalized yet. But it will, eventually.&lt;/p&gt;

&lt;p&gt;The best time to start was probably a year ago.&lt;br&gt;
The second best time is right now.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Recap
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What Most People Do&lt;/th&gt;
&lt;th&gt;What Works&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Wait for appraisal cycles&lt;/td&gt;
&lt;td&gt;Shift domains strategically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Collect certifications passively&lt;/td&gt;
&lt;td&gt;Build real projects actively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apply to the same roles&lt;/td&gt;
&lt;td&gt;Target the specialized pool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"I'm learning DevOps"&lt;/td&gt;
&lt;td&gt;"I built and deployed X"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Have you made a domain shift that changed your salary trajectory? Drop it in the comments — genuinely curious what worked for people.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#devops&lt;/code&gt; &lt;code&gt;#career&lt;/code&gt; &lt;code&gt;#cloudcomputing&lt;/code&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>career</category>
      <category>cloud</category>
      <category>salary</category>
    </item>
    <item>
      <title>Terraform vs AWS CloudFormation in 2026: Which One Should You Actually Learn?</title>
      <dc:creator>IT Defined</dc:creator>
      <pubDate>Mon, 04 May 2026 13:00:03 +0000</pubDate>
      <link>https://dev.to/it_defined_9fa44164c67442/terraform-vs-aws-cloudformation-in-2026-which-one-should-you-actually-learn-2oa</link>
      <guid>https://dev.to/it_defined_9fa44164c67442/terraform-vs-aws-cloudformation-in-2026-which-one-should-you-actually-learn-2oa</guid>
      <description>&lt;h2&gt;
  
  
  The short answer (because I know you're scrolling)
&lt;/h2&gt;

&lt;p&gt;Learn Terraform. If you're a beginner picking your first IaC tool in 2026, learn Terraform. CloudFormation isn't bad, but the job market and the ecosystem have decided: &lt;strong&gt;Terraform won.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're already at a company using CloudFormation, you don't need to migrate. Stay where you are. Both tools do the same job, just differently.&lt;/p&gt;

&lt;p&gt;The rest of this post is the long version with the receipts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Syntax — HCL vs JSON/YAML
&lt;/h2&gt;

&lt;p&gt;Here's a Terraform resource for an S3 bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"logs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-logs"&lt;/span&gt;
  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt;
    &lt;span class="nx"&gt;Owner&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"devops-team"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same thing in CloudFormation YAML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;LogsBucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::S3::Bucket&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;BucketName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-logs&lt;/span&gt;
      &lt;span class="na"&gt;Tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Environment&lt;/span&gt;
          &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prod&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Owner&lt;/span&gt;
          &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;devops-team&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both are fine. HCL is friendlier once you're used to it. CloudFormation also supports JSON, but please don't: no comments, no trailing commas allowed, and every developer who maintains it will hate you.&lt;/p&gt;

&lt;h2&gt;
  
  
  State management — where Terraform gets tricky
&lt;/h2&gt;

&lt;p&gt;CloudFormation manages state for you. AWS knows what it created, so there's no state file. Delete a stack and AWS deletes its resources (except any marked &lt;code&gt;DeletionPolicy: Retain&lt;/code&gt;). Simple.&lt;/p&gt;

&lt;p&gt;Terraform keeps a state file, a local JSON file by default. In production, you put it in S3 with state locking via DynamoDB. Lose this file and Terraform forgets your infrastructure exists, even though AWS still has the resources.&lt;/p&gt;

&lt;p&gt;Sounds bad. It is bad, the first time you mess it up. But there's a tradeoff — Terraform's state model is what makes it portable across clouds.&lt;/p&gt;

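&lt;p&gt;&lt;strong&gt;Practical advice:&lt;/strong&gt; if you go with Terraform, set up remote state in S3 with state locking from day one: a DynamoDB lock table, or on Terraform 1.11 and later the S3 backend's native &lt;code&gt;use_lockfile = true&lt;/code&gt;, which deprecates the DynamoDB option. Never use local state for anything beyond personal experiments.&lt;/p&gt;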
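&lt;p&gt;As a rough sketch of that setup from the command line, the helper below points an existing configuration at an S3 backend at &lt;code&gt;init&lt;/code&gt; time. It assumes your code already declares an empty &lt;code&gt;backend "s3" {}&lt;/code&gt; block (Terraform's partial backend configuration) and that the bucket and lock table already exist; the function name and all example values are hypothetical placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: initialise Terraform against remote state in S3.
# Assumes the configuration contains an empty  backend "s3" {}  block.
# Usage: init_remote_state my-tf-state prod/network.tfstate ap-south-1 tf-locks
init_remote_state() {
  bucket="$1"; key="$2"; region="$3"; lock_table="$4"
  terraform init \
    -backend-config="bucket=$bucket" \
    -backend-config="key=$key" \
    -backend-config="region=$region" \
    -backend-config="encrypt=true" \
    -backend-config="dynamodb_table=$lock_table"
  # Confirm Terraform can read the remote state it now points at
  terraform state list
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing backend settings at &lt;code&gt;init&lt;/code&gt; time keeps bucket names and keys for each environment out of the committed code.&lt;/p&gt;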

&lt;h2&gt;
  
  
  Multi-cloud — the most misunderstood argument
&lt;/h2&gt;

&lt;p&gt;People say "learn Terraform because it's multi-cloud." Then they only ever use it for AWS.&lt;/p&gt;

&lt;p&gt;Real talk: in my experience, most Bangalore companies are AWS-only, or AWS plus a tiny bit of GCP for BigQuery. The multi-cloud dream is real for only a small number of large enterprises.&lt;/p&gt;

&lt;p&gt;But here's the thing: even if you're AWS-only, Terraform's provider model means you can also manage GitHub repos, Cloudflare DNS, Datadog dashboards, PagerDuty schedules, and a hundred other things in the same codebase. CloudFormation can reach some third-party services through its extension registry, but its coverage is far narrower.&lt;/p&gt;

&lt;p&gt;That's the actual advantage in practice — not multi-cloud, but &lt;strong&gt;multi-vendor&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the job market actually says
&lt;/h2&gt;

&lt;p&gt;I run a training institute, so I track what job descriptions ask for. As of early 2026, looking at LinkedIn and Naukri postings for Bangalore DevOps roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform mentioned: ~75-80% of postings&lt;/li&gt;
&lt;li&gt;CloudFormation mentioned: ~30-35%&lt;/li&gt;
&lt;li&gt;Pulumi or CDK mentioned: &amp;lt;5%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most jobs that mention CloudFormation also mention Terraform. &lt;strong&gt;Few jobs require only CloudFormation.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When CloudFormation actually wins
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pure AWS shops with strong AWS Organizations/Control Tower setups&lt;/li&gt;
&lt;li&gt;Compliance-heavy environments where you want AWS-native audit trails&lt;/li&gt;
&lt;li&gt;Teams already invested in AWS CDK&lt;/li&gt;
&lt;li&gt;Lambda-heavy serverless apps — SAM is genuinely simpler&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What we teach
&lt;/h2&gt;

&lt;p&gt;Honestly? We teach both, but we go deeper on Terraform. Roughly 70/30 split.&lt;/p&gt;

&lt;p&gt;If you only have time for one — go Terraform.&lt;/p&gt;

&lt;p&gt;Full version with module patterns, state strategies, and 8 common interview questions on &lt;a href="https://itdefined.org/blogs/terraform-vs-cloudformation-2026" rel="noopener noreferrer"&gt;itdefined.org&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>career</category>
      <category>devops</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Kubernetes Troubleshooting</title>
      <dc:creator>IT Defined</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:43:31 +0000</pubDate>
      <link>https://dev.to/it_defined_9fa44164c67442/kubernetes-troubleshooting-2l9j</link>
      <guid>https://dev.to/it_defined_9fa44164c67442/kubernetes-troubleshooting-2l9j</guid>
      <description>&lt;h2&gt;
  
  
  Why this exists
&lt;/h2&gt;

&lt;p&gt;I've been running K8s troubleshooting workshops for two years. We have a 200-student program at IT Defined where we throw broken clusters at people. Patterns emerged.&lt;/p&gt;

&lt;p&gt;Most failures aren't novel. In our experience, the same 25-30 failure modes account for about 90% of real-world K8s incidents. If you can debug these confidently, you'll handle most production incidents.&lt;/p&gt;

&lt;p&gt;Here are the 10 most critical scenarios. All 26 are in the full post linked at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. CrashLoopBackOff
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Pod restart count climbing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod POD_NAME
kubectl logs POD_NAME &lt;span class="nt"&gt;--previous&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; App crashes on startup (config error, missing env var, can't connect to DB), liveness probe too aggressive, command/args misconfigured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Read the previous container's logs. Reason is usually right there. If logs are empty, the container died before logging — check the entrypoint, command, and args.&lt;/p&gt;
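&lt;p&gt;If you check this often, a small shell function can pull the last termination reason and exit code out of the pod status before showing the previous logs. This is a sketch: &lt;code&gt;crashloop_info&lt;/code&gt; is a hypothetical name, and it assumes a single-container pod in the current namespace (it only reads &lt;code&gt;containerStatuses[0]&lt;/code&gt;).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: why did the last container instance die?
# Assumes a single-container pod in the current namespace.
# Usage: crashloop_info POD_NAME
crashloop_info() {
  pod="$1"
  # Termination reason (e.g. Error, OOMKilled) and exit code of the previous run
  kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" exit="}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
  # Last 50 log lines from the previous (crashed) container
  kubectl logs "$pod" --previous --tail=50
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Exit code 137 means the process was killed with SIGKILL (128 + 9), commonly by the OOM killer; in that case the reason field says OOMKilled.&lt;/p&gt;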

&lt;h2&gt;
  
  
  2. ImagePullBackOff or ErrImagePull
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt; &lt;code&gt;kubectl describe pod&lt;/code&gt;, look at events at the bottom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Image name typo, image doesn't exist, registry credentials missing, wrong region (ECR is regional), node IAM role can't pull from ECR.&lt;/p&gt;

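&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Run &lt;code&gt;docker pull&lt;/code&gt; with the exact image reference from a workstation. If it works there but fails in the cluster, the image is fine and the problem is the node's pull permissions (on EKS, the node IAM role's ECR access) or a missing &lt;code&gt;imagePullSecrets&lt;/code&gt; entry for other private registries.&lt;/p&gt;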
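&lt;p&gt;For ECR, "pull it manually" means logging in to the right regional registry first. A minimal sketch, assuming the AWS CLI and Docker are installed and your workstation credentials can read the repository; &lt;code&gt;check_ecr_pull&lt;/code&gt; and its example arguments are hypothetical placeholders, and the image reference should be copied from &lt;code&gt;kubectl describe pod&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: reproduce the node's image pull from a workstation.
# Usage: check_ecr_pull 123456789012 ap-south-1 my-app:1.4.2
check_ecr_pull() {
  account_id="$1"; region="$2"; image="$3"
  # ECR is regional: the hostname must use the repository's region
  registry="$account_id.dkr.ecr.$region.amazonaws.com"
  aws ecr get-login-password --region "$region" | docker login --username AWS --password-stdin "$registry"
  docker pull "$registry/$image"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If this succeeds but the pod still can't pull, compare with the node role's permissions; the AWS-managed &lt;code&gt;AmazonEC2ContainerRegistryReadOnly&lt;/code&gt; policy is the usual baseline.&lt;/p&gt;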

&lt;h2&gt;
  
  
  3. Pod stuck Pending
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt; &lt;code&gt;kubectl describe pod&lt;/code&gt;. Look for events like "0/3 nodes are available: 3 Insufficient cpu" or "didn't match Pod's node affinity/selector".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Insufficient capacity, resource requests too high, taints/tolerations mismatch, PVC not bound.&lt;/p&gt;

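&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Check &lt;code&gt;kubectl describe nodes&lt;/code&gt; for available resources. If the nodes are full, lower the pod's requests or add capacity (for example with Cluster Autoscaler or Karpenter on EKS).&lt;/p&gt;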
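&lt;p&gt;A sketch that puts the three relevant views side by side: what the pod requests, what the scheduler said, and what each node already has committed. &lt;code&gt;pending_info&lt;/code&gt; is a hypothetical helper name, and it assumes the pod is in the current namespace.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: why won't this pod schedule?
# Usage: pending_info POD_NAME
pending_info() {
  pod="$1"
  # CPU/memory requests of each container in the pending pod
  kubectl get pod "$pod" -o jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources.requests}{"\n"}{end}'
  # The scheduler's explanation for each failed attempt
  kubectl get events --field-selector involvedObject.name="$pod",reason=FailedScheduling
  # Requests already allocated on each node vs. its allocatable capacity
  kubectl describe nodes | grep -A 8 "Allocated resources"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;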

&lt;h2&gt;
  
  
  4. OOMKilled
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt; &lt;code&gt;kubectl describe pod&lt;/code&gt; shows "Last State: Terminated, Reason: OOMKilled."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Container exceeded memory limit, JVM not configured for container limits, memory leak.&lt;/p&gt;

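&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Increase limits if the workload genuinely needs more. For Java apps, size the heap from the container limit with &lt;code&gt;-XX:MaxRAMPercentage&lt;/code&gt; instead of a fixed &lt;code&gt;-Xmx&lt;/code&gt;, and leave headroom for non-heap memory such as metaspace and thread stacks.&lt;/p&gt;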
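&lt;p&gt;The arithmetic behind &lt;code&gt;-XX:MaxRAMPercentage&lt;/code&gt; is simple: on a container-aware JVM, the maximum heap is roughly that percentage of the container's memory limit (the default is 25). The function below is a hypothetical back-of-the-envelope helper, not a JVM tool, and it ignores the JVM's internal rounding.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: approximate max heap (MiB) for a memory limit and percentage.
max_heap_mib() {
  limit_mib="$1"; pct="$2"
  echo $(( limit_mib * pct / 100 ))
}

max_heap_mib 1024 75   # prints 768: about 256 MiB left for metaspace, threads, native memory
max_heap_mib 1024 25   # prints 256: the JVM default, often smaller than the app needs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In a pod spec, the flag is commonly passed through the &lt;code&gt;JAVA_TOOL_OPTIONS&lt;/code&gt; environment variable, for example &lt;code&gt;-XX:MaxRAMPercentage=75.0&lt;/code&gt;. How much headroom is safe depends on how much off-heap memory your app uses, so treat 75 as a starting point rather than a rule.&lt;/p&gt;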

&lt;h2&gt;
  
  
  5. Service unreachable
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get endpoints SVC_NAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; No endpoints (selector doesn't match pod labels), pod not listening on expected port, NetworkPolicy blocking traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; In practice, it's almost always a label selector mismatch.&lt;/p&gt;
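&lt;p&gt;To spot the mismatch quickly, print the Service selector next to the pod labels. A minimal sketch; &lt;code&gt;svc_selector_check&lt;/code&gt; is a hypothetical name, and it assumes the Service and pods live in the current namespace. It reads EndpointSlices, the newer API behind &lt;code&gt;kubectl get endpoints&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: does the Service selector match any pod?
# Usage: svc_selector_check SVC_NAME
svc_selector_check() {
  svc="$1"
  echo "Selector on $svc:"
  kubectl get svc "$svc" -o jsonpath='{.spec.selector}{"\n"}'
  echo "EndpointSlices (no pod IPs listed: check the selector and pod readiness):"
  kubectl get endpointslices -l kubernetes.io/service-name="$svc"
  echo "Pod labels, to compare with the selector:"
  kubectl get pods --show-labels
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;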

&lt;h2&gt;
  
  
  6. DNS resolution failing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt; &lt;code&gt;kubectl exec&lt;/code&gt; into pod, run nslookup. Check CoreDNS pods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; CoreDNS pods crashed, NetworkPolicy blocking DNS, /etc/resolv.conf misconfigured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Restart CoreDNS if it's misbehaving. On EKS, the default two CoreDNS replicas can be too few for busy clusters; scale them up.&lt;/p&gt;
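&lt;p&gt;Many application images ship without &lt;code&gt;nslookup&lt;/code&gt;, so a throwaway pod is often easier than &lt;code&gt;kubectl exec&lt;/code&gt;. A sketch, assuming the cluster can pull &lt;code&gt;busybox&lt;/code&gt; from Docker Hub and uses the default &lt;code&gt;cluster.local&lt;/code&gt; domain; &lt;code&gt;dns_check&lt;/code&gt; and the pod name &lt;code&gt;dns-test&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: test cluster DNS from a temporary pod, then inspect CoreDNS.
# Usage: dns_check [NAME]   (defaults to the API server's Service name)
dns_check() {
  name="${1:-kubernetes.default.svc.cluster.local}"
  # One-off pod; --rm deletes it once nslookup exits
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.36 -- nslookup "$name"
  # CoreDNS pods carry the k8s-app=kube-dns label on EKS and kubeadm clusters
  kubectl -n kube-system get pods -l k8s-app=kube-dns
  kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If CoreDNS itself is unhealthy, &lt;code&gt;kubectl -n kube-system rollout restart deployment coredns&lt;/code&gt; restarts it.&lt;/p&gt;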

&lt;h2&gt;
  
  
  7. Ingress 502 Bad Gateway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Backend pod down, target group health check failing, port mismatch, or slow startup causing the ALB to mark targets unhealthy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Check target group health in the AWS console. If the pods are unhealthy, fix the readiness probe.&lt;/p&gt;
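&lt;p&gt;Both halves of that check can also be done from a terminal. A sketch, assuming the ingress is an ALB managed by the AWS Load Balancer Controller; &lt;code&gt;ingress_502_check&lt;/code&gt; is a hypothetical name, and you supply the target group ARN yourself (copy it from the console).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: compare Kubernetes readiness with ALB target health.
# Usage: ingress_502_check SVC_NAME TARGET_GROUP_ARN
ingress_502_check() {
  svc="$1"; tg_arn="$2"
  # Does the backend Service have pods behind it?
  kubectl get endpointslices -l kubernetes.io/service-name="$svc"
  # How the ALB sees each target: state plus reason (e.g. Target.FailedHealthChecks)
  aws elbv2 describe-target-health --target-group-arn "$tg_arn" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output table
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;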

&lt;h2&gt;
  
  
  8. PVC stuck Pending
&lt;/h2&gt;

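&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; No default StorageClass, EBS CSI driver not installed, or missing IAM permissions for the driver. With a &lt;code&gt;WaitForFirstConsumer&lt;/code&gt; StorageClass, Pending is expected until a pod that uses the claim is scheduled.&lt;/p&gt;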

&lt;p&gt;&lt;strong&gt;Fix on EKS:&lt;/strong&gt; Install EBS CSI driver as an EKS add-on. Service account needs the right IAM role via IRSA.&lt;/p&gt;
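&lt;p&gt;With the AWS CLI, the add-on install looks roughly like this. A sketch, assuming you have already created an IRSA role that trusts the cluster's OIDC provider and has the AWS-managed &lt;code&gt;AmazonEBSCSIDriverPolicy&lt;/code&gt; attached; &lt;code&gt;install_ebs_csi&lt;/code&gt; and its example arguments are hypothetical placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: install the EBS CSI driver as an EKS managed add-on.
# Usage: install_ebs_csi my-cluster arn:aws:iam::123456789012:role/ebs-csi-role
install_ebs_csi() {
  cluster="$1"; role_arn="$2"
  # Is any StorageClass marked (default)? PVCs without a class need one.
  kubectl get storageclass
  aws eks create-addon --cluster-name "$cluster" \
    --addon-name aws-ebs-csi-driver \
    --service-account-role-arn "$role_arn"
  # Should eventually report ACTIVE
  aws eks describe-addon --cluster-name "$cluster" \
    --addon-name aws-ebs-csi-driver --query 'addon.status'
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;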

&lt;h2&gt;
  
  
  9. Node NotReady
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Kubelet crashed, container runtime issue, disk pressure, network plugin failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; SSH to the node (or use SSM Session Manager) and check &lt;code&gt;journalctl -u kubelet&lt;/code&gt;. Often the disk is full from accumulated logs.&lt;/p&gt;
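&lt;p&gt;Once you are on the node, a few commands cover the usual suspects. A sketch to run on the node itself (for example after &lt;code&gt;aws ssm start-session --target INSTANCE_ID&lt;/code&gt;); &lt;code&gt;node_health&lt;/code&gt; is a hypothetical name, and it assumes a systemd-based node running containerd, such as the Amazon Linux EKS AMIs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: run ON the node to check kubelet, runtime and disk.
node_health() {
  # Most recent kubelet log lines
  sudo journalctl -u kubelet --since "1 hour ago" --no-pager | tail -n 50
  # Container runtime state
  sudo systemctl status containerd --no-pager
  # Disk pressure: root volume usage and the biggest log directories
  df -h /
  sudo du -sh /var/log/* | sort -h | tail -n 5
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;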

&lt;h2&gt;
  
  
  10. HPA not scaling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Metrics-server not installed, HPA targeting CPU but pod has no CPU requests, max replicas reached.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;code&gt;kubectl get hpa&lt;/code&gt;. If &lt;code&gt;&amp;lt;unknown&amp;gt;&lt;/code&gt; appears under metrics, metrics-server is broken.&lt;/p&gt;
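&lt;p&gt;It helps to know what the HPA is computing. The Kubernetes docs give the core rule as &lt;code&gt;desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]&lt;/code&gt;, and for CPU the metric is usage divided by the pod's CPU request. The function below is a simplified, hypothetical sketch of that rule for CPU utilization: it clamps to max replicas but ignores minReplicas, the default 10% tolerance, and stabilization windows.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical sketch of the HPA replica calculation for CPU utilization.
# Args: current replicas, avg CPU usage per pod (millicores),
#       CPU request per pod (millicores), target utilization (%), max replicas
hpa_desired() {
  current="$1"; usage_m="$2"; request_m="$3"; target_pct="$4"; max="$5"
  if [ "$request_m" -eq 0 ]; then
    # No CPU request: utilization is undefined, like the HPA's unknown metric
    echo "unknown"
    return
  fi
  util_pct=$(( usage_m * 100 / request_m ))
  # Integer ceiling of current * util_pct / target_pct
  desired=$(( (current * util_pct + target_pct - 1) / target_pct ))
  if [ "$desired" -gt "$max" ]; then
    desired="$max"   # "max replicas reached": the HPA stops here
  fi
  echo "$desired"
}

hpa_desired 4 450 250 60 10   # 180% vs 60% target wants 12 pods; prints 10 (capped)
hpa_desired 3 100 0 60 10     # prints unknown: pods have no CPU request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;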

&lt;h2&gt;
  
  
  How to use this playbook
&lt;/h2&gt;

&lt;p&gt;When you hit a real incident, search this list for keywords from the symptom. Most day-to-day issues are covered.&lt;/p&gt;

&lt;p&gt;If you want to actually practice these in a safe environment, our K8s troubleshooting labs at IT Defined are exactly this — broken clusters with planted issues, fix them under time pressure.&lt;/p&gt;

&lt;p&gt;Full 26 scenarios — including ConfigMap updates, Secret rotation, NetworkPolicy issues, PDB blocks, autoscaler problems, kube-proxy/CNI issues, Job failures, IRSA problems, webhook admission controllers, liveness probes, PV cleanup, and cluster upgrades — on &lt;a href="https://itdefined.org/blogs" rel="noopener noreferrer"&gt;itdefined.org&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
