<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: IT Defined</title>
    <description>The latest articles on DEV Community by IT Defined (@it_defined_9fa44164c67442).</description>
    <link>https://dev.to/it_defined_9fa44164c67442</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3905931%2F084c6ea8-3136-4128-bc2e-66f4cf4503f2.png</url>
      <title>DEV Community: IT Defined</title>
      <link>https://dev.to/it_defined_9fa44164c67442</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/it_defined_9fa44164c67442"/>
    <language>en</language>
    <item>
      <title>Kubernetes Troubleshooting</title>
      <dc:creator>IT Defined</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:43:31 +0000</pubDate>
      <link>https://dev.to/it_defined_9fa44164c67442/kubernetes-troubleshooting-2l9j</link>
      <guid>https://dev.to/it_defined_9fa44164c67442/kubernetes-troubleshooting-2l9j</guid>
      <description>&lt;h2&gt;
  
  
  Why this exists
&lt;/h2&gt;

&lt;p&gt;I've been running K8s troubleshooting workshops for two years. We have a 200-student program at IT Defined where we throw broken clusters at people. Patterns emerged.&lt;/p&gt;

&lt;p&gt;Most failures aren't novel. The same 25-30 failure modes account for 90% of real-world K8s incidents. If you can confidently debug these, you'll handle most production incidents.&lt;/p&gt;

&lt;p&gt;Here are the 10 most critical scenarios; the full list of 26 is in the post linked at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. CrashLoopBackOff
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Pod restart count climbing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod POD_NAME
kubectl logs POD_NAME &lt;span class="nt"&gt;--previous&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; App crashes on startup (config error, missing env var, can't connect to DB), liveness probe too aggressive, command/args misconfigured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Read the previous container's logs; the reason is usually right there. If the logs are empty, the container died before it could log anything, so check the entrypoint, command, and args.&lt;/p&gt;
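
&lt;p&gt;A minimal sketch of that check, assuming &lt;code&gt;POD_NAME&lt;/code&gt; is your pod and the app runs in the first container:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# dump the configured entrypoint and args
kubectl get pod POD_NAME -o jsonpath='{.spec.containers[0].command} {.spec.containers[0].args}'

# exit code and reason from the last crashed container
kubectl get pod POD_NAME -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;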

&lt;h2&gt;
  
  
  2. ImagePullBackOff or ErrImagePull
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt; &lt;code&gt;kubectl describe pod&lt;/code&gt;, look at events at the bottom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Image name typo, image doesn't exist, registry credentials missing, wrong region (ECR is regional), node IAM role can't pull from ECR.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Run &lt;code&gt;docker pull&lt;/code&gt; manually from a workstation with valid registry credentials. If it works there but the node still can't pull, it's a node permission or credentials issue.&lt;/p&gt;
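
&lt;p&gt;A rough sequence for isolating where the pull fails; the ECR account ID, region, and image below are placeholders for whatever your cluster actually uses:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# the exact image reference the kubelet is trying to pull
kubectl get pod POD_NAME -o jsonpath='{.spec.containers[*].image}'

# recent pull errors
kubectl get events --field-selector involvedObject.name=POD_NAME --sort-by='.lastTimestamp'

# if the image lives in ECR, confirm you can authenticate and pull it yourself
aws ecr get-login-password --region us-east-1 | docker login --username AWS \
  --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;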

&lt;h2&gt;
  
  
  3. Pod stuck Pending
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt; &lt;code&gt;kubectl describe pod&lt;/code&gt;. Look for "0/3 nodes available: insufficient cpu" or "didn't match node selector."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Insufficient capacity, resource requests too high, taints/tolerations mismatch, PVC not bound.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Check &lt;code&gt;kubectl describe nodes&lt;/code&gt; for available resources. If every node is maxed out, add capacity: scale the node group or let the cluster autoscaler handle it.&lt;/p&gt;
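
&lt;p&gt;Something like this shows whether it's a capacity problem or a scheduling constraint (&lt;code&gt;POD_NAME&lt;/code&gt; is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# the scheduler's own explanation is in the pod events
kubectl describe pod POD_NAME

# how much is already requested on each node
kubectl describe nodes | grep -A 8 "Allocated resources"

# taints that might be keeping the pod off a node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;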

&lt;h2&gt;
  
  
  4. OOMKilled
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt; &lt;code&gt;kubectl describe pod&lt;/code&gt; shows "Last State: Terminated, Reason: OOMKilled."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Container exceeded memory limit, JVM not configured for container limits, memory leak.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Increase limits if the workload genuinely needs more. For Java apps, set &lt;code&gt;-XX:MaxRAMPercentage&lt;/code&gt; so the heap is sized from the container's memory limit rather than the node's total memory.&lt;/p&gt;
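
&lt;p&gt;A minimal way to confirm the kill and compare usage against the limit; the last kubectl command needs metrics-server, and the JVM flag shown is one common setting, not a universal value:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# confirm the termination reason and the configured limits
kubectl get pod POD_NAME -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
kubectl get pod POD_NAME -o jsonpath='{.spec.containers[0].resources}'

# live usage vs. that limit
kubectl top pod POD_NAME --containers

# for Java, an env var like this keeps the heap inside the container limit
# JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;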

&lt;h2&gt;
  
  
  5. Service unreachable
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get endpoints SVC_NAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; No endpoints (selector doesn't match pod labels), pod not listening on expected port, NetworkPolicy blocking traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; 99% of the time it's a label selector mismatch: if the endpoints list is empty, compare the Service's selector to the Pod's labels and make them match.&lt;/p&gt;
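
&lt;p&gt;A sketch of that selector check; &lt;code&gt;SVC_NAME&lt;/code&gt;, &lt;code&gt;POD_NAME&lt;/code&gt;, and port 8080 are stand-ins for your own names, and the exec only works if the image ships wget:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# what the Service selects vs. how the pods are labeled
kubectl get svc SVC_NAME -o jsonpath='{.spec.selector}'
kubectl get pods --show-labels

# confirm the container is actually listening on the targetPort
kubectl exec POD_NAME -- wget -qO- http://localhost:8080/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;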

&lt;h2&gt;
  
  
  6. DNS resolution failing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis:&lt;/strong&gt; &lt;code&gt;kubectl exec&lt;/code&gt; into pod, run nslookup. Check CoreDNS pods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; CoreDNS pods crashed, NetworkPolicy blocking DNS, /etc/resolv.conf misconfigured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Restart CoreDNS if it's misbehaving. On EKS, the default CoreDNS replica count and resources are sometimes too low for busy clusters, so scale the deployment up.&lt;/p&gt;
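
&lt;p&gt;One way to test it end to end; the busybox tag and the replica count below are illustrative, not EKS-recommended values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# resolve a well-known in-cluster name from a throwaway pod
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default

# CoreDNS health and recent logs
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50

# scale up if the cluster has outgrown the default replica count
kubectl -n kube-system scale deployment coredns --replicas=4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;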

&lt;h2&gt;
  
  
  7. Ingress 502 Bad Gateway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Backend pod down, target group health check failing, port mismatch, slow startup so ALB marks unhealthy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Check target group health in the AWS console. If targets are unhealthy, fix the readiness probe (or the health check path and port) so pods only receive traffic once they can actually serve it.&lt;/p&gt;
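
&lt;p&gt;Assuming the AWS Load Balancer Controller fronting an ALB, something along these lines narrows it down (the Ingress name and target group ARN are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# does the Ingress point at a Service that actually has endpoints?
kubectl describe ingress INGRESS_NAME
kubectl get endpoints SVC_NAME

# target health straight from the ALB side
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/0123456789abcdef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;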

&lt;h2&gt;
  
  
  8. PVC stuck Pending
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; No StorageClass set, EBS CSI driver not installed, missing IAM permissions for the driver.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix on EKS:&lt;/strong&gt; Install the EBS CSI driver as an EKS add-on. Its service account needs the right IAM role via IRSA.&lt;/p&gt;
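
&lt;p&gt;On EKS the check-and-install flow looks roughly like this; the cluster name and IAM role ARN are placeholders, and the role itself has to exist via IRSA before the add-on will work:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# why is the claim stuck, and is there a StorageClass at all?
kubectl describe pvc PVC_NAME
kubectl get storageclass

# is the EBS CSI driver running?
kubectl -n kube-system get pods | grep ebs-csi

# install it as an EKS add-on with an IRSA-backed role
aws eks create-addon --cluster-name my-cluster --addon-name aws-ebs-csi-driver \
  --service-account-role-arn arn:aws:iam::123456789012:role/AmazonEKS_EBS_CSI_DriverRole
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;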

&lt;h2&gt;
  
  
  9. Node Not Ready
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Kubelet crashed, container runtime issue, disk pressure, network plugin failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; SSH to the node (or use SSM Session Manager). Check &lt;code&gt;journalctl -u kubelet&lt;/code&gt;. Often it's a full disk from log accumulation.&lt;/p&gt;
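
&lt;p&gt;Roughly, with &lt;code&gt;NODE_NAME&lt;/code&gt; as a placeholder (the second half runs on the node itself over SSH or an SSM session):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# which condition is failing and what the kubelet last reported
kubectl describe node NODE_NAME

# on the node itself
journalctl -u kubelet --since "1 hour ago" --no-pager | tail -n 50
df -h          # disk pressure is very often just a full disk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;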

&lt;h2&gt;
  
  
  10. HPA not scaling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Likely causes:&lt;/strong&gt; Metrics-server not installed, HPA targeting CPU but pod has no CPU requests, max replicas reached.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;code&gt;kubectl get hpa&lt;/code&gt;. If &lt;code&gt;&amp;lt;unknown&amp;gt;&lt;/code&gt; appears in the targets column, metrics aren't reaching the HPA: either metrics-server is broken or the pods have no CPU requests to compute a percentage against.&lt;/p&gt;
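
&lt;p&gt;A quick way to see which of those it is; &lt;code&gt;DEPLOY_NAME&lt;/code&gt; stands in for the scaled workload:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get hpa
kubectl top pods                                # fails if metrics-server is missing or broken
kubectl -n kube-system get deployment metrics-server

# percentage-based CPU targets need CPU requests on the pod template
kubectl get deployment DEPLOY_NAME -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;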

&lt;h2&gt;
  
  
  How to use this playbook
&lt;/h2&gt;

&lt;p&gt;When you hit a real incident, search for keywords from the symptom. Most day-to-day stuff is covered.&lt;/p&gt;

&lt;p&gt;If you want to actually practice these in a safe environment, our K8s troubleshooting labs at IT Defined are exactly this: broken clusters with planted issues that you fix under time pressure.&lt;/p&gt;

&lt;p&gt;Full 26 scenarios — including ConfigMap updates, Secret rotation, NetworkPolicy issues, PDB blocks, autoscaler problems, kube-proxy/CNI issues, Job failures, IRSA problems, webhook admission controllers, liveness probes, PV cleanup, and cluster upgrades — on &lt;a href="https://itdefined.org/blogs" rel="noopener noreferrer"&gt;itdefined.org&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
