<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: suresh devops</title>
    <description>The latest articles on DEV Community by suresh devops (@suresh_devops_ffa0728a190).</description>
    <link>https://dev.to/suresh_devops_ffa0728a190</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3761300%2F92deda14-6c82-4fd9-b585-fa6555c3d99f.png</url>
      <title>DEV Community: suresh devops</title>
      <link>https://dev.to/suresh_devops_ffa0728a190</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/suresh_devops_ffa0728a190"/>
    <language>en</language>
    <item>
      <title>Can We Survive as DevOps Engineers in the AI Era?</title>
      <dc:creator>suresh devops</dc:creator>
      <pubDate>Sat, 21 Feb 2026 12:11:06 +0000</pubDate>
      <link>https://dev.to/suresh_devops_ffa0728a190/can-we-survive-as-devops-engineers-in-ai-era--2gc5</link>
      <guid>https://dev.to/suresh_devops_ffa0728a190/can-we-survive-as-devops-engineers-in-ai-era--2gc5</guid>
<description>&lt;p&gt;&lt;em&gt;(Haunted by AI and trying to find answers - Notes 1)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As DevOps professionals, the most dangerous thing we can do right now is mistake our ability to write a CI/CD pipeline for job security. We are witnessing a violent divergence in the DevOps career path. &lt;/p&gt;

&lt;p&gt;AI is not coming for the architects; it is coming for the executors—the engineers who have spent their careers living at "YAML-level depth."&lt;/p&gt;

&lt;p&gt;The central thesis of this new era is: the manual pipeline builder is a dying breed. The AI-Orchestrating Architect is the future.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Great Split: Tool Operators vs. Platform Architects&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The industry is bifurcating into two distinct paths with vastly different lifespans. Understanding which path you are on is the difference between professional obsolescence and becoming a linchpin of the enterprise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Divergent Paths&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Path 1: The Tool Operator (High Risk).&lt;/strong&gt; This role is defined by routine execution: writing pipelines, drafting Helm charts, deploying containers, and patching minor configuration drift. Because AI can already automate 60–70% of these tasks, this layer of the profession is rapidly becoming a low-value commodity.&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Path 2: The Platform Architect / Reliability Strategist (High Value).&lt;/strong&gt; This is the domain of the AI-Orchestrating Architect. It involves high-level systemic design: failure domains, multi-region architecture, SLO/SLA strategy, and governing how AI is integrated into the SDLC.&lt;/p&gt;

&lt;p&gt;Analysis: The automation of commodity tasks makes the move to Path 2 an urgent strategic necessity. To survive this shift, you must stop being the person who moves the bricks and start being the person who designs the structural integrity of the building.&lt;/p&gt;

&lt;p&gt;"AI will replace DevOps engineers who stay at YAML-level depth."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1: Distributed Systems Depth Over CLI Commands&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mastering Kubernetes syntax or specific CLI flags is no longer a competitive advantage—it’s a baseline. AI can generate infrastructure code and deploy clusters with high proficiency. However, AI lacks the systemic intuition required to prevent or debug a total system collapse. To reach the architect level, you must master the mechanics of distributed systems:&lt;/p&gt;

&lt;p&gt;• Consensus &amp;amp; Leader Election: Mastering the Raft protocol and how state is maintained.&lt;br&gt;
• The CAP Theorem: Navigating the brutal trade-offs between Consistency, Availability, and Partition Tolerance.&lt;br&gt;
• Failure Management: Managing Eventual Consistency, Backpressure, Circuit Breaking, and the logic required to survive Retry Storms.&lt;br&gt;
• Idempotency: Ensuring that system operations can be repeated without unintended side effects.&lt;/p&gt;
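
&lt;p&gt;As a tiny illustration of the last bullet, an idempotency key makes a retried operation safe to repeat. This hypothetical handler is only a sketch (the names and the in-memory store are assumptions for illustration):&lt;/p&gt;

```python
# Minimal idempotency sketch (hypothetical handler, not production code):
# a client-supplied key makes retries safe to repeat.
processed = {}  # key -> result; in production this would be a durable store

def charge(idempotency_key, amount):
    # A retried request with the same key returns the cached result
    # instead of performing the side effect twice.
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = {"charged": amount}        # side effect happens exactly once
    processed[idempotency_key] = result
    return result

first = charge("req-42", 100)
retry = charge("req-42", 100)  # e.g. a client retry after a timeout
assert first is retry          # the side effect was not repeated
```

&lt;p&gt;The same pattern is what lets a system survive a retry storm: duplicate requests collapse into a single effect instead of amplifying load.&lt;/p&gt;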

&lt;p&gt;Analysis: While AI can build the infrastructure, human architectural thinking is required to manage the complexity of distributed failure. AI might build the bridge, but the Architect understands why the resonance of a thousand footsteps will make it collapse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2: The SRE Mindset and the "Error Budget"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The definition of "success" is shifting from binary uptime to Site Reliability Engineering (SRE). A Strategic Lead doesn't just ask if a service is "up"; they manage the reliability of the system as a product feature.&lt;/p&gt;

&lt;p&gt;Key SRE pillars include:&lt;/p&gt;

&lt;p&gt;• SLO Design and Golden Signals: Measuring the metrics that actually impact the business.&lt;br&gt;
• Error Budgets: Using data to negotiate the tension between feature velocity and system stability.&lt;br&gt;
• Incident Analysis and Chaos Engineering: Proactively testing resilience and performing deep-dive postmortems to find the "why" behind the "what."&lt;/p&gt;
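
&lt;p&gt;The error-budget bullet is, at bottom, simple arithmetic. A minimal sketch (the 99.9% SLO and 30-day window are illustrative assumptions, not figures from this article):&lt;/p&gt;

```python
# Error budget = the unreliability a given SLO permits over a window.
def error_budget_minutes(slo, window_minutes):
    """Minutes of allowed downtime for a given SLO over a window."""
    return (1.0 - slo) * window_minutes

MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day window

budget = error_budget_minutes(0.999, MONTH)   # 99.9% SLO
print(f"Monthly budget: {budget:.1f} min")    # about 43.2 min

# If incidents have already burned 30 minutes this month, only the
# remainder is left to spend on risky releases.
remaining = budget - 30
print(f"Remaining: {remaining:.1f} min")
```

&lt;p&gt;This is the negotiation tool: once the remaining budget approaches zero, feature velocity yields to stability work by prior agreement, not by argument.&lt;/p&gt;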

&lt;p&gt;Analysis: Designing for failure before it happens is a uniquely human skill. It requires a proactive, strategic mindset that views every incident not as a chore, but as an architectural data point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3: FinOps – Cost is Now an Architectural Pillar&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the age of massive cloud scale, infrastructure cost is no longer an accounting problem—it is a primary architectural constraint. As an architect, you must treat "Cost Architecture" with the same rigor as security.&lt;/p&gt;

&lt;p&gt;This requires mastering:&lt;/p&gt;

&lt;p&gt;• Bin Packing Math: Optimizing resource density to ensure maximum ROI on compute.&lt;br&gt;
• Spot vs. On-Demand Strategy: Designing workloads that can survive the volatility of spot instances to slash OpEx.&lt;br&gt;
• Observability Cost Control: Ensuring the cost of monitoring the system doesn't exceed the value of the data gathered.&lt;br&gt;
• Workload Rightsizing: Constant, automated tuning of resource requests to match actual utilization.&lt;/p&gt;
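
&lt;p&gt;The rightsizing bullet reduces to a utilization-ratio check. A toy sketch with invented workload numbers (none of these figures are from the article):&lt;/p&gt;

```python
# Flag workloads whose requested resources far exceed actual usage.
# All figures are illustrative.
workloads = [
    {"name": "api",    "requested_mib": 2048, "used_mib": 1900},
    {"name": "worker", "requested_mib": 4096, "used_mib": 600},
]

def rightsizing_report(workloads, min_utilization=0.5):
    """Return (name, utilization) for workloads below the utilization floor."""
    flagged = []
    for w in workloads:
        utilization = w["used_mib"] / w["requested_mib"]
        if utilization >= min_utilization:
            continue  # reasonably sized
        flagged.append((w["name"], round(utilization, 2)))
    return flagged

print(rightsizing_report(workloads))  # [('worker', 0.15)]
```

&lt;p&gt;In practice this check runs continuously against real usage metrics; the point is that "Cost Architecture" is measurable math, not intuition.&lt;/p&gt;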

&lt;p&gt;Analysis: AI can generate a thousand instances in seconds, but it lacks the organizational context to know if those instances are a waste of capital. FinOps isn't just about saving money; it’s about reclaiming the innovation budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4: The Rise of the Internal Developer Platform (IDP)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The future of DevOps is Platform Engineering. &lt;em&gt;&lt;u&gt;This marks a fundamental shift from "deploying applications" to "designing the systems that deploy applications."&lt;/u&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The goal of the Architect is to move from being a manual bottleneck to becoming a platform provider. This is achieved through:&lt;/p&gt;

&lt;p&gt;• Golden Path Templates: "Secure-by-default" workflows that make the right way the easy way.&lt;br&gt;
• Guardrails and Policy as Code: Automated governance that prevents disasters before they are committed to main.&lt;br&gt;
• Self-Service Environments: Empowering developers to move at speed within safe, pre-defined boundaries.&lt;/p&gt;

&lt;p&gt;Analysis: When you stop being the person who manually runs the deployment and start being the architect of the IDP, you become a force multiplier for the entire engineering organization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5: Judgment Under Uncertainty – The Irreplaceable Skill&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What makes a professional truly irreplaceable is not a collection of certifications, but judgment under uncertainty. You must become the AI-Orchestrating Architect who uses AI for root cause exploration, log pattern detection, threat modeling, and IaC reviews, but retains the final word.&lt;/p&gt;

&lt;p&gt;Critical "When to..." decisions include:&lt;/p&gt;

&lt;p&gt;• When to re-architect a legacy system vs. when to maintain the "technical debt."&lt;br&gt;
• When to deprecate a service that no longer serves the business.&lt;br&gt;
• When to accept calculated risk for the sake of market speed.&lt;/p&gt;

&lt;p&gt;"AI suggests. Architect decides."&lt;/p&gt;

&lt;p&gt;Analysis: This level of judgment is your career "moat." It is the protective layer that automation cannot cross because it requires context, experience, and the accountability to own the outcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Strategic Roadmap:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Transitioning from an executor to a Reliability Strategist requires a structured evolution:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Foundations &amp;amp; Distributed Systems:&lt;/strong&gt; Deep dive into Raft/consensus, master observability, design High Availability (HA) across zones/regions, and take the lead on high-stakes incident analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform Ownership &amp;amp; Governance:&lt;/strong&gt; Own the platform architecture, design organization-wide cost governance models, and build the reusable DevOps frameworks that others consume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strategy &amp;amp; Enterprise Architecture:&lt;/strong&gt; Define the enterprise-level DevOps strategy, lead multi-cluster/multi-region global designs, and move into the role of Principal Engineer or Enterprise Architect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hard Truth&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The era of the "pipeline builder" is ending. &lt;u&gt;&lt;em&gt;The demand for those who can navigate platform engineering, reliability, cost optimization, and AI governance is exploding.&lt;/em&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;Ignoring AI is the only certain way to be replaced. Those who integrate AI to handle the "grunt work" of log analysis and threat modeling will always outperform those who try to compete with the machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DevOps as “platform + reliability + cost + AI governance” will expand massively&lt;/strong&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How We Fixed Real Kubernetes Production Incidents</title>
      <dc:creator>suresh devops</dc:creator>
      <pubDate>Wed, 18 Feb 2026 15:30:28 +0000</pubDate>
      <link>https://dev.to/suresh_devops_ffa0728a190/kubernetes-advanced-scheduling-we-fixed-real-production-incidents-43de</link>
      <guid>https://dev.to/suresh_devops_ffa0728a190/kubernetes-advanced-scheduling-we-fixed-real-production-incidents-43de</guid>
<description>&lt;p&gt;We used 'Kubernetes Advanced Scheduling' to fix real production incidents. Here is how:&lt;/p&gt;

&lt;p&gt;Kubernetes scheduling looks simple — until production traffic hits.&lt;br&gt;
In lower environments, the default scheduler works beautifully.&lt;/p&gt;

&lt;p&gt;In production, under load, across multiple AZs, with service mesh, sidecars, GPUs, and mixed workloads?&lt;/p&gt;

&lt;p&gt;That’s when we discovered:&lt;br&gt;
The scheduler is not “basic.”&lt;/p&gt;

&lt;p&gt;It’s powerful — if we know how to use it.&lt;/p&gt;

&lt;p&gt;These were 2 AM incidents: real issues we resolved in a live Kubernetes environment running critical workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident 1: All Pods in One AZ – The Near-Outage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Situation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We had a 3-zone cluster.&lt;br&gt;
A critical API deployment had 9 replicas.&lt;br&gt;
One evening, during a traffic spike, one Availability Zone had network degradation.&lt;br&gt;
Suddenly, 6 of our 9 pods were unreachable.&lt;br&gt;
Why?&lt;br&gt;
Because the default scheduler had “helpfully” packed pods into fewer zones for efficiency.&lt;br&gt;
ReplicaSet guarantees count.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;It does NOT guarantee distribution.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That night we learned:&lt;br&gt;
High Availability is not automatic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix: Pod Topology Spread Constraints&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We implemented:&lt;br&gt;
• topologyKey: topology.kubernetes.io/zone&lt;br&gt;
• maxSkew: 1&lt;br&gt;
• whenUnsatisfiable: DoNotSchedule&lt;/p&gt;

&lt;p&gt;This forced the scheduler to maintain balanced pod distribution across zones.&lt;/p&gt;
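
&lt;p&gt;Wired into a Deployment, the three settings look roughly like this sketch (the deployment name, labels, and image are placeholders, not our actual manifest):&lt;/p&gt;

```yaml
# Illustrative sketch - names, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api
spec:
  replicas: 9
  selector:
    matchLabels:
      app: critical-api
  template:
    metadata:
      labels:
        app: critical-api
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                         # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule   # hard constraint, not best-effort
          labelSelector:
            matchLabels:
              app: critical-api
      containers:
        - name: api
          image: example/api:1.0
```

&lt;p&gt;With 9 replicas and maxSkew of 1, the only legal placement across three zones is 3-3-3.&lt;/p&gt;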

&lt;p&gt;&lt;strong&gt;What Changed Immediately&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;• Pods spread 3-3-3 across AZs&lt;br&gt;
• Rolling updates maintained balance&lt;br&gt;
• Zone failure no longer meant partial collapse&lt;/p&gt;

&lt;p&gt;**Why This Matters &lt;/p&gt;

&lt;p&gt;Unlike anti-affinity (which is binary), &lt;strong&gt;Topology Spread Constraints are quantitative.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They enforce distribution math.&lt;br&gt;
That’s the difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Real Production Lesson&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During a rolling update before the fix:&lt;/p&gt;

&lt;p&gt;New pods filled one AZ first.&lt;br&gt;
Traffic imbalance followed.&lt;br&gt;
Latency spikes happened.&lt;/p&gt;

&lt;p&gt;After implementing spread constraints:&lt;/p&gt;

&lt;p&gt;Updates remained balanced from the first pod.&lt;br&gt;
No traffic concentration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident 2: Node Memory Pressure – “But Requests Look Fine”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Situation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We were using:&lt;br&gt;
• Service mesh (Istio sidecars)&lt;br&gt;
• Kata Containers for isolated workloads&lt;/p&gt;

&lt;p&gt;Pods were requesting 512MB.&lt;br&gt;
Nodes were showing allocatable capacity sufficient for 20 pods.&lt;/p&gt;

&lt;p&gt;But after deploying 15 pods:&lt;/p&gt;

&lt;p&gt;• Memory pressure started&lt;br&gt;
• Evictions happened&lt;br&gt;
• Latency increased&lt;/p&gt;

&lt;p&gt;Metrics didn’t make sense.&lt;br&gt;
Until we realized:&lt;br&gt;
The scheduler was blind to runtime overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix: Pod Overhead via RuntimeClass&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We defined a RuntimeClass with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;overhead:
  podFixed:
    memory: "50Mi"
    cpu: "50m"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now every pod using that RuntimeClass automatically added the overhead to the scheduling math.&lt;/p&gt;
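
&lt;p&gt;For reference, a complete manifest pair might look like the sketch below; the metadata names, image, and the kata handler value are assumptions for illustration, not values from our cluster:&lt;/p&gt;

```yaml
# Illustrative RuntimeClass - handler name depends on your CRI setup.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-with-overhead
handler: kata            # assumed CRI handler for Kata Containers
overhead:
  podFixed:
    memory: "50Mi"
    cpu: "50m"
---
# Pods opt in by name; the scheduler then adds the overhead on top of
# the container requests when fitting the pod onto a node.
apiVersion: v1
kind: Pod
metadata:
  name: isolated-workload
spec:
  runtimeClassName: kata-with-overhead
  containers:
    - name: app
      image: example/app:1.0
      resources:
        requests:
          memory: "512Mi"
```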

&lt;p&gt;&lt;strong&gt;What Changed?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before:&lt;br&gt;
Scheduler thought 512MB per pod.&lt;/p&gt;

&lt;p&gt;After:&lt;br&gt;
Scheduler calculated ~562MB per pod.&lt;br&gt;
Node fitting became accurate.&lt;br&gt;
Evictions stopped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical Insight&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pod Overhead creates an “invisible container” in scheduling math.&lt;/p&gt;

&lt;p&gt;Without it:&lt;br&gt;
You overcommit silently.&lt;/p&gt;

&lt;p&gt;With it:&lt;br&gt;
Scheduling becomes financially and operationally accurate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident 3: GPU Pods Landing on Spot Nodes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Situation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We had:&lt;/p&gt;

&lt;p&gt;• GPU nodes (on-demand)&lt;br&gt;
• General compute nodes (Spot instances)&lt;/p&gt;

&lt;p&gt;A training workload got scheduled on a Spot node without a GPU.&lt;/p&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;p&gt;• CrashLoopBackOff&lt;br&gt;
• Wasted compute&lt;br&gt;
• Delayed ML training pipeline&lt;/p&gt;

&lt;p&gt;Node labels existed.&lt;br&gt;
But scheduling preference wasn’t strict enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix: Scheduler Profiles&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of rewriting the scheduler, we:&lt;/p&gt;

&lt;p&gt;• Created a separate scheduler profile&lt;br&gt;
• Configured specific scoring behavior for ML workloads&lt;br&gt;
• Used taints + tolerations + stricter filtering&lt;/p&gt;

&lt;p&gt;Now:&lt;/p&gt;

&lt;p&gt;• General workloads used default scheduling&lt;br&gt;
• ML workloads used a GPU-aware scheduling profile&lt;/p&gt;
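
&lt;p&gt;A two-profile setup can be sketched roughly as below. The profile names, resource weights, and the MostAllocated scoring choice are illustrative assumptions; our exact configuration differed:&lt;/p&gt;

```yaml
# Sketch of a kube-scheduler configuration with two profiles.
# ML pods opt in by setting spec.schedulerName: gpu-scheduler.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler        # general workloads, default behavior
  - schedulerName: gpu-scheduler            # GPU-aware profile for ML workloads
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated             # prefer dense packing on GPU nodes
            resources:
              - name: nvidia.com/gpu
                weight: 5
              - name: memory
                weight: 1
```

&lt;p&gt;Combined with taints on the GPU pool and matching tolerations on ML pods, misplacement becomes structurally impossible rather than merely unlikely.&lt;/p&gt;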

&lt;p&gt;&lt;strong&gt;Production Impact&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;• Zero misplacements&lt;br&gt;
• Predictable GPU utilization&lt;br&gt;
• Better cost control&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident 4: Bin Packing Wasn’t Actually Packing&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Situation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We were optimizing for cost.&lt;br&gt;
Using a bin-packing strategy (MostRequested scoring).&lt;br&gt;
But nodes were underutilized by 5–7%.&lt;/p&gt;

&lt;p&gt;Finance asked:&lt;/p&gt;

&lt;p&gt;“Why are we scaling nodes when CPU is still free?”&lt;/p&gt;

&lt;p&gt;The missing variable?&lt;br&gt;
Pod Overhead wasn’t being considered in our mental math.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Real Scheduler Math&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the NodeResourcesFit plugin runs:&lt;/p&gt;

&lt;p&gt;Total Pod Compute =&lt;br&gt;
(Sum of container requests) + (Pod Overhead)&lt;/p&gt;

&lt;p&gt;During scoring:&lt;br&gt;
Scheduler evaluates:&lt;br&gt;
(Current Node Usage + Total Pod Compute)&lt;/p&gt;

&lt;p&gt;With overhead included,&lt;br&gt;
Bin packing becomes more accurate.&lt;br&gt;
Without understanding this,&lt;br&gt;
You miscalculate cluster density.&lt;/p&gt;
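
&lt;p&gt;The fit math can be checked in a few lines. A toy sketch (the 10 GiB allocatable figure is an assumption for illustration; only the 512 MiB request and 50 MiB overhead come from the incident above):&lt;/p&gt;

```python
# Scheduler fit math with Pod Overhead (illustrative numbers).
def total_pod_compute_mib(container_requests, pod_overhead):
    # Total Pod Compute = sum of container requests + pod overhead
    return sum(container_requests) + pod_overhead

def pods_that_fit(node_allocatable_mib, per_pod_mib):
    # How many pods the scheduler can place on one node.
    return node_allocatable_mib // per_pod_mib

naive = total_pod_compute_mib([512], 0)    # what we assumed: 512 MiB per pod
real  = total_pod_compute_mib([512], 50)   # with overhead:   562 MiB per pod

print(pods_that_fit(10240, naive))  # 20 pods by the naive math
print(pods_that_fit(10240, real))   # 18 pods once overhead is counted
```

&lt;p&gt;The gap between those two numbers is exactly the silent overcommit that triggered the evictions in Incident 2.&lt;/p&gt;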

&lt;p&gt;After adjusting runtime overhead correctly:&lt;br&gt;
• Node utilization improved&lt;br&gt;
• Fewer nodes required&lt;br&gt;
• Monthly cloud cost dropped noticeably&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Most Teams Don’t Realize&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The scheduler is not a black box.&lt;/p&gt;

&lt;p&gt;It has phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Filtering&lt;/li&gt;
&lt;li&gt; Scoring&lt;/li&gt;
&lt;li&gt; Binding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can influence each phase.&lt;br&gt;
Advanced scheduling is not “nice to have.”&lt;/p&gt;

&lt;p&gt;It is production engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lessons learnt&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Replica count ≠ High Availability:
Use Topology Spread Constraints.&lt;/li&gt;
&lt;li&gt;Container request ≠ Total consumption:
Use Pod Overhead when sidecars or microVMs are involved.&lt;/li&gt;
&lt;li&gt;One scheduler profile ≠ All workloads:
Use Scheduler Profiles for workload classes.&lt;/li&gt;
&lt;li&gt;Bin packing requires correct math:
Understand the NodeResourcesFit + Overhead interaction.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Running Kubernetes vs Engineering Kubernetes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anyone can deploy pods.&lt;br&gt;
Engineering Kubernetes means:&lt;br&gt;
• Designing failure domains intentionally&lt;br&gt;
• Accounting for invisible compute&lt;br&gt;
• Aligning scheduling with business rules&lt;br&gt;
• Optimizing cost through math, not guesswork&lt;/p&gt;

&lt;p&gt;The scheduler is not just a placement tool.&lt;/p&gt;

&lt;p&gt;It is:&lt;br&gt;
A control plane decision engine.&lt;br&gt;
And this separates:&lt;br&gt;
Clusters that survive production&lt;br&gt;
from&lt;br&gt;
Clusters that collapse under it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;(You don't need to be a Kubestronaut to learn and apply the lesser-known features of Kubernetes. All you need is a 'crisis' at 2 AM.)&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>sre</category>
    </item>
    <item>
      <title>DevOps Books - A DevOps enthusiast's bucket list</title>
      <dc:creator>suresh devops</dc:creator>
      <pubDate>Wed, 18 Feb 2026 09:57:28 +0000</pubDate>
      <link>https://dev.to/suresh_devops_ffa0728a190/8-devops-books-that-rewire-you-into-a-devops-architect-1oof</link>
      <guid>https://dev.to/suresh_devops_ffa0728a190/8-devops-books-that-rewire-you-into-a-devops-architect-1oof</guid>
      <description>&lt;p&gt;Today’s default learning path for most techies looks like this:&lt;/p&gt;

&lt;p&gt;Problem → Panic → Open GPT → Copy → Deploy → Pray.&lt;/p&gt;

&lt;p&gt;We’ve optimized for speed of answers, not depth of understanding.&lt;/p&gt;

&lt;p&gt;Here’s the uncomfortable truth:&lt;/p&gt;

&lt;p&gt;GPT can give you answers.&lt;br&gt;
Books rewire how you think.&lt;/p&gt;

&lt;p&gt;And DevOps at the Architect level?&lt;br&gt;
It’s not about YAML. It’s about perception.&lt;/p&gt;

&lt;p&gt;If you want to think like a DevOps Architect — beyond what you do — you need something deeper than prompts.&lt;/p&gt;

&lt;p&gt;You need frameworks.&lt;br&gt;
You need mental models.&lt;br&gt;
You need immersion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The DevOps Architect Bucket List&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you truly want to think beyond tools and certifications, add these to your professional bucket list:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;☐ The Phoenix Project – Understand organizational chaos&lt;br&gt;
☐ Accelerate – Learn what high performance actually looks like&lt;br&gt;
☐ Continuous Delivery – Master deployment flow&lt;br&gt;
☐ The DevOps Handbook – Structure transformation&lt;br&gt;
☐ The Docker Book – Build container foundations&lt;br&gt;
☐ Lean IT: Enabling and Sustaining Your Lean Transformation – Eliminate systemic waste&lt;br&gt;
☐ The Journey to DevOps – A Testing Perspective – Strengthen quality culture&lt;/p&gt;

&lt;p&gt;This is about intentionally studying seven foundational works that shape how DevOps Architects think — beyond operating tools and configuring pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Book: The Phoenix Project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Author: Gene Kim&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It starts with a story.&lt;/p&gt;

&lt;p&gt;This book humanizes DevOps. It exposes silos, burnout, firefighting culture, and broken flow.&lt;/p&gt;

&lt;p&gt;Before you design scalable systems, you must understand why organizations collapse under their own complexity.&lt;/p&gt;

&lt;p&gt;Architects first learn to see the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Book: Accelerate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Authors:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;&lt;strong&gt;Nicole Forsgren, Jez Humble, Gene Kim&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;(Based on research from the State of DevOps Reports)&lt;/p&gt;

&lt;p&gt;This book moves DevOps from belief to evidence.&lt;/p&gt;

&lt;p&gt;Deployment frequency.&lt;br&gt;
Lead time.&lt;br&gt;
Change failure rate.&lt;br&gt;
MTTR.&lt;/p&gt;

&lt;p&gt;Now you’re not “suggesting improvements.”&lt;br&gt;
You’re defending architecture with science.&lt;br&gt;
Architects use data as leverage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Book: Continuous Delivery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Authors:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Jez Humble, David Farley&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is your technical deep dive.&lt;/p&gt;

&lt;p&gt;Understanding how code moves safely from commit to production is more important than learning any specific CI/CD tool.&lt;/p&gt;

&lt;p&gt;Tools evolve.&lt;br&gt;
Delivery principles do not.&lt;/p&gt;

&lt;p&gt;Spend time mapping these concepts into your real infrastructure.&lt;/p&gt;

&lt;p&gt;Architects understand flow before scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Book: The DevOps Handbook&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Gene Kim, Patrick Debois, John Willis, Jez Humble&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If DevOps had a constitution, this would be it.&lt;/p&gt;

&lt;p&gt;Culture.&lt;br&gt;
Automation.&lt;br&gt;
Measurement.&lt;br&gt;
Sharing.&lt;/p&gt;

&lt;p&gt;This book connects technical practices with leadership and organizational design.&lt;/p&gt;

&lt;p&gt;It transforms DevOps from “engineering initiative” into “business strategy.”&lt;/p&gt;

&lt;p&gt;Architects think beyond pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Book: The Docker Book&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Author: James Turnbull&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before Kubernetes.&lt;br&gt;
Before large-scale orchestration.&lt;/p&gt;

&lt;p&gt;You must understand containers deeply.&lt;/p&gt;

&lt;p&gt;Isolation.&lt;br&gt;
Portability.&lt;br&gt;
Consistency.&lt;/p&gt;

&lt;p&gt;Modern cloud architecture stands on this foundation.&lt;/p&gt;

&lt;p&gt;Architects master primitives before abstractions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Book: Lean IT: Enabling and Sustaining Your Lean Transformation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Authors:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Mike Orzen, Steven Bell&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DevOps without Lean becomes automation theater.&lt;/p&gt;

&lt;p&gt;Lean IT teaches systems thinking.&lt;/p&gt;

&lt;p&gt;Every delay.&lt;br&gt;
Every handoff.&lt;br&gt;
Every unnecessary approval.&lt;/p&gt;

&lt;p&gt;That’s waste.&lt;/p&gt;

&lt;p&gt;Architects remove friction before automating it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Book: The Journey to DevOps – A Testing Perspective&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Author: Chris Riley&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This phase sharpens your quality mindset.&lt;/p&gt;

&lt;p&gt;DevOps is incomplete without testing strategy.&lt;/p&gt;

&lt;p&gt;Automation without validation is chaos at scale.&lt;/p&gt;

&lt;p&gt;This book reinforces that DevOps maturity includes strong testing culture and continuous validation.&lt;/p&gt;

&lt;p&gt;Architects design reliability into the system — not after it breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Real Difference comes from Long-form reading:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It builds pattern recognition, strengthens systems thinking, improves strategic communication, and develops architectural intuition.&lt;/p&gt;

&lt;p&gt;Most engineers consume snippets.&lt;br&gt;
Architects build frameworks.&lt;/p&gt;

&lt;p&gt;These seven books.&lt;br&gt;
(Maybe in twelve months?)&lt;/p&gt;

&lt;p&gt;That’s the blueprint.&lt;/p&gt;

&lt;p&gt;By committing to this roadmap, you move from:&lt;/p&gt;

&lt;p&gt;Tool operator&lt;br&gt;
  to&lt;br&gt;
  Systems thinker&lt;br&gt;
  to&lt;br&gt;
  Strategic DevOps Architect&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Most engineers collect certifications&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Architects collect mental models.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Choose wisely.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(I completed some. The rest are scheduled — long before the final production shutdown)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>containers</category>
      <category>kubernetes</category>
      <category>ai</category>
    </item>
    <item>
      <title>My Accidental 20-Year DevOps Journey: From CruiseControl to Cloud Native</title>
      <dc:creator>suresh devops</dc:creator>
      <pubDate>Mon, 16 Feb 2026 04:49:48 +0000</pubDate>
      <link>https://dev.to/suresh_devops_ffa0728a190/from-cruisecontrol-to-cloud-native-my-accidental-20-year-devops-journey-2kgc</link>
      <guid>https://dev.to/suresh_devops_ffa0728a190/from-cruisecontrol-to-cloud-native-my-accidental-20-year-devops-journey-2kgc</guid>
<description>&lt;h2&gt;
  Or: How I Learned to Stop Worrying and Start Living with Pipelines
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;2007.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was there, typing away in a dimly lit office, surrounded by the gentle hum of servers that were actually in the same building (kids, ask your parents about "server rooms").&lt;/p&gt;

&lt;p&gt;My tools of the trade? CruiseControl (bless its XML-filled heart). Ant scripts that looked like ancient scrolls. ClearCase—the version control system that required a PhD in theoretical physics and the patience of a saint. And good old SVN, because Git was still a niche newcomer.&lt;/p&gt;

&lt;p&gt;Back then, we called it "Build Release and Management." BRM. It sounded like a mild skin condition or perhaps a small-town accounting firm. Nobody outside our immediate team knew what we did. When people asked my job title, I'd say "I handle builds" and watch their eyes glaze over faster than a Windows 98 boot screen.&lt;/p&gt;

&lt;p&gt;I had absolutely zero idea that I was standing at the foot of a mountain that would consume the next two decades of my professional life—and fundamentally reshape how the entire world builds software.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The Accidental Pioneer:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the thing about being "early" to something: you don't know it's early. You just know you're tired of manually deploying code at 2 AM while the developer who broke the build sleeps peacefully, dreaming of clean compilations.&lt;br&gt;
We weren't trying to start a revolution. We were just trying to get CruiseControl to send emails that weren't caught by the corporate spam filter.&lt;br&gt;
But somewhere between wrestling with ClearCase branching strategies and writing Ant targets that would make a masochist wince, something was happening. The seeds were being planted. The industry was slowly realizing: "Wait, maybe the people who write code and the people who run it should probably... talk to each other?"&lt;br&gt;
Revolutionary stuff, I know.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Unstoppable Tidal Wave&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then DevOps happened. Or rather, DevOps happened to us.&lt;br&gt;
Suddenly, everyone cared about what we did. Developers wanted to know about deployment pipelines. Ops wanted to understand the build process. Managers started using words like "synergy" and "continuous delivery" in meetings, usually incorrectly, but the intent was there!&lt;/p&gt;

&lt;p&gt;The tools exploded:&lt;br&gt;
• Jenkins rose from the ashes of Hudson&lt;br&gt;
• Git arrived like a glorious messiah of branching&lt;br&gt;
• Puppet, Chef, Ansible—we could finally script servers like code&lt;br&gt;
• Docker containers (containers! that weren't Solaris Zones!)&lt;br&gt;
• Kubernetes—because one container is lonely, but thousands need a therapist&lt;br&gt;
• Terraform—infrastructure as code, because why should apps have all the fun?&lt;br&gt;
Every few years, just when I thought, "Okay, surely we've automated everything now," the universe would respond: "Hold my beer."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From BRM Newbie to DevSecOps Engineering Lead&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fast forward to 2026. Two decades later:&lt;/p&gt;

&lt;p&gt;I'm now a DevSecOps strategist, FinOps focal, MLOps starter, AIOps enthusiast, and forced innovationist across multiple platforms.&lt;/p&gt;

&lt;p&gt;The naive BRM learner who once celebrated getting CruiseControl to trigger a build without crashing now spends days thinking about:&lt;/p&gt;

&lt;p&gt;• Shifting security left—so far left that security is practically in the parking lot before development even starts&lt;/p&gt;

&lt;p&gt;• Platform engineering—building internal developer platforms that feel like magic, but with better documentation&lt;/p&gt;

&lt;p&gt;• Compliance as code—because auditors shouldn't need spreadsheets in 2026&lt;/p&gt;

&lt;p&gt;• MLOps—applying DevOps principles to models that have opinions about cats vs. dogs&lt;/p&gt;

&lt;p&gt;• FinOps—because the cloud isn't actually "infinite," it's just "very large and surprisingly expensive"&lt;/p&gt;

&lt;p&gt;• Cognitive load reduction—fancy term for "making sure my engineers don't quit to become goat farmers"&lt;/p&gt;

&lt;p&gt;The tools have changed. The complexity has multiplied exponentially. But the mission remains surprisingly similar: get good software to users reliably, securely, and without waking anyone up at 2 AM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Because We're All Still Here…&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After twenty years, here's what I've learned:&lt;/p&gt;

&lt;p&gt;Lesson 1. Nothing ages faster than "cutting edge."&lt;/p&gt;

&lt;p&gt;I once gave a talk about why Maven was the future. I stand by what I said, but I also stand by my cargo shorts from 2007. Some things should stay in the past.&lt;/p&gt;

&lt;p&gt;Lesson 2. The more things change, the more they break.&lt;/p&gt;

&lt;p&gt;We have GitOps, policy-as-code, canary deployments, and chaos engineering. And yet, somewhere today, a developer is pushing directly to main and wondering why prod is on fire. That developer was me in 2008. It might still be me occasionally in 2026.&lt;/p&gt;

&lt;p&gt;Lesson 3. DevOps is still growing because software is still eating the world.&lt;br&gt;
Every industry, every company, every idea eventually becomes software-driven. And software needs to be built, deployed, and run. We're not running out of work anytime soon.&lt;/p&gt;

&lt;p&gt;Lesson 4. The best tool is still the one your team will actually use.&lt;br&gt;
Kubernetes is amazing. So is Bash. So is that weird shell script Frank wrote in 2019 that everyone pretends to understand. The technology matters less than the culture.&lt;/p&gt;

&lt;p&gt;Lesson 5. We're all still learning.&lt;/p&gt;

&lt;p&gt;I've been doing this for twenty years. I still Google basic YAML syntax. I still panic when a pipeline fails. I still celebrate small victories like a deployment that works on the first try.&lt;/p&gt;

&lt;p&gt;I have had infinite lessons with finite scars in my DevOps journey.&lt;/p&gt;

&lt;p&gt;For the 'epiphany' that pulled me through my DevOps dark days, I would recommend the books below to serious DevOps enthusiasts.&lt;/p&gt;

&lt;p&gt;Patrick Debois - The DevOps Handbook&lt;br&gt;
Gene Kim - The Phoenix Project, The DevOps Handbook&lt;br&gt;
Jez Humble - Continuous Delivery, The DevOps Handbook&lt;br&gt;
John Willis - The DevOps Handbook, Docker tutorial series&lt;br&gt;
James Turnbull - The Docker Book, The Art of Monitoring&lt;br&gt;
Nicole Forsgren - Accelerate, the State of DevOps Reports&lt;br&gt;
Alan Shimel - Founder of DevOps.com &amp;amp; DevOpsInstitute.com&lt;br&gt;
Mike Orzen - Lean IT: Enabling and Sustaining Your Lean Transformation, The Lean IT Field Guide&lt;br&gt;
Chris Riley - DevOps.com, The Journey to DevOps – A Testing Perspective&lt;br&gt;
Sean Hull - Writes on AWS, databases, scalability, and the cloud&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If someone told me in 2007 that I'd still be doing this in 2026, writing blogs about DevSecOps and platform engineering, I'd have laughed. Then asked if ClearCase finally added Git integration. (It didn't. It really didn't.)&lt;/p&gt;

&lt;p&gt;But here we are. DevOps isn't a job title anymore—it's how software gets made. It's not a team—it's a culture. It's not a trend—it's the baseline.&lt;/p&gt;

&lt;p&gt;And the best part? It's still growing. Still evolving. Still full of problems to solve and puzzles to untangle.&lt;/p&gt;

&lt;p&gt;To the BRM newbies, the DevOps engineers, the platform builders, the SREs, the release managers, and everyone who's ever spent four hours debugging a YAML indentation issue:&lt;/p&gt;

&lt;p&gt;Welcome. The journey is weird, wonderful, and absolutely nowhere near finished.&lt;br&gt;
Come for the automation. Stay for the people. And never, ever trust a deployment on a Friday afternoon.&lt;/p&gt;




&lt;p&gt;A Former BRM Enthusiast, Current YAML Survivor&lt;/p&gt;

&lt;p&gt;P.S. - If anyone needs me, I'll be writing Ant scripts for old times' sake. Just kidding. I'm not a masochist.&lt;/p&gt;

</description>
      <category>career</category>
      <category>cicd</category>
      <category>devjournal</category>
      <category>devops</category>
    </item>
    <item>
      <title>Kubernetes Advanced Scheduling ( Hidden gems of Kubernetes )</title>
      <dc:creator>suresh devops</dc:creator>
      <pubDate>Sun, 15 Feb 2026 12:40:49 +0000</pubDate>
      <link>https://dev.to/suresh_devops_ffa0728a190/kubernetes-advanced-scheduling-hidden-gems-of-kubernetes-1--29l9</link>
      <guid>https://dev.to/suresh_devops_ffa0728a190/kubernetes-advanced-scheduling-hidden-gems-of-kubernetes-1--29l9</guid>
      <description>&lt;p&gt;Kubernetes is often described as a scheduler’s operating system. The default scheduler (kube-scheduler) does an excellent job of placing pods on nodes based on basic resource requests and limits. However, in complex, production-grade environments, the default "fit and spread" logic is often not enough.&lt;/p&gt;

&lt;p&gt;While Custom Resource Definitions (CRDs) and Operators get most of the spotlight in the ecosystem, the control plane itself hides a treasure trove of powerful scheduling levers. Here is a deep dive into the advanced scheduling features that separate a functional cluster from a finely-tuned, production-ready one.&lt;/p&gt;

&lt;p&gt;Problem 1. Pod Topology Spread Constraints: Orchestrating Disaster Recovery&lt;/p&gt;

&lt;p&gt;The Problem:&lt;br&gt;
By default, a ReplicaSet ensures a certain number of replicas are running, but it doesn't care where they run. In a cloud environment, if all pods land on the same Availability Zone (AZ) and that zone fails, you experience a full outage. Similarly, if pods are packed onto the same node, a node failure takes out the entire service.&lt;/p&gt;

&lt;p&gt;The Solution:&lt;br&gt;
Pod Topology Spread Constraints allow you to enforce strict or best-effort distribution rules across arbitrary failure domains (e.g., zones, nodes, or even custom host labels).&lt;/p&gt;

&lt;p&gt;How it works:&lt;br&gt;
You define a topologySpreadConstraints field in your Pod spec. You specify:&lt;br&gt;
• topologyKey : The label key on nodes that defines the domain (e.g., topology.kubernetes.io/zone).&lt;br&gt;
• maxSkew : The maximum allowable difference in the number of pods across domains.&lt;br&gt;
• whenUnsatisfiable : What to do if the skew can't be met (DoNotSchedule vs ScheduleAnyway).&lt;/p&gt;

&lt;p&gt;Technical Deep Dive:&lt;br&gt;
Unlike PodAffinity/AntiAffinity, which are binary (must be/not be), Spread Constraints are quantitative. They look at the distribution delta.&lt;br&gt;
• Use Case 1: Strict HA: You can ensure that if you have 3 zones, a deployment of 6 pods is scheduled exactly 2 per zone. If a zone is unhealthy and a pod reschedules, the scheduler will wait until the zone recovers or another pod moves to maintain balance.&lt;br&gt;
• Use Case 2: Rolling Updates: During a rolling update, new pods are created. Without spread constraints, the scheduler might fill up one zone first to bin-pack. With spread constraints, the new pods are spread evenly from the start, maintaining balance during the transition.&lt;/p&gt;
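&lt;p&gt;A minimal sketch of the strict-HA case above (the Deployment name, labels, and image are illustrative):&lt;/p&gt;

```yaml
# Spread 6 replicas evenly across zones; refuse to schedule
# rather than tolerate imbalance.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                               # at most 1 pod difference between zones
          topologyKey: topology.kubernetes.io/zone # failure domain = availability zone
          whenUnsatisfiable: DoNotSchedule         # strict: violating pods stay Pending
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25
```

&lt;p&gt;With &lt;code&gt;DoNotSchedule&lt;/code&gt;, a pod that would push the skew above 1 stays Pending rather than landing in an already-full zone; &lt;code&gt;ScheduleAnyway&lt;/code&gt; turns the same rule into a soft preference.&lt;/p&gt;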

&lt;p&gt;Problem 2. Pod Overhead: Accounting for the Invisible Resources&lt;/p&gt;

&lt;p&gt;The Problem:&lt;br&gt;
When using "microVMs" (Kata Containers, gVisor) or sidecar-heavy meshes (Istio), the user container is not the only thing consuming resources. The runtime shim or the sidecar proxy consumes CPU and memory before the main process starts. If the scheduler ignores this, it will overcommit the node, leading to throttling or node pressure.&lt;/p&gt;

&lt;p&gt;The Solution:&lt;br&gt;
Pod Overhead is a feature used primarily with RuntimeClasses. It allows you to define the resources consumed by the infrastructure (the runtime) per pod.&lt;/p&gt;

&lt;p&gt;How it works:&lt;br&gt;
You define a RuntimeClass that includes an overhead field. Any pod using that RuntimeClass automatically adds this overhead to its scheduling calculations.&lt;/p&gt;

&lt;p&gt;Technical Deep Dive:&lt;br&gt;
Scheduling is a math problem. The scheduler sums pod.spec.containers[*].resources.requests to determine node fit.&lt;br&gt;
Pod Overhead injects an additional "invisible container" into this calculation.&lt;/p&gt;

&lt;p&gt;• Example: A Kata container might need 50MB of memory and 5% CPU for the VM kernel and agent. The user requests 512MB for their app. The scheduler will see a total demand of 562MB. Without this, the node would be overcommitted, and the VM might crash.&lt;br&gt;
• Monitoring Impact: This also improves resource accounting. The overhead is counted against the node's requested resources (visible via kubectl describe node), giving a more accurate picture of node utilization.&lt;/p&gt;
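&lt;p&gt;A sketch of how this is wired up for a Kata-style runtime (the handler name and overhead values are illustrative and must match what is actually configured and measured on your nodes):&lt;/p&gt;

```yaml
# RuntimeClass declaring per-pod infrastructure overhead.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata                  # CRI handler name configured on the nodes
overhead:
  podFixed:
    memory: "50Mi"             # VM kernel + agent
    cpu: "50m"                 # roughly 5% of a core
---
# Any pod opting in has the overhead added to its scheduling math.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  runtimeClassName: kata       # scheduler now reserves 512Mi + 50Mi
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          memory: "512Mi"
```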

&lt;p&gt;Problem 3. Scheduler Profiles &amp;amp; Extenders: Custom Logic Without Custom Code&lt;/p&gt;

&lt;p&gt;The Problem:&lt;br&gt;
You need a specific scheduling rule (e.g., "Don't schedule GPU pods on nodes with Spot instances," or "Prefer nodes with SSDs for databases"). Rewriting the entire Kubernetes scheduler from scratch is a daunting and fragile task.&lt;/p&gt;

&lt;p&gt;The Solution:&lt;br&gt;
Scheduler Profiles (introduced in v1.18) and Scheduler Extenders (legacy but powerful) allow you to inject custom logic into the scheduling pipeline.&lt;br&gt;
• Scheduler Profiles (Multi-point): Allow you to run multiple scheduling configurations in parallel. You can configure which set of plugins (default or custom) run for specific pods.&lt;br&gt;
• Scheduler Extenders: A process external to the scheduler that acts as a "webhook" for scheduling. The scheduler sends it a list of filtered nodes, and the extender filters or prioritizes them further.&lt;/p&gt;

&lt;p&gt;Technical Deep Dive:&lt;br&gt;
The scheduling cycle is split into phases: Filtering (Predicates), Scoring (Priorities), and Binding.&lt;br&gt;
• Multi-scheduling: With Profiles, you could have one profile for general workloads that bin-packs tightly, and another profile for critical workloads that spreads thinly across nodes.&lt;br&gt;
• Extenders: Imagine you have a hardware accelerator connected via USB to some nodes. The scheduler doesn't know about USB devices. An extender can look at the pod annotation, check an inventory database, and filter out nodes that don't have the USB device plugged in, allowing the pod to land only on physically capable hardware.&lt;/p&gt;
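&lt;p&gt;A sketch of the two-profile setup described above, using the GA &lt;code&gt;kubescheduler.config.k8s.io/v1&lt;/code&gt; API (profile names are illustrative):&lt;/p&gt;

```yaml
# One profile bin-packs (MostAllocated), the other spreads
# thinly (LeastAllocated). Pods choose via spec.schedulerName.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated        # pack nodes tightly
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
  - schedulerName: spread-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: LeastAllocated       # prefer emptier nodes
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

&lt;p&gt;A critical workload opts into the second profile simply by setting &lt;code&gt;spec.schedulerName: spread-scheduler&lt;/code&gt; in its pod template.&lt;/p&gt;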

&lt;p&gt;Problem 4. NodeResourceFit with Pod Overhead: The Hidden Math&lt;/p&gt;

&lt;p&gt;The Problem:&lt;br&gt;
While we covered Pod Overhead, it exists in isolation. The real magic happens when the scheduler's NodeResourceFit plugin interacts with it. Many engineers assume that if they set Pod Overhead, the scheduler just "adds it." But understanding how it adds it reveals potential pitfalls.&lt;/p&gt;

&lt;p&gt;The Solution:&lt;br&gt;
The NodeResourceFit plugin is the component that checks if a node has enough resources. Its integration with Pod Overhead ensures that overhead is treated as a first-class citizen during the Filtering and Scoring phases.&lt;/p&gt;

&lt;p&gt;Technical Deep Dive:&lt;br&gt;
There is a nuance here regarding Scoring. The scoring algorithm usually calculates a score based on the ratio of pod requests to node allocatable resources. With Pod Overhead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Total Pod Compute: (Sum of Container Requests) + (Pod Overhead)&lt;/li&gt;
&lt;li&gt; Node Consumption Calculation: When scoring, the scheduler looks at the current node usage + Total Pod Compute.&lt;/li&gt;
&lt;li&gt; MostAllocated vs LeastAllocated: If you are using the MostAllocated (bin-packing) scoring strategy, including the overhead means the scheduler packs nodes with the full claimed footprint in mind, keeping the user payload dense relative to the total claimed resources. If you forget this, your bin-packing will be inaccurate by the margin of your overhead, potentially leaving CPU on the table.&lt;/li&gt;
&lt;/ol&gt;
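&lt;p&gt;The math above can be sketched in a few lines of toy code (a deliberate simplification of the real NodeResourcesFit plugin, reusing the Kata numbers from Problem 2; units are MiB):&lt;/p&gt;

```python
def most_allocated_score(node_allocatable, node_used, container_requests, pod_overhead):
    """Toy MostAllocated scoring: the score rises as the node fills up.
    Returns None when the pod does not fit (the Filtering phase)."""
    total_pod = sum(container_requests) + pod_overhead   # overhead is first-class
    if node_used + total_pod > node_allocatable:
        return None                                      # filtered out: does not fit
    return round(100 * (node_used + total_pod) / node_allocatable)

# Kata example: 512 MiB app + 50 MiB overhead on a 4096 MiB node
# that already has 1024 MiB requested: demand seen is 562 MiB.
print(most_allocated_score(4096, 1024, [512], 50))   # -> 39
```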

&lt;p&gt;Conclusion :&lt;/p&gt;

&lt;p&gt;These scheduling features represent the difference between "running Kubernetes" and "engineering Kubernetes." By leveraging Topology Spread Constraints, you ensure business continuity. By using Pod Overhead, you maintain financial and operational accuracy in mixed runtime environments. And by utilizing Scheduler Profiles, you unlock the ability to tailor the control plane to your specific hardware and business logic without fighting the upstream project.&lt;/p&gt;

&lt;p&gt;The scheduler is not a black box; it is a configurable engine. These "hidden gems" are the keys to unlocking its full potential.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>containers</category>
      <category>ai</category>
    </item>
    <item>
      <title>Hidden gems of Kubernetes</title>
      <dc:creator>suresh devops</dc:creator>
      <pubDate>Mon, 09 Feb 2026 12:40:49 +0000</pubDate>
      <link>https://dev.to/suresh_devops_ffa0728a190/hidden-gems-of-kubernetes-25l9</link>
      <guid>https://dev.to/suresh_devops_ffa0728a190/hidden-gems-of-kubernetes-25l9</guid>
      <description>&lt;p&gt;While CRDs and API extensions are well-known, Kubernetes has many powerful but underutilized features. Here are some that even experienced DevOps engineers often overlook:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Advanced Scheduling Features  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Pod Topology Spread Constraints  : Fine-grained control over pod distribution across zones, nodes, etc.&lt;/li&gt;
&lt;li&gt;  Pod Overhead  : Account for runtime/daemon overhead when scheduling (critical for Kata Containers, gVisor).&lt;/li&gt;
&lt;li&gt;  Scheduler Profiles &amp;amp; Extenders  : Custom scheduler behavior without writing a custom scheduler.&lt;/li&gt;
&lt;li&gt;  NodeResourceFit with Pod Overhead  : Actually considers runtime overhead in scheduling decisions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Networking Deep Cuts  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Network Policy Port Ranges  : (K8s 1.25+) Specify port ranges like &lt;code&gt;30000-32767&lt;/code&gt; in NetworkPolicies.&lt;/li&gt;
&lt;li&gt;  Service &lt;code&gt;internalTrafficPolicy: Local&lt;/code&gt;   : Route cluster-internal traffic only to endpoints on the same node (the in-cluster counterpart of &lt;code&gt;externalTrafficPolicy&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;  IPVS Session Affinity Fine-Tuning  : Timeout settings, scheduling algorithms beyond round-robin.&lt;/li&gt;
&lt;li&gt;  EndpointSlice  : More scalable alternative to Endpoints (automatic since 1.21, but few leverage its full API).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Storage Gems  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Volume Populators  : (Alpha→Beta) Create PVC content from custom sources (like snapshots, http) pre-attachment.&lt;/li&gt;
&lt;li&gt;  Read-Write-Many (RWX) for block storage  : Some CSI drivers now support it via filesystem layer magic.&lt;/li&gt;
&lt;li&gt;  Volume Health Monitoring  : CSI driver can report volume issues to Kubernetes events.&lt;/li&gt;
&lt;li&gt;  Generic Ephemeral Volumes  : Request temporary storage without creating StorageClass/PVC definitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Security Obscurities  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Pod Security Admission (PSA) Exemptions  : Namespace-level exemptions for specific service accounts.&lt;/li&gt;
&lt;li&gt;  Seccomp/AppArmor Annotations for Windows  : Wait, they exist (some work on Windows Server 2022+).&lt;/li&gt;
&lt;li&gt;  TokenRequest API  : Short-lived service account tokens with audience binding.&lt;/li&gt;
&lt;li&gt;  CSIVolumeFSGroupPolicy  : Control how CSI drivers handle fsGroup ownership changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;API Machinery &amp;amp; Admission Magic  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  API Priority and Fairness (APF)  : Prevent noisy neighbors in API server with flow control.&lt;/li&gt;
&lt;li&gt;  ValidatingAdmissionPolicy  : (K8s 1.26+) CEL-based policies without webhook complexity.&lt;/li&gt;
&lt;li&gt;  Server-Side Apply Field Management  : Track which manager owns which field for conflict resolution.&lt;/li&gt;
&lt;li&gt;  API Aggregation Layers  : Not just for CRDs - aggregate multiple API servers transparently.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Node &amp;amp; Runtime Features  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Kubelet Credential Providers  : Plugins for dynamic registry credential fetching (ECR, GCR, ACR).&lt;/li&gt;
&lt;li&gt;  RuntimeClass Scheduling  : Schedule pods to specific container runtimes (runc, Kata, gVisor).&lt;/li&gt;
&lt;li&gt;  Node System Swap Support  : (K8s 1.22+) Yes, swap can now be enabled with performance caveats.&lt;/li&gt;
&lt;li&gt;  Pod Memory QoS  : Memory throttling for containers using cgroups v2.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Debugging &amp;amp; Observability  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Kubelet Tracing  : Built-in OpenTelemetry traces for kubelet operations.&lt;/li&gt;
&lt;li&gt;  Dynamic Kubelet Configuration  : (Deprecated but interesting) Concept lives on in KEPs.&lt;/li&gt;
&lt;li&gt;  Pod Ready++  : Startup probes are known, but &lt;code&gt;ReadinessGate&lt;/code&gt; for custom conditions isn't.&lt;/li&gt;
&lt;li&gt;  Pod Disruption Budget with Unhealthy Pod Exclusion  : Auto-exclude unhealthy pods from PDB calculations.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Workload Features  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Pod Lifecycle Sleep Action  : In postStart/preStop hooks - &lt;code&gt;sleep 60&lt;/code&gt; is more common than you think.&lt;/li&gt;
&lt;li&gt;  Pod Deletion Cost  : (K8s 1.21+) Annotation to control pod deletion order during downscaling.&lt;/li&gt;
&lt;li&gt;  Pod Topology Spread Constraints by Pod Label  : Spread based on pod labels, not just topology.&lt;/li&gt;
&lt;li&gt;  Suspend CronJobs  : Temporarily disable without deleting.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CLI &amp;amp; Client Hidden Gems  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;kubectl alpha debug&lt;/code&gt;  : Node debugging with ephemeral containers (now stable as &lt;code&gt;kubectl debug&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;kubectl events&lt;/code&gt;  : The sorted, combined event view everyone should use but doesn't.&lt;/li&gt;
&lt;li&gt;  Server-Side Dry Runs  : &lt;code&gt;kubectl apply --server-side --dry-run=server&lt;/code&gt; previews exactly what the API server would persist.&lt;/li&gt;
&lt;li&gt;  Custom Columns with JSONPath  : &lt;code&gt;kubectl get pods -o custom-columns=...&lt;/code&gt; with complex expressions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ecosystem Integration Points  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Container Device Interface (CDI)  : Standard for exposing hardware (GPUs, FPGAs, etc.) to containers.&lt;/li&gt;
&lt;li&gt;  Service Binding Specification  : (K8s 1.21+) Standardized way to bind services to workloads.&lt;/li&gt;
&lt;li&gt;  Cluster Trust Bundles  : Distribute CA certificates to workloads trustlessly.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
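&lt;p&gt;As a taste of one gem from the list, here is a minimal sketch of a CEL-based &lt;code&gt;ValidatingAdmissionPolicy&lt;/code&gt; with its binding (the policy name and replica limit are illustrative; clusters older than the GA release need the beta/alpha API groups instead of &lt;code&gt;v1&lt;/code&gt;):&lt;/p&gt;

```yaml
# Reject Deployments with more than 5 replicas, no webhook required.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: replica-limit
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
  validations:
    - expression: "!(object.spec.replicas > 5)"   # CEL, evaluated in-process
      message: "replicas must not exceed 5"
---
# A policy does nothing until it is bound.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: replica-limit-binding
spec:
  policyName: replica-limit
  validationActions: ["Deny"]
```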

&lt;p&gt;&lt;strong&gt;Why These Are Unknown:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;  Version Skew  : Many use older K8s versions&lt;/li&gt;
&lt;li&gt;  Cloud Provider Abstractions  : Managed services hide complexity&lt;/li&gt;
&lt;li&gt;  Documentation Depth  : Features exist but aren't emphasized&lt;/li&gt;
&lt;li&gt;  Complexity vs. Benefit  : Some are too niche for general use&lt;/li&gt;
&lt;li&gt;  Gradual Rollouts  : Features exist for years in alpha/beta before attention&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
