Before Kubernetes, operators fixed failures. After Kubernetes, systems correct themselves.
Abstract
Modern distributed systems fail constantly due to hardware faults, software defects, and network variability. Traditional process-based operational models rely on human intervention to restore service availability, coupling uptime to response time. This approach does not scale. Kubernetes introduces a control-plane-driven model that shifts responsibility from operators to the system itself by continuously enforcing a declared desired state.
The Process-Centric Operational Model
Historically, service availability depended on long-running operating system processes.
A service instance was:
A Linux process
Bound to a specific host
Identified by a process ID (PID)
Restarted manually or by basic supervisors
Operational assumptions included:
Hosts are relatively stable
Processes fail infrequently
Recovery is operator-driven
When failures occurred, a typical night looked like this:
02:15 - Pager: "nginx process not running on web-01"
02:17 - Engineer wakes, finds laptop
02:18 - SSH to web-01
02:19 - ps aux | grep nginx shows nothing
02:20 - systemctl start nginx
02:21 - curl localhost confirms it's back
02:22 - Try to sleep again
This workflow assumes an awake human, a reachable laptop, and a working network.
Availability was effectively gated by human response time. The system worked only as long as failures were rare and operators were alert.
Containers and the Limits of Encapsulation
Containers standardized application packaging and execution.
Using container runtimes such as Docker, teams achieved:
Environment consistency
Dependency isolation
Faster deployment
However, containerization did not change the operational responsibility model.
If a container exited unexpectedly:
Docker could restart it (if configured)
But what if the node died?
What if Docker itself crashed?
What if a dependency failed?
Recovery still required operator action.
Containers improved portability, not availability guarantees.
They made failures easier to reproduce, not easier to survive.
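The gap is visible in the restart policy itself. A minimal Compose sketch (service name and image are illustrative) shows how far node-local supervision reaches:

```yaml
# docker-compose.yml (illustrative)
services:
  web:
    image: nginx:1.25
    restart: always   # restarts the container if it exits,
                      # but only while this host and the
                      # Docker daemon are still running
```

If the host or the daemon itself goes down, nothing restarts the restarter.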
Desired State as a First-Class Concept
Kubernetes introduces a declarative model.
Operators specify:
What workloads should exist
How many replicas are required
What constraints define correctness
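In Kubernetes, that intent is written down as a manifest. A minimal sketch (the `web` name, labels, and image are placeholders, not a prescribed layout):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # how many replicas are required
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        readinessProbe:        # a constraint that defines correctness
          httpGet:
            path: /
            port: 80
```

Nothing in this file says how to recover from a failure; it only says what should be true.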
The system continuously compares:
Observed state
Desired state
Any divergence triggers reconciliation: controllers act until observed state matches desired state again.
This removes the need for operators to respond to individual failures. Recovery becomes a system behavior, not an emergency task.
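The control loop itself is conceptually small. A minimal sketch in Python, using a toy in-memory model (the names `Cluster`, `observe`, `create_replica`, and `delete_replica` are illustrative, not Kubernetes API calls):

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: a set of running replica names."""
    replicas: set = field(default_factory=set)
    _next_id: int = 0

    def observe(self):
        # Observed state: which replicas are currently running.
        return set(self.replicas)

    def create_replica(self):
        # New replicas get fresh identities; old ones are never revived.
        name = f"web-{self._next_id}"
        self._next_id += 1
        self.replicas.add(name)

    def delete_replica(self, name):
        self.replicas.discard(name)


def reconcile(cluster, desired_count):
    """One pass of the loop: drive observed state toward desired state."""
    observed = cluster.observe()
    if len(observed) < desired_count:
        for _ in range(desired_count - len(observed)):
            cluster.create_replica()
    elif len(observed) > desired_count:
        for name in sorted(observed)[desired_count:]:
            cluster.delete_replica(name)


cluster = Cluster()
reconcile(cluster, 3)                 # converges from 0 to 3 replicas

crashed = next(iter(cluster.observe()))
cluster.delete_replica(crashed)       # simulate a crash
reconcile(cluster, 3)                 # the next pass restores compliance
```

The real system runs this comparison continuously, so the "crash" above is repaired without anyone noticing it happened.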
Why Kubernetes Does Not Expose Process IDs
Process IDs are not suitable control primitives in distributed systems.
A PID is:
Node-local
Ephemeral
Meaningless across restarts
Kubernetes intentionally abstracts processes.
You never ask:
“What's the PID of my web server?”
You ask:
“Are there 3 healthy pods?”
The first question is about a specific instance.
The second is about declared intent.
This is not a limitation. It is the entire point.
Treating PIDs as stable identities is an operational illusion.
Failure is handled through replacement, not repair.
The system does not attempt to preserve execution context.
It restores compliance with declared state.
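In practice, the question is asked of the control plane, not the host. A sketch (the `app=web` label is hypothetical):

```shell
# Ask about declared intent, not process identity:
kubectl get pods -l app=web
# Each pod name is a fresh identity assigned by the system;
# there is no stable PID to query, and none is needed.
```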
Job Execution Reconsidered
In traditional systems, jobs are launched and monitored externally.
In Kubernetes:
A Job defines completion semantics
The system ensures required executions occur
Retries are automatic
Completion state is recorded
Reliability shifts from execution monitoring to outcome enforcement.
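A Job manifest encodes those semantics directly. A minimal sketch (name, image, and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report
spec:
  completions: 1        # required successful executions
  backoffLimit: 4       # retries are automatic, up to this limit
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: report
        image: busybox:1.36
        command: ["sh", "-c", "echo done"]
```

No external monitor watches this run; the control plane records completion and handles retries itself.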
Failure as an Expected Condition
Google’s Borg (Kubernetes’ ancestor) learned this at planetary scale.
When you run millions of containers, failures aren’t “if” questions.
They’re “how many per minute per cluster” questions.
Kubernetes was designed for that reality from day one.
The platform assumes:
Nodes will disappear
Processes will crash
Network partitions will occur
Design decisions reflect this assumption:
Pods are disposable
Nodes are replaceable
State is externalized
The platform optimizes for recovery time, not failure prevention.
Operational Responsibility Shift
Before Kubernetes:
Operators maintained runtime correctness
Recovery was manual
Availability depended on response speed
After Kubernetes:
Operators define intent
Controllers enforce correctness
Recovery is automatic
Human involvement moves from reaction to design.
Conclusion
Kubernetes does not eliminate failures.
It eliminates the assumption that failures are exceptional.
By replacing process supervision with state reconciliation, Kubernetes reduces reliance on manual intervention and enables systems to recover predictably under fault conditions.
This shift is foundational to operating reliable systems at any meaningful scale.
The question for operators is no longer:
“How do I fix failures?”
It’s:
“Have I declared the correct desired state?”
What failures in your systems still require manual recovery today?