Diya

Posted on May 25

Deployment using all three Kubernetes probes

#monitoring #tutorial #devops #kubernetes

Full Example YAML

Here’s the container spec from a Deployment that uses all three Kubernetes probes:

containers:
  - name: api
    image: my-api:latest

    startupProbe:
      httpGet:
        path: /readyz
        port: 5000
      failureThreshold: 20
      periodSeconds: 15

    readinessProbe:
      httpGet:
        path: /readyz
        port: 5000
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3

    livenessProbe:
      httpGet:
        path: /healthz
        port: 5000
      initialDelaySeconds: 30
      periodSeconds: 20
      failureThreshold: 3

Now let’s break down what Kubernetes is actually doing here.

startupProbe

startupProbe:
  httpGet:
    path: /readyz
    port: 5000
  failureThreshold: 20
  periodSeconds: 15

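This tells Kubernetes:

Check /readyz every 15 seconds.
Kill the container after 20 consecutive failures.

While the startup probe is still running, the readiness and liveness probes are disabled. They only start once it has succeeded.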

Calculation:

15 seconds × 20 failures = 300 seconds

So Kubernetes gives the application:

5 minutes to fully start

before deciding:

“The application failed to start.”
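The same arithmetic can be expressed as a small Python sketch. `ProbeTiming` and `approx_deadline` are hypothetical names. The defaults mirror the Kubernetes defaults listed below. Like the hand calculation, the result is only approximate, because real timing also depends on probe timeouts and on exactly when the first check runs.

```python
from dataclasses import dataclass


@dataclass
class ProbeTiming:
    """Hypothetical mirror of a probe's timing fields, using Kubernetes defaults."""
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 3


def approx_deadline(probe: ProbeTiming) -> int:
    """Rough seconds after container start before a never-passing probe triggers action.

    Rule of thumb: periodSeconds x failureThreshold, plus initialDelaySeconds
    (which is 0 in the startupProbe example).
    """
    return probe.initial_delay_seconds + probe.period_seconds * probe.failure_threshold


custom_startup = ProbeTiming(period_seconds=15, failure_threshold=20)  # values from the example YAML
default_startup = ProbeTiming()  # nothing specified

print(approx_deadline(custom_startup))   # 300 (about 5 minutes)
print(approx_deadline(default_startup))  # 30
```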

Default Values

If not specified, Kubernetes uses:

periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1

Which means by default:

10 seconds × 3 failures = 30 seconds

Your application may only get:

~30 seconds

before Kubernetes decides startup failed.

This is why slow-starting applications often need a custom startupProbe.
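To size a custom startupProbe, one approach is to work backwards from the slowest startup you have actually measured. The Python sketch below uses a hypothetical `startup_failure_threshold` helper and assumes a one-period safety margin. The first check runs close to container start, so a threshold of exactly ceil(startup / period) can run out roughly one period early. The margin is a judgment call, not a Kubernetes rule.

```python
import math


def startup_failure_threshold(worst_case_startup_s: float, period_s: int, margin_periods: int = 1) -> int:
    """Suggest a startupProbe failureThreshold for a measured worst-case startup time.

    Rounds up so failureThreshold x periodSeconds covers the startup time,
    then adds `margin_periods` extra attempts as a buffer (an assumption).
    """
    if period_s <= 0:
        raise ValueError("periodSeconds must be positive")
    return math.ceil(worst_case_startup_s / period_s) + margin_periods


# A service measured to need up to 4 minutes to start, probed every 15 s:
print(startup_failure_threshold(240, 15))  # 17, i.e. a budget of about 255 s
# The same service probed every 10 s:
print(startup_failure_threshold(240, 10))  # 25
```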

Common Real-World Use Cases

Java applications
ML workloads
applications loading huge caches
Python/Gunicorn services
applications waiting for database migrations

The important part:

A single failed startup check is NOT the issue.

The issue happens only when failures continue until they reach failureThreshold.

readinessProbe

readinessProbe:
  httpGet:
    path: /readyz
    port: 5000
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

This tells Kubernetes:

Wait 5 seconds after container start.
Then check /readyz every 10 seconds.
If it fails 3 consecutive times:
remove the pod from Service traffic.

Calculation:

10 seconds × 3 failures = 30 seconds

If the application cannot respond successfully for:

30 continuous seconds

the pod becomes:

NotReady

But importantly:

The container is NOT restarted.

Traffic simply stops flowing to it until the probe succeeds again.
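The consecutive-failure rule for readiness can be modelled as a tiny state machine. This Python sketch is a simplification: `readiness_timeline` is a hypothetical helper, not a Kubernetes API, and each list element stands for one probe attempt (10 seconds apart in the example configuration). Timing and timeouts are ignored.

```python
from typing import List


def readiness_timeline(results: List[bool], failure_threshold: int = 3, success_threshold: int = 1) -> List[bool]:
    """Return the pod's Ready state after each readiness probe result.

    results: True for a successful check, False for a failed one.
    The pod starts NotReady. It becomes NotReady after `failure_threshold`
    consecutive failures and Ready after `success_threshold` consecutive
    successes. Nothing here restarts the container.
    """
    ready = False
    failures = successes = 0
    timeline = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0
            if successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0
            if failures >= failure_threshold:
                ready = False
        timeline.append(ready)
    return timeline


# Two failures in a row are tolerated; the third consecutive one flips the pod to NotReady.
print(readiness_timeline([True, False, False, True, False, False, False, True]))
# [True, True, True, True, True, True, False, True]
```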

Default Values

If not configured, Kubernetes defaults to:

initialDelaySeconds: 0
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1

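This means Kubernetes starts checking almost immediately.

Early failures are harmless in themselves, because the pod simply stays NotReady. But this matters for applications that:

take time to boot
warm caches
establish DB connections
initialize workers

because /readyz must not report success until that work is finished, or traffic will arrive too early.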

Important Concept

A readiness failure usually means:

"Do not send traffic right now."

It does NOT mean:

"The application is dead."

This distinction is extremely important in production.
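One way to keep the two meanings separate is to back /healthz and /readyz with different checks. The Python sketch below uses only the standard library `http.server`. The `STATE` flags, `status_for` and `serve` are hypothetical names, and a production service would normally use its own web framework. Kubernetes treats any HTTP status from 200 to 399 as a successful httpGet probe.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process flags; a real service would flip these from its
# own startup logic and dependency checks.
STATE = {"alive": True, "ready": False}


def status_for(path: str, state: dict) -> int:
    """HTTP status a probe endpoint returns for the given application state."""
    if path == "/healthz":
        # Liveness: is the process able to make progress at all?
        return 200 if state["alive"] else 500
    if path == "/readyz":
        # Readiness: should this pod receive traffic right now?
        return 200 if state["alive"] and state["ready"] else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string; the example probes request bare paths.
        path = self.path.split("?", 1)[0]
        self.send_response(status_for(path, STATE))
        self.end_headers()


def serve(port: int = 5000) -> None:
    """Serve both probe endpoints (blocks). Port 5000 matches the example probes."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

With this split, while `STATE["ready"]` is False (for example, while a database connection is being re-established), /readyz returns 503 and the pod is taken out of rotation. Meanwhile /healthz keeps returning 200, so the container is not restarted.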

livenessProbe

livenessProbe:
  httpGet:
    path: /healthz
    port: 5000
  initialDelaySeconds: 30
  periodSeconds: 20
  failureThreshold: 3

This tells Kubernetes:

Wait 30 seconds before starting checks.
Then check /healthz every 20 seconds.
If it fails 3 consecutive times:
restart the container.

Calculation:

20 seconds × 3 failures = 60 seconds

If health checks fail continuously for:

60 seconds

Kubernetes assumes:

“The application is unhealthy or stuck.”

and restarts the container automatically.

Default Values

Kubernetes defaults:

initialDelaySeconds: 0
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 3
successThreshold: 1

Which effectively means:

10 seconds × 3 failures = 30 seconds

before restart behavior begins.

Common Mistake

Many teams run liveness probes with aggressive settings, such as the default:

timeoutSeconds: 1

During:

CPU spikes
GC pauses
dependency slowness
temporary latency

the application may briefly respond slowly.

If those slow responses last for failureThreshold consecutive checks, this can trigger unnecessary restarts.
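The effect of timeoutSeconds can be sketched by treating each liveness check as a single response time. This Python model rests on several assumptions. `restart_triggered` is a hypothetical helper, and the latencies are made-up inputs rather than measurements. A check counts as failed when the response takes at least `timeoutSeconds`, and `failureThreshold` defaults to 3 as in the example.

```python
from typing import List


def restart_triggered(latencies_s: List[float], timeout_s: float, failure_threshold: int = 3) -> bool:
    """Return True if a liveness probe with this timeout would restart the container.

    Each element is the response time of one probe attempt. An attempt fails
    when it takes at least `timeout_s`; a restart happens after
    `failure_threshold` consecutive failures.
    """
    consecutive = 0
    for latency in latencies_s:
        if latency >= timeout_s:
            consecutive += 1
            if consecutive >= failure_threshold:
                return True
        else:
            consecutive = 0  # any fast response resets the streak
    return False


# A sustained slowdown (CPU spike, GC pressure) hits three checks in a row.
spike = [0.05, 1.5, 2.5, 1.8, 0.05]
print(restart_triggered(spike, timeout_s=1))  # True: three consecutive timeouts
print(restart_triggered(spike, timeout_s=5))  # False: every response fits in the timeout
```

Raising timeoutSeconds (or failureThreshold) trades slower detection of a truly stuck process for fewer false restarts.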

The Most Important Thing to Understand

Many engineers panic immediately when they see:

Readiness probe failed

or:

Liveness probe failed

But occasional probe failures are expected, and the thresholds exist to tolerate them.

The real question is:

Did consecutive failures reach failureThreshold?

Because Kubernetes only takes action after failureThreshold failures in a row. A successful check resets the count.
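That question can be answered mechanically from a sequence of probe results. In this Python sketch, `kubernetes_would_act` and `longest_failure_streak` are hypothetical helpers, and each list element stands for one probe attempt. The model covers only the consecutive-failure rule, not timing.

```python
from typing import List


def longest_failure_streak(results: List[bool]) -> int:
    """Length of the longest run of consecutive failed probes (False values)."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def kubernetes_would_act(results: List[bool], failure_threshold: int = 3) -> bool:
    """True only if some run of consecutive failures reaches failureThreshold."""
    return longest_failure_streak(results) >= failure_threshold


# Scattered failures: each one shows up as "probe failed", but no action is taken.
flaky = [True, False, True, False, False, True, False, True]
# Three failures in a row reach the default failureThreshold of 3.
stuck = [True, True, False, False, False, True]

print(kubernetes_would_act(flaky))  # False
print(kubernetes_would_act(stuck))  # True
```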

That’s why these settings matter so much:

failureThreshold
periodSeconds
timeoutSeconds
initialDelaySeconds

Together, they control:

how patient Kubernetes should be
when traffic should stop
when restarts should happen
how tolerant the system should be during spikes

Probe	What Happens on Failure?
startupProbe	Container is killed (and restarted) if startup checks keep failing until failureThreshold is reached
readinessProbe	Pod is marked NotReady and stops receiving Service traffic; no restart
livenessProbe	Container is restarted

Kubernetes probes are not meant to punish applications.

They are safety mechanisms.

The goal is to:

avoid sending traffic to unhealthy pods
restart stuck applications
allow slow startups safely

Once you understand probe thresholds, Kubernetes behavior suddenly becomes much easier to debug.

DEV Community
