Neeraja Khanapure

Workflow Deep Dive

LinkedIn Draft — Workflow (2026-01-13)

Kubernetes rollouts: promote on SLOs, not on “pods are Ready”

Readiness is a local signal: it only tells you a pod passed its own probe. Production impact is global. Real rollouts need promotion gates that track user-facing health.

What usually bites later:

  • A rollout can be 100% Ready while P95 latency and error rate spike (bad cache warmup, noisy neighbor, DB pressure).
  • HPA reacts more slowly than a fast rollout, so you ship overload before autoscaling catches up.
  • A canary gets “stuck green” when your metrics are too coarse (no labels/slices), so you miss the blast radius.
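
To make the “stuck green” case concrete, here is a minimal Python sketch. The slice keys (route, region), the request counts, and the 1% error-rate target are all made-up assumptions for illustration, not numbers from a real system. It shows how an aggregate error rate can sit under the target while one slice is clearly broken.

```python
# Hypothetical canary request counts per slice: (route, region) -> (errors, total).
# All values are invented for illustration.
canary_counts = {
    ("/checkout", "eu-west"): (40, 400),
    ("/checkout", "us-east"): (5, 2600),
    ("/search", "eu-west"): (10, 3000),
    ("/search", "us-east"): (5, 4000),
}

SLO_ERROR_RATE = 0.01  # assumed target: at most 1% failed requests


def error_rate(errors: int, total: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def aggregate_error_rate(counts: dict) -> float:
    """Error rate with every slice lumped together (the coarse view)."""
    errors = sum(e for e, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return error_rate(errors, total)


def failing_slices(counts: dict, threshold: float) -> list:
    """Slices whose own error rate exceeds the threshold."""
    return sorted(
        key for key, (e, t) in counts.items() if error_rate(e, t) > threshold
    )


overall = aggregate_error_rate(canary_counts)
bad = failing_slices(canary_counts, SLO_ERROR_RATE)

print(f"aggregate error rate: {overall:.3%}")
print(f"slices over target: {bad}")
```

With these invented numbers the aggregate stays below the target, yet the `/checkout` slice in `eu-west` fails one request in ten. A gate that only watches the aggregate would promote.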

My default rule:
Promote only when your canary holds the SLO slice you care about (error rate + latency) for a fixed window; otherwise, auto-rollback.
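
A rough Python sketch of that rule, under stated assumptions: the `Sample`, `Slo`, and `decide` names are hypothetical, the window is expressed as a number of consecutive measurement intervals, and a single breach triggers rollback. Real tools typically let you tolerate a few failed checks, so treat this as the strictest possible reading rather than how any particular controller behaves.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One measurement interval for the canary (hypothetical shape)."""
    error_rate: float      # fraction of failed requests in the interval
    p95_latency_ms: float  # P95 latency observed in the interval


@dataclass(frozen=True)
class Slo:
    """Assumed SLO thresholds for the slice being gated."""
    max_error_rate: float
    max_p95_latency_ms: float


def decide(samples: Sequence[Sample], slo: Slo, required_samples: int) -> str:
    """Return 'rollback', 'promote', or 'wait' for the canary.

    - any sample breaching the SLO  -> 'rollback'
    - a full window of clean samples -> 'promote'
    - otherwise                      -> 'wait' (window not yet filled)
    """
    for s in samples:
        if (s.error_rate > slo.max_error_rate
                or s.p95_latency_ms > slo.max_p95_latency_ms):
            return "rollback"
    if len(samples) >= required_samples:
        return "promote"
    return "wait"


slo = Slo(max_error_rate=0.01, max_p95_latency_ms=300.0)
healthy = Sample(error_rate=0.002, p95_latency_ms=180.0)
slow = Sample(error_rate=0.002, p95_latency_ms=450.0)

print(decide([healthy] * 5, slo, required_samples=5))          # full clean window
print(decide([healthy] * 3, slo, required_samples=5))          # window not filled
print(decide([healthy, healthy, slow], slo, required_samples=5))  # latency breach
```

The point of the three-way result is that "not failing yet" is not the same as "safe to promote": the canary has to hold the SLO for the whole window before it moves forward.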

When I’m sanity-checking this, I usually:

  • Use Argo Rollouts / Flagger with Prometheus metrics as gates (error-rate, latency, saturation).
  • Alert on canary vs baseline deltas, not absolute thresholds (reduces noise, catches regressions).
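
As an illustration of the delta idea, here is a small Python sketch comparing a canary metric against the baseline instead of a fixed threshold. The function name, the 2x ratio, the minimum absolute delta, and the sample error rates are all assumptions chosen for the example; tune them to your own traffic.

```python
def regression_detected(
    canary: float,
    baseline: float,
    max_ratio: float = 2.0,        # assumed: canary may be at most 2x baseline
    min_abs_delta: float = 0.002,  # assumed: ignore differences below 0.2 points
) -> bool:
    """True when the canary is meaningfully worse than the baseline.

    The absolute floor keeps tiny differences on a near-zero baseline
    from firing; the ratio catches relative regressions.
    """
    delta = canary - baseline
    if delta <= min_abs_delta:
        return False
    if baseline == 0:
        return True  # baseline is clean and canary exceeds the floor
    return canary / baseline > max_ratio


ABSOLUTE_THRESHOLD = 0.01  # assumed static alert: error rate above 1%

cases = {
    "quiet regression": (0.006, 0.001),   # canary 6x worse, still under 1%
    "shared incident": (0.015, 0.015),    # both degraded equally
    "normal jitter": (0.009, 0.008),      # small difference
}

for name, (canary, baseline) in cases.items():
    static_alert = canary > ABSOLUTE_THRESHOLD
    delta_alert = regression_detected(canary, baseline)
    print(f"{name}: static={static_alert} delta={delta_alert}")
```

In the "quiet regression" case the static threshold stays silent while the delta check fires, and in the "shared incident" case it is the other way around: the static alert blames the canary for a problem the baseline has too.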

Deep dive (stable link): https://neeraja-portfolio-v1.vercel.app/workflows/kubernetes-rollouts-promote-on-slos-not-on-pods-are-ready

#kubernetes #reliability #devops #sre
