LinkedIn Draft — Workflow (2026-01-13)
{{opener}}
Kubernetes rollouts: promote on SLOs, not on “pods are Ready”
Readiness is a local signal. Production impact is global. Real rollouts need promotion gates that track user-facing health.
What usually bites later:
- A rollout can be 100% Ready while P95 latency and error rate spike (bad cache warmup, noisy neighbor, DB pressure).
- HPA reacts more slowly than a fast rollout shifts traffic, so you ship overload before autoscaling catches up.
- A canary gets “stuck green” because its metrics are too coarse (no labels or slices), so you miss the blast radius.
My default rule:
Promote only when your canary holds the SLO slice you care about (error rate + latency) for a fixed window; otherwise, auto-rollback.
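A minimal sketch of that rule in plain Python, to make the decision logic concrete. The `Sample` type, the `gate_decision` name, and the thresholds (1% error rate, 300 ms P95) are illustrative assumptions, not values from any particular tool. In practice a rollout controller would evaluate this against real metrics.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One metrics observation for the canary slice (hypothetical shape)."""
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float   # P95 latency in milliseconds


def gate_decision(
    samples: Sequence[Sample],
    window: int,
    max_error_rate: float = 0.01,   # assumed SLO threshold
    max_p95_ms: float = 300.0,      # assumed SLO threshold
) -> str:
    """Return 'rollback', 'promote', or 'wait' for the canary.

    Any sample that breaches the SLO triggers a rollback. Promotion
    requires at least `window` samples, all within the SLO.
    """
    for s in samples:
        if s.error_rate > max_error_rate or s.p95_latency_ms > max_p95_ms:
            return "rollback"
    if len(samples) >= window:
        return "promote"
    return "wait"
```

The point of the `wait` state is that “no breach yet” is not the same as “held for the full window”. A canary with three healthy samples out of a required five should keep waiting, not promote.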
When I’m sanity-checking this, I usually:
- Use Argo Rollouts / Flagger with Prometheus metrics as gates (error-rate, latency, saturation).
- Alert on canary vs baseline deltas, not absolute thresholds (reduces noise, catches regressions).
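To illustrate the delta idea, here is a rough Python sketch that compares a canary metric against the baseline instead of against a fixed threshold. The function name, the 1.2x ratio, and the `min_abs_delta` floor are assumptions for illustration. Tune them to your own traffic.

```python
def canary_regressed(
    canary: float,
    baseline: float,
    max_ratio: float = 1.2,       # assumed: canary may be up to 20% worse
    min_abs_delta: float = 0.0,   # assumed: ignore differences this small
) -> bool:
    """Return True if the canary metric is meaningfully worse than baseline.

    Works for any "lower is better" metric, such as error rate or latency.
    """
    delta = canary - baseline
    if delta <= min_abs_delta:
        # Canary is equal, better, or only trivially worse.
        return False
    if baseline == 0:
        # No baseline errors at all, so any real increase counts.
        return True
    return canary / baseline > max_ratio


# Example: baseline P95 is 300 ms, canary P95 is 450 ms.
print(canary_regressed(450.0, 300.0))  # True
```

The absolute floor guards against noisy alerts when both values are tiny: a jump from 0.01% to 0.02% errors is a 2x ratio, but may not matter.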
Deep dive (stable link): https://neeraja-portfolio-v1.vercel.app/workflows/kubernetes-rollouts-promote-on-slos-not-on-pods-are-ready
{{closer}}