Nick Taylor for Pomerium

Originally published at pomerium.com

Smarter Health Checks for Zero-Downtime Deployments

TL;DR

The latest Pomerium release introduces fine-grained and context-aware health checks for Kubernetes, AWS ECS, and systemd. These checks confirm that your routing and policy-enforcement layers are fully ready before handling traffic, giving operators reliable zero-downtime upgrades.

Read the full release notes here

Why Health Checks Matter

Without proper health checks, a service can start receiving traffic before its dependencies are ready. A readiness check should confirm not only network connectivity but also that the service is actually prepared to handle requests.

The new health checks verify that all critical components are initialized before traffic is accepted, which makes startups more predictable and improves reliability in automated environments like Kubernetes.

In practice, this means:

  • No more “healthy” pods denying requests on startup
  • Graceful shutdowns that wait for all active connections to drain
  • Smooth autoscaling and rolling upgrades with zero downtime

Kubernetes: Before and After

Previous behavior

When a new replica started in Kubernetes, it was marked ready immediately. The proxy could receive requests before configuration and policy sync completed. Some requests were denied until initialization finished.

[Figure: previous health check behavior]

New behavior

Each replica now reports readiness only after it completes startup tasks and verifies configuration sync. Once the new replica signals it is ready, Kubernetes begins draining connections from the old one.

This sequence enables rolling upgrades without downtime or failed requests.

[Figure: new health check behavior]
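
The readiness sequence described above can be sketched as a small gate that reports ready only once every startup task has finished. This is an illustrative sketch, not Pomerium's implementation: the `ReadinessGate` class and the task names `startup` and `config_sync` are hypothetical. The 200/503 mapping reflects how a Kubernetes HTTP readiness probe treats responses, where any status from 200 to 399 counts as success.

```python
class ReadinessGate:
    """Tracks named startup tasks; ready only when none remain pending."""

    def __init__(self, required_tasks):
        # Hypothetical task names, e.g. {"startup", "config_sync"}.
        self._pending = set(required_tasks)

    def mark_done(self, task):
        # Marking an unknown or already-finished task is a no-op.
        self._pending.discard(task)

    @property
    def ready(self):
        return not self._pending

    def probe_status(self):
        # 200 lets the probe succeed; 503 keeps the replica out of rotation.
        return 200 if self.ready else 503


gate = ReadinessGate({"startup", "config_sync"})
gate.mark_done("startup")
status_before_sync = gate.probe_status()  # still waiting on config_sync
gate.mark_done("config_sync")
status_after_sync = gate.probe_status()   # all tasks finished
```

The point of the design is that the replica, not the orchestrator, decides when it is ready, so traffic is never routed to it while a task is still pending.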

ECS and systemd Support

The new pomerium health command extends readiness logic to AWS ECS and systemd environments.

You can use it to:

  • Integrate with ECS deployment health checks
  • Manage service restarts safely under systemd
  • Maintain uptime during scaling or repaving events

Health reporting works consistently across orchestrators with no special configuration required.
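
As a rough sketch of how a wrapper script might consume such a command, the helper below runs a command and treats exit code 0 as healthy. That exit-code convention is an assumption here, not something this post states, and no flags for `pomerium health` are assumed; check the Health Checks documentation for the actual behavior. The function name `command_reports_healthy` is hypothetical.

```python
import subprocess


def command_reports_healthy(cmd, timeout=5.0):
    """Run cmd and return True only if it exits with code 0 (assumed convention)."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, timeout=timeout, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung check is treated as unhealthy.
        return False
    return result.returncode == 0


# Hypothetical usage on a host where the pomerium binary is on PATH:
#   healthy = command_reports_healthy(["pomerium", "health"])
```

Treating timeouts and launch failures as unhealthy is a conservative choice: the orchestrator keeps the instance out of rotation whenever the check cannot produce a clear answer.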

Example Health Check Output

Before Pomerium handles traffic, it confirms that each core service is running:

{
  "authenticate.service": { "status": "RUNNING" },
  "authorize.service": { "status": "RUNNING" },
  "config.databroker.build": { "status": "RUNNING" },
  "databroker.sync.initial": { "status": "RUNNING" },
  "envoy.server": { "status": "RUNNING" },
  "proxy.service": { "status": "RUNNING" },
  "storage.backend": {
    "status": "RUNNING",
    "attributes": [{ "Key": "backend", "Value": "in-memory" }]
  },
  "xds.cluster": { "status": "RUNNING" },
  "xds.listener": { "status": "RUNNING" },
  "xds.route-configuration": { "status": "RUNNING" }
}

Only when all components report RUNNING does Pomerium begin serving requests.
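
If you want to apply the same rule to a captured report in your own tooling, a minimal sketch could look like the following. It assumes the report has the JSON shape shown above; the function names are hypothetical, and `PENDING` is only a placeholder for any status other than `RUNNING`, not a documented Pomerium value.

```python
import json


def components_not_running(report):
    """Return the sorted names of components whose status is not RUNNING."""
    return sorted(
        name for name, info in report.items() if info.get("status") != "RUNNING"
    )


def all_running(report):
    # An empty report is treated as not ready: nothing has been confirmed.
    return bool(report) and not components_not_running(report)


# Trimmed sample using component names from the output above.
sample = json.loads("""
{
  "authorize.service": { "status": "RUNNING" },
  "envoy.server": { "status": "RUNNING" },
  "xds.listener": { "status": "RUNNING" }
}
""")

# "PENDING" is a hypothetical placeholder for any non-RUNNING status.
degraded = {**sample, "xds.listener": {"status": "PENDING"}}
```

Returning the list of components that are not running, rather than a bare boolean, makes it easier to log which part of startup is still incomplete.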

Why You Should Care

These new health checks make Pomerium safer and more predictable during updates and scaling events.

They ensure:

  • Each replica starts cleanly
  • Active connections drain fully
  • Autoscaling and rollouts complete without downtime

Together with improved metrics and file-based databroker storage, this makes Pomerium self-healing and easier to operate in production.

Get Started

Upgrade to Pomerium v0.31 to enable the new health checks automatically in your existing deployments.

For detailed examples and configuration guidance, visit the Health Checks documentation.
