A pattern I keep seeing on teams that move services from REST to gRPC: the load balancer health check stays green even when the gRPC listener is completely hung.
The Setup
Many gRPC services end up with two listeners: one for actual gRPC traffic (HTTP/2 on a port like 50051) and one for metrics, admin endpoints, or a legacy REST compatibility layer (plain HTTP on a port like 8080). The health check inherited from the old REST service points at the HTTP listener.
This is fine until it isn't.
What Goes Wrong
The HTTP listener can be healthy (serving Prometheus metrics, responding to /health) while the gRPC listener is deadlocked, crashing, or just misconfigured. Your load balancer sees green and keeps routing traffic, and suddenly you have a partial outage that's hard to diagnose because every monitoring dashboard says everything is fine.
Load Balancer -> HTTP listener (port 8080)  -> health check: OK
                 gRPC listener (port 50051) -> actual traffic -> DOWN
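To make the failure mode concrete, here is a minimal sketch in Python using only the standard library. It is not real gRPC: the "gRPC" side is simulated by a TCP socket that listens but never accepts, which is my stand-in for a hung listener. The names HealthHandler, http_health_ok, answers_within, and demo are made up for illustration.

```python
import socket
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Stands in for the plain-HTTP listener that serves /health."""

    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def http_health_ok(port, timeout=1.0):
    """What the inherited load balancer check does: GET /health over HTTP."""
    url = f"http://127.0.0.1:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def answers_within(port, timeout=1.0):
    """Crude application-level check: send bytes and wait for any reply."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as conn:
            conn.sendall(b"ping")
            return len(conn.recv(1)) > 0
    except OSError:
        return False


def demo():
    # Healthy HTTP listener on an ephemeral port.
    http_server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=http_server.serve_forever, daemon=True).start()

    # Simulated hung traffic listener: bound and listening, never accepting.
    hung = socket.socket()
    hung.bind(("127.0.0.1", 0))
    hung.listen(1)
    try:
        return (
            http_health_ok(http_server.server_port),
            answers_within(hung.getsockname()[1]),
        )
    finally:
        http_server.shutdown()
        http_server.server_close()
        hung.close()


if __name__ == "__main__":
    # First value: the HTTP check. Second value: the traffic port.
    print(demo())
```

On a typical TCP stack the connect to the hung port still succeeds, because the kernel completes the handshake on the listener's behalf. Only waiting for a reply with a deadline exposes the hang, which is why a bare TCP port check would not catch this either.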
The Fix
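Use grpc_health_probe against the actual gRPC port instead of an HTTP check against the secondary HTTP listener. The probe calls the standard gRPC health checking service (grpc.health.v1.Health), so the server has to register that service for the check to pass.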
# Instead of this (HTTP health check on the wrong port)
curl http://service:8080/health
# Do this (gRPC health check on the gRPC port)
grpc_health_probe -addr=service:50051
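If the check is driven from a script rather than typed by hand, a thin wrapper helps turn the probe result into something readable. The sketch below is one way to do that in Python. The function name probe_grpc, the EXIT_MEANINGS table, and the injectable run parameter are my own assumptions. The exit-code meanings are as I recall them from the grpc_health_probe README, which itself recommends relying only on zero versus non-zero, so treat the messages as hints rather than a contract.

```python
import subprocess

# Exit codes as described in the grpc_health_probe README (assumption:
# these may change between releases; only zero vs non-zero is stable).
EXIT_MEANINGS = {
    0: "serving",
    1: "invalid probe arguments",
    2: "connection failed or timed out",
    3: "health RPC failed or timed out",
    4: "RPC succeeded but status is not SERVING",
}


def probe_grpc(addr, run=subprocess.run):
    """Run grpc_health_probe against addr and return (healthy, message).

    `run` is injectable so the wrapper can be exercised without the binary.
    """
    cmd = [
        "grpc_health_probe",
        f"-addr={addr}",
        "-connect-timeout=2s",  # fail fast if the port is unreachable
        "-rpc-timeout=2s",      # fail fast if the listener is hung
    ]
    try:
        result = run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return False, "grpc_health_probe not found on PATH"
    message = EXIT_MEANINGS.get(
        result.returncode, f"unexpected exit code {result.returncode}"
    )
    return result.returncode == 0, message


if __name__ == "__main__":
    healthy, message = probe_grpc("service:50051")
    print(f"service:50051 -> {message}")
    raise SystemExit(0 if healthy else 1)
```

The explicit timeouts matter here: a hung listener is exactly the case where a check without a deadline would hang along with it.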
If you can't run the probe binary directly, the alternative is to consolidate to a single HTTP/2 listener that handles both gRPC traffic and health checks. This removes the footgun entirely.
Why This Bites Teams
The issue is architectural drift. The service was built with two listeners, someone wrote a health check for one, and that check got adopted into load balancer configs without anyone auditing whether it actually validated the right thing. The gRPC service looks healthy because it's healthy on the port nobody routes production traffic through.
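The audit that nobody did can be tiny. Here is a rough sketch, assuming you can export each service's traffic port and health check target into some structured form. The ServiceConfig shape and the find_mismatched_checks function are hypothetical, not any real load balancer's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    # Hypothetical shape; adapt to whatever your load balancer config exports.
    name: str
    traffic_port: int
    health_check_port: int
    health_check_protocol: str  # e.g. "http" or "grpc"


def find_mismatched_checks(services):
    """Return one finding per service whose check does not cover its traffic port."""
    findings = []
    for svc in services:
        if svc.health_check_port != svc.traffic_port:
            findings.append(
                f"{svc.name}: health check targets port {svc.health_check_port}, "
                f"but traffic is routed to port {svc.traffic_port}"
            )
    return findings


if __name__ == "__main__":
    services = [
        ServiceConfig("orders", 50051, 8080, "http"),    # the drifted case
        ServiceConfig("billing", 50051, 50051, "grpc"),  # check matches traffic
    ]
    for finding in find_mismatched_checks(services):
        print(finding)
```

A port match is only a necessary condition, not proof that the check exercises the gRPC stack, but it would have flagged the drift described above.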
Health checks that validate your observability stack but not your actual service contract are more common than you'd think. When in doubt, health-check the port that handles your production traffic.
-- Schiff Heimlich