Why Healthy P99 Latency Can Hide Async Runtime Collapse in Python

#python #fastapi #devops #observability

Most observability dashboards focus heavily on request-facing metrics:

latency
throughput
error rate
CPU and memory usage

Those metrics are important, but while stress-testing async FastAPI services under concurrent load, I noticed they were not always enough to explain what the runtime was actually experiencing internally.

In one test setup, requests were still returning 200 OK, P99 latency had increased but was still within survivable limits, and CPU usage looked fairly normal.

At the same time, the asyncio event loop was already struggling badly.

Other endpoints became inconsistent, executor queues started backing up, and in several runs event-loop lag climbed past several seconds while request latency was still low enough for the service to look operational from the outside.

In some runs, unrelated lightweight endpoints stalled behind a single blocking request even though system-wide CPU usage was not saturated.

The issue became easier to reproduce when synchronous work leaked into async request paths.

Simple examples include:

blocking database clients
synchronous SDKs
legacy REST calls using requests
filesystem operations
accidental time.sleep() calls
overloaded threadpool executors

Even a small blocking section inside an async route holds the event loop's single thread for its full duration, so under enough concurrency it can starve every other coroutine scheduled on that loop.

Example:

import time
from fastapi import FastAPI

app = FastAPI()

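@app.get("/agent")
async def run_agent():
    # Intentionally wrong: time.sleep() is synchronous, so it blocks the
    # event loop thread for 5 seconds and no other coroutine can run.
    time.sleep(5)
    return {"status": "ok"}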

Under load this starts affecting unrelated coroutines, queue behavior, scheduler fairness, and request consistency across the service.
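The effect on unrelated coroutines can be reproduced without FastAPI at all. The following is a minimal sketch using only the standard library; the handler names are hypothetical stand-ins for routes, and the 0.5 second block is shortened from the 5 seconds above so the demo runs quickly. It times a lightweight coroutine twice: once next to a handler that blocks the loop, and once next to a handler that offloads the same work with asyncio.to_thread (Python 3.9+).

```python
import asyncio
import time


async def blocking_handler():
    # Stand-in for an async route that calls sync code directly.
    time.sleep(0.5)


async def offloaded_handler():
    # Same work, but run in a worker thread so the loop stays free.
    await asyncio.to_thread(time.sleep, 0.5)


async def lightweight_handler():
    # Stand-in for an unrelated, cheap endpoint.
    await asyncio.sleep(0.01)


async def time_lightweight(slow_handler):
    """Return how long the lightweight handler takes while slow_handler runs."""
    start = time.perf_counter()
    light = asyncio.create_task(lightweight_handler())
    slow = asyncio.create_task(slow_handler())
    await light
    elapsed = time.perf_counter() - start
    await slow
    return elapsed


def run_demo():
    blocked = asyncio.run(time_lightweight(blocking_handler))
    offloaded = asyncio.run(time_lightweight(offloaded_handler))
    return blocked, offloaded


if __name__ == "__main__":
    blocked, offloaded = run_demo()
    print(f"lightweight handler next to blocking call:  {blocked:.3f}s")
    print(f"lightweight handler next to offloaded call: {offloaded:.3f}s")
```

In the blocking case the lightweight coroutine cannot resume until the sleep returns, even though it only asked for 10 ms. Offloading moves the pressure to the threadpool rather than removing it, which is why executor saturation shows up as a separate failure mode.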

One thing that stood out during testing was how differently runtime metrics behaved compared to HTTP-facing metrics.

Request latency degraded gradually, but event-loop lag increased much more aggressively once scheduler pressure crossed a certain point.

Figure: Event-loop lag increases sharply while outward-facing request metrics remain comparatively survivable.
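One common way to measure event-loop lag is to schedule a short sleep and record how late the loop wakes the coroutine up. The sketch below illustrates that technique with the standard library only; it is not necessarily how the lab measured lag, and the function names, the 50 ms interval, and the simulated 0.3 second block are all assumptions for illustration.

```python
import asyncio
import time


async def monitor_loop_lag(samples, interval=0.05):
    """Append the observed scheduling delay (in seconds) to samples until cancelled."""
    loop = asyncio.get_running_loop()
    while True:
        expected = loop.time() + interval
        await asyncio.sleep(interval)
        # Any overshoot beyond the requested sleep is time the loop
        # spent unable to resume this coroutine.
        samples.append(max(0.0, loop.time() - expected))


async def measure_lag_during_block(block_seconds=0.3):
    samples = []
    monitor = asyncio.create_task(monitor_loop_lag(samples))
    await asyncio.sleep(0.12)   # collect a few healthy samples
    time.sleep(block_seconds)   # simulated sync work on the loop thread
    await asyncio.sleep(0.12)   # let the monitor record the stall
    monitor.cancel()
    try:
        await monitor
    except asyncio.CancelledError:
        pass
    return samples


if __name__ == "__main__":
    lag = asyncio.run(measure_lag_during_block())
    print(f"samples={len(lag)} max_lag={max(lag):.3f}s")
```

In a real service the samples would feed a gauge or histogram instead of a list. The monitor itself runs on the loop it measures, so it reports nothing during a stall and records the full delay only once the loop is free again.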

To explore this more systematically, I built a small runtime observability lab using:

FastAPI
Prometheus
Grafana
Docker Compose

The goal was simply to reproduce different forms of async runtime degradation and observe which telemetry signals changed first.

Figure: Minimal async runtime observability lab used to reproduce scheduler starvation and queue amplification scenarios.

The setup intentionally introduced:

blocking synchronous execution
executor saturation
queue amplification
event-loop starvation

while exposing internal runtime telemetry through Prometheus.

The most useful telemetry signals ended up being event-loop lag, blocking duration, executor queue pressure, backlog growth, and concurrent saturation behavior.

Those signals exposed runtime instability much earlier than HTTP metrics alone.
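Executor queue pressure is one of the less obvious signals to collect, because ThreadPoolExecutor does not expose its backlog through a public API. A minimal sketch of one option is a thin wrapper that counts tasks that have been submitted but not yet picked up by a worker. The InstrumentedExecutor class and simulate_burst helper are hypothetical names for illustration, not part of the lab.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InstrumentedExecutor:
    """Wraps ThreadPoolExecutor and tracks tasks waiting for a worker."""

    def __init__(self, max_workers):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self.queued = 0        # submitted but not yet started
        self.peak_queued = 0   # highest backlog seen so far

    def submit(self, fn, *args, **kwargs):
        with self._lock:
            self.queued += 1
            self.peak_queued = max(self.peak_queued, self.queued)

        def tracked():
            with self._lock:
                self.queued -= 1   # a worker picked the task up
            return fn(*args, **kwargs)

        return self._pool.submit(tracked)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def simulate_burst(tasks=8, workers=2, work_seconds=0.05):
    """Submit a burst of sleeping tasks and return (peak backlog, final backlog)."""
    executor = InstrumentedExecutor(max_workers=workers)
    futures = [executor.submit(time.sleep, work_seconds) for _ in range(tasks)]
    for future in futures:
        future.result()
    executor.shutdown()
    return executor.peak_queued, executor.queued


if __name__ == "__main__":
    peak, remaining = simulate_burst()
    print(f"peak backlog={peak} remaining={remaining}")
```

With two workers and eight tasks, most of the burst has to wait, so the peak backlog stays well above the worker count even though every task eventually completes. Exporting `queued` as a gauge makes that backlog growth visible before request latency reflects it.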

I also built a small CLI tool called async-runtime-auditor to evaluate these metrics directly from Prometheus during testing.

The idea was not to build another monitoring platform, but to create lightweight runtime validation checks for async Python services inside CI/CD workflows.

The tool evaluates runtime metrics against deterministic thresholds and can fail execution when runtime degradation becomes severe enough.

Example:

async-auditor \
  --config metrics.yaml \
  --target http://localhost:9090 \
  --fail-on-critical

Example output:

ASYNC RUNTIME AUDITOR

Runtime Status: DEGRADED

Findings:
- Event-loop starvation detected
- Executor queue amplification detected
- Concurrent saturation detected
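The idea of evaluating metrics against deterministic thresholds can be sketched in a few lines. This is an illustrative sketch only, not the tool's actual implementation: the metric names, threshold values, and the HEALTHY and CRITICAL status labels are assumptions, and the samples are passed in as a plain dict rather than queried from Prometheus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    warn: float
    critical: float
    finding: str


# Hypothetical metric names and limits, for illustration only.
THRESHOLDS = [
    Threshold("event_loop_lag_seconds", 0.1, 1.0, "Event-loop starvation detected"),
    Threshold("executor_queue_depth", 10, 100, "Executor queue amplification detected"),
]


def evaluate(samples, thresholds=THRESHOLDS):
    """Return (status, findings) for a dict of metric name -> current value."""
    status = "HEALTHY"
    findings = []
    for t in thresholds:
        value = samples.get(t.metric)
        if value is None:
            continue  # metric not reported; skipped in this sketch
        if value >= t.critical:
            status = "CRITICAL"
            findings.append(t.finding)
        elif value >= t.warn:
            if status != "CRITICAL":
                status = "DEGRADED"
            findings.append(t.finding)
    return status, findings


def exit_code(status, fail_on_critical=True):
    # A non-zero exit code lets a CI step fail the pipeline.
    return 1 if (fail_on_critical and status == "CRITICAL") else 0


if __name__ == "__main__":
    status, findings = evaluate(
        {"event_loop_lag_seconds": 0.5, "executor_queue_depth": 3}
    )
    print(f"Runtime Status: {status}")
    for finding in findings:
        print(f"- {finding}")
    raise SystemExit(exit_code(status))
```

Keeping the evaluation a pure function of the sampled values means the same inputs always produce the same verdict, which is what makes the check usable as a CI gate.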

One thing this testing made clear is that async systems can begin degrading internally well before traditional dashboards clearly show it.

Request metrics tell you how the API behaves externally.

Runtime telemetry tells you how the scheduler behaves while the API is still functioning.

For async Python services, both perspectives matter.

The main lesson from this testing was that scheduler health and request health are not always the same thing, especially in heavily concurrent async systems.