<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Priyanshu Kumar</title>
    <description>The latest articles on DEV Community by Priyanshu Kumar (@priyanshu-systems).</description>
    <link>https://dev.to/priyanshu-systems</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936181%2Fcbb2718d-1df8-49d4-bf69-06c54e055b0a.png</url>
      <title>DEV Community: Priyanshu Kumar</title>
      <link>https://dev.to/priyanshu-systems</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/priyanshu-systems"/>
    <language>en</language>
    <item>
      <title>Why Healthy P99 Latency Can Hide Async Runtime Collapse in Python</title>
      <dc:creator>Priyanshu Kumar</dc:creator>
      <pubDate>Sun, 17 May 2026 13:14:10 +0000</pubDate>
      <link>https://dev.to/priyanshu-systems/why-healthy-p99-latency-can-hide-async-runtime-collapse-in-python-1ibm</link>
      <guid>https://dev.to/priyanshu-systems/why-healthy-p99-latency-can-hide-async-runtime-collapse-in-python-1ibm</guid>
      <description>&lt;p&gt;Most observability dashboards focus heavily on request-facing metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;throughput&lt;/li&gt;
&lt;li&gt;error rate&lt;/li&gt;
&lt;li&gt;CPU and memory usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those metrics are important, but while stress-testing async FastAPI services under concurrent load, I noticed they were not always enough to explain what the runtime was actually experiencing internally.&lt;/p&gt;

&lt;p&gt;In one test setup, requests were still returning &lt;code&gt;200 OK&lt;/code&gt;, P99 latency had increased but was still within survivable limits, and CPU usage looked fairly normal.&lt;/p&gt;

&lt;p&gt;At the same time, the asyncio event loop was already struggling badly.&lt;/p&gt;

&lt;p&gt;Other endpoints became inconsistent and executor queues started backing up. In several runs, event-loop lag climbed to several seconds while request latency was still low enough that the service appeared operational from the outside.&lt;/p&gt;

&lt;p&gt;In some runs, unrelated lightweight endpoints stalled behind a single blocking request even though system-wide CPU usage was not saturated.&lt;/p&gt;

&lt;p&gt;The issue became easier to reproduce when synchronous work leaked into async request paths.&lt;/p&gt;

&lt;p&gt;Simple examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;blocking database clients&lt;/li&gt;
&lt;li&gt;synchronous SDKs&lt;/li&gt;
&lt;li&gt;legacy REST calls using &lt;code&gt;requests&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;filesystem operations&lt;/li&gt;
&lt;li&gt;accidental &lt;code&gt;time.sleep()&lt;/code&gt; calls&lt;/li&gt;
&lt;li&gt;overloaded thread pool executors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even a small blocking section inside an async route can create scheduler starvation under enough concurrency.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Blocking call: holds the event loop thread for the full 5 seconds&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



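&lt;p&gt;Because &lt;code&gt;time.sleep(5)&lt;/code&gt; never yields control, every other coroutine scheduled on the same event loop has to wait until it returns. Under load this affects unrelated coroutines, queue behavior, scheduler fairness, and request consistency across the service.&lt;/p&gt;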
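
&lt;p&gt;A common mitigation is to move the blocking call off the event loop thread. The sketch below is a standalone illustration that uses only &lt;code&gt;asyncio&lt;/code&gt; instead of the FastAPI route above. The helper names (&lt;code&gt;blocking_work&lt;/code&gt;, &lt;code&gt;heartbeat&lt;/code&gt;) are hypothetical, and it assumes Python 3.9 or newer for &lt;code&gt;asyncio.to_thread&lt;/code&gt;. It counts how often a lightweight coroutine gets to run while the blocking work is in progress.&lt;/p&gt;

```python
import asyncio
import time


def blocking_work(seconds: float) -> str:
    # Stand-in for a synchronous SDK or database call (hypothetical helper).
    time.sleep(seconds)
    return "ok"


async def heartbeat(stop: asyncio.Event, interval: float = 0.01) -> int:
    # Lightweight coroutine: counts how often the loop lets it run.
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(interval)
    return ticks


async def run_blocking_inline(seconds: float) -> int:
    # Calls the blocking function directly on the event loop thread.
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(stop))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    blocking_work(seconds)
    stop.set()
    return await hb


async def run_blocking_offloaded(seconds: float) -> int:
    # Runs the same function in a worker thread, so the loop stays free.
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(stop))
    await asyncio.sleep(0)
    await asyncio.to_thread(blocking_work, seconds)
    stop.set()
    return await hb


if __name__ == "__main__":
    print("inline ticks:", asyncio.run(run_blocking_inline(0.2)))
    print("offloaded ticks:", asyncio.run(run_blocking_offloaded(0.2)))
```

&lt;p&gt;In the inline variant the heartbeat barely runs, while the offloaded variant lets it tick throughout. In a FastAPI route the equivalent change would be awaiting &lt;code&gt;asyncio.to_thread(...)&lt;/code&gt;, or declaring the route with plain &lt;code&gt;def&lt;/code&gt; so that FastAPI runs it in its thread pool. Either way the pressure moves to the thread pool, which can itself saturate.&lt;/p&gt;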

&lt;p&gt;One thing that stood out during testing was how differently runtime metrics behaved compared to HTTP-facing metrics.&lt;/p&gt;

&lt;p&gt;Request latency degraded gradually, but event-loop lag increased much more aggressively once scheduler pressure crossed a certain point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12fx55axsuhsz1af557p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12fx55axsuhsz1af557p.png" alt="Event-loop lag dashboard during async runtime degradation" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Event-loop lag increasing sharply while outward-facing request metrics remained comparatively survivable.&lt;/em&gt;&lt;/p&gt;
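
&lt;p&gt;One common way to measure event-loop lag is to request a short sleep and record how late the wake-up arrives. The sketch below illustrates that idea with the standard library only. It is not necessarily how the lab measures lag, and the function names and intervals are assumptions chosen for the example.&lt;/p&gt;

```python
import asyncio
import time


async def measure_loop_lag(interval: float, samples: int) -> list[float]:
    # Sleep for a fixed interval and record how late each wake-up was.
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        elapsed = time.perf_counter() - start
        lags.append(max(0.0, elapsed - interval))
    return lags


async def demo() -> float:
    probe = asyncio.create_task(measure_loop_lag(interval=0.05, samples=4))
    await asyncio.sleep(0.06)  # let the first sample finish undisturbed
    time.sleep(0.3)            # simulated blocking call on the loop thread
    lags = await probe
    return max(lags)


if __name__ == "__main__":
    print(f"worst observed lag: {asyncio.run(demo()):.3f}s")
```

&lt;p&gt;The probe only reports lag after the blocking call has returned, so a long stall shows up as one large sample instead of a gradual rise.&lt;/p&gt;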

&lt;p&gt;To explore this more systematically, I built a small runtime observability lab using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI&lt;/li&gt;
&lt;li&gt;Prometheus&lt;/li&gt;
&lt;li&gt;Grafana&lt;/li&gt;
&lt;li&gt;Docker Compose&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal was simply to reproduce different forms of async runtime degradation and observe which telemetry signals changed first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8xn7zdn7a16vw5ndvbw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8xn7zdn7a16vw5ndvbw.png" alt="Async runtime observability lab architecture" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Minimal async runtime observability lab used for reproducing scheduler starvation and queue amplification scenarios.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The setup exposed internal runtime telemetry through Prometheus while intentionally introducing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;blocking synchronous execution&lt;/li&gt;
&lt;li&gt;executor saturation&lt;/li&gt;
&lt;li&gt;queue amplification&lt;/li&gt;
&lt;li&gt;event-loop starvation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most useful telemetry signals ended up being event-loop lag, blocking duration, executor queue pressure, backlog growth, and concurrent saturation behavior.&lt;/p&gt;

&lt;p&gt;Those signals exposed runtime instability much earlier than HTTP metrics alone.&lt;/p&gt;
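
&lt;p&gt;As a rough illustration of executor queue pressure, the sketch below wraps a &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; and tracks how much submitted work exceeds the number of workers. The &lt;code&gt;InstrumentedExecutor&lt;/code&gt; class is a hypothetical helper and not part of the lab. Exporting the value as a Prometheus gauge is left out to keep the example dependency-free.&lt;/p&gt;

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class InstrumentedExecutor:
    """Tracks in-flight work for a thread pool (hypothetical helper)."""

    def __init__(self, max_workers: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._max_workers = max_workers
        self._lock = threading.Lock()
        self._in_flight = 0  # submitted but not yet finished

    def submit(self, fn, *args) -> Future:
        with self._lock:
            self._in_flight += 1
        future = self._pool.submit(fn, *args)
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, _future: Future) -> None:
        with self._lock:
            self._in_flight -= 1

    def backlog(self) -> int:
        # Tasks waiting for a free worker: in-flight work beyond pool capacity.
        with self._lock:
            return max(0, self._in_flight - self._max_workers)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def demo() -> int:
    executor = InstrumentedExecutor(max_workers=2)
    for _ in range(6):
        executor.submit(time.sleep, 0.2)
    waiting = executor.backlog()  # sampled while the pool is saturated
    executor.shutdown()
    return waiting
```

&lt;p&gt;A backlog that keeps growing between samples suggests that work is arriving faster than the pool can drain it, even if individual requests still complete.&lt;/p&gt;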

&lt;p&gt;I also built a small CLI tool called &lt;code&gt;async-runtime-auditor&lt;/code&gt; to evaluate these metrics directly from Prometheus during testing.&lt;/p&gt;

&lt;p&gt;The idea was not to build another monitoring platform, but to create lightweight runtime validation checks for async Python services inside CI/CD workflows.&lt;/p&gt;

&lt;p&gt;The tool evaluates runtime metrics against deterministic thresholds and can fail execution when runtime degradation becomes severe enough.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;async-auditor &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--config&lt;/span&gt; metrics.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--target&lt;/span&gt; http://localhost:9090 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--fail-on-critical&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ASYNC RUNTIME AUDITOR

Runtime Status: DEGRADED

Findings:
- Event-loop starvation detected
- Executor queue amplification detected
- Concurrent saturation detected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
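
&lt;p&gt;To make the idea of deterministic threshold checks concrete, here is a minimal sketch of how metric values could be mapped to a status and a CI exit code. It is not the implementation of &lt;code&gt;async-runtime-auditor&lt;/code&gt;. The metric names, threshold values, and function names are all hypothetical, and the values are passed in directly where a real check would read them from Prometheus.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    warning: float
    critical: float


def evaluate(values: dict[str, float], thresholds: list[Threshold]) -> tuple[str, list[str]]:
    """Return an overall status and the findings that produced it."""
    status = "HEALTHY"
    findings = []
    for t in thresholds:
        value = values.get(t.metric)
        if value is None:
            continue  # metric not reported; skipped in this sketch
        if value >= t.critical:
            status = "CRITICAL"
            findings.append(f"{t.metric} critical: {value} >= {t.critical}")
        elif value >= t.warning:
            if status == "HEALTHY":
                status = "DEGRADED"
            findings.append(f"{t.metric} warning: {value} >= {t.warning}")
    return status, findings


def exit_code(status: str, fail_on_critical: bool) -> int:
    # A non-zero exit code makes the CI job fail.
    return 1 if fail_on_critical and status == "CRITICAL" else 0


if __name__ == "__main__":
    rules = [
        Threshold("event_loop_lag_seconds", warning=0.1, critical=1.0),
        Threshold("executor_backlog", warning=10, critical=100),
    ]
    status, findings = evaluate({"event_loop_lag_seconds": 0.4, "executor_backlog": 3}, rules)
    print("Runtime Status:", status)
    for finding in findings:
        print("-", finding)
    raise SystemExit(exit_code(status, fail_on_critical=True))
```

&lt;p&gt;Keeping the evaluation a pure function of values and thresholds makes the same inputs always produce the same verdict, which is what a CI gate needs.&lt;/p&gt;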



&lt;p&gt;One thing this testing made clear is that async systems can begin degrading internally well before traditional dashboards clearly show it.&lt;/p&gt;

&lt;p&gt;Request metrics tell you how the API behaves externally.&lt;/p&gt;

&lt;p&gt;Runtime telemetry tells you how the scheduler behaves while the API is still functioning.&lt;/p&gt;

&lt;p&gt;For async Python services, both perspectives matter.&lt;/p&gt;

&lt;p&gt;The main lesson from this testing was that scheduler health and request health are not always the same thing, especially in heavily concurrent async systems.&lt;/p&gt;

&lt;h2&gt;Repositories&lt;/h2&gt;

&lt;h3&gt;Async Runtime Auditor&lt;/h3&gt;

&lt;p&gt;CI/CD-oriented runtime degradation checks for async Python systems:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/priyanshuphenomenal007/async-runtime-auditor" rel="noopener noreferrer"&gt;async-runtime-auditor&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Async Runtime Health Lab&lt;/h3&gt;

&lt;p&gt;FastAPI + Prometheus + Grafana environment for reproducing async runtime degradation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/priyanshuphenomenal007/async-runtime-health-auditor" rel="noopener noreferrer"&gt;async-runtime-health-auditor&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>devops</category>
      <category>observability</category>
    </item>
  </channel>
</rss>
