<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Winson GR</title>
    <description>The latest articles on DEV Community by Winson GR (@winsongr).</description>
    <link>https://dev.to/winsongr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1671260%2Fb0a3511a-ebb0-4386-aa48-309d8bc92cd7.png</url>
      <title>DEV Community: Winson GR</title>
      <link>https://dev.to/winsongr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/winsongr"/>
    <language>en</language>
    <item>
      <title>Scaling FastAPI from 180 to 1300 Requests/sec: What Actually Worked</title>
      <dc:creator>Winson GR</dc:creator>
      <pubDate>Tue, 17 Mar 2026 05:44:06 +0000</pubDate>
      <link>https://dev.to/winsongr/scaling-fastapi-from-180-1300-requestssec-what-actually-worked-10n9</link>
      <guid>https://dev.to/winsongr/scaling-fastapi-from-180-1300-requestssec-what-actually-worked-10n9</guid>
      <description>&lt;p&gt;Most FastAPI performance issues aren't caused by the framework - they're caused by architecture, blocking I/O, and database query patterns.&lt;/p&gt;

&lt;p&gt;I refactored a FastAPI backend that was stuck at ~180 requests/sec with p95 latency over 4 seconds. After a series of changes, it handled ~1300 requests/sec at under 200ms p95 - on the same hardware.&lt;/p&gt;

&lt;p&gt;No vertical scaling. No extra cloud spend. Just removing bottlenecks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Starting Point
&lt;/h2&gt;

&lt;p&gt;The system had grown fast. Speed was prioritized over structure - until it wasn’t.&lt;/p&gt;

&lt;p&gt;By the time performance became a problem, the backend had &lt;strong&gt;14+ microservices&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auth logic duplicated across 6 services
&lt;/li&gt;
&lt;li&gt;Each service maintained its own DB connection pool
&lt;/li&gt;
&lt;li&gt;A single request triggered &lt;strong&gt;4–5 internal API hops&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Middleware inconsistently applied
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latency wasn’t coming from slow code. It was coming from the architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fix 1: Kill the Service Fragmentation
&lt;/h2&gt;

&lt;p&gt;14+ repos → 4 domain-focused services:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;auth, token, session&lt;/td&gt;
&lt;td&gt;identity-service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;report, export, pdf&lt;/td&gt;
&lt;td&gt;jobs-service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;user, profile, prefs&lt;/td&gt;
&lt;td&gt;user-service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;scattered&lt;/td&gt;
&lt;td&gt;core-api&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → core-api → auth → user → report → export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → core-api → identity / user / jobs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Internal hops per request dropped from ~4 to ~1, which cut latency by ~35%.&lt;/p&gt;


&lt;h2&gt;
  
  
  Fix 2: The Stack Wasn't Actually Async
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/users/{user_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;  &lt;span class="c1"&gt;# blocks event loop
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Async endpoint ≠ async execution.&lt;/p&gt;
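
&lt;p&gt;A minimal stdlib sketch of the difference, not the original service code: &lt;code&gt;time.sleep&lt;/code&gt; stands in for a blocking driver call and &lt;code&gt;asyncio.sleep&lt;/code&gt; for an awaited one. The handler names and the 50ms delay are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import time

DELAY = 0.05  # stand-in for one database round trip, in seconds


async def blocking_handler():
    # time.sleep() never yields, so the event loop is stuck for the whole call,
    # just like a synchronous driver call inside an async endpoint.
    time.sleep(DELAY)


async def non_blocking_handler():
    # await hands control back to the loop, so other requests run meanwhile.
    await asyncio.sleep(DELAY)


async def time_concurrent(handler, n=5):
    """Run n handlers concurrently and return the wall-clock seconds taken."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


blocking_s = asyncio.run(time_concurrent(blocking_handler))
non_blocking_s = asyncio.run(time_concurrent(non_blocking_handler))
print(f"blocking: {blocking_s:.2f}s, non-blocking: {non_blocking_s:.2f}s")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The five blocking handlers run one after another, so the total is about five times the delay. The five awaited handlers overlap and finish in roughly one delay.&lt;/p&gt;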

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;asyncpg&lt;/code&gt; instead of &lt;code&gt;psycopg2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;httpx&lt;/code&gt; instead of &lt;code&gt;requests&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# client is one shared httpx.AsyncClient, created at startup and reused across requests
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Each worker handled ~3x as many concurrent requests.&lt;/p&gt;


&lt;h2&gt;
  
  
  Fix 3: Remove Heavy Work from Requests
&lt;/h2&gt;

&lt;p&gt;Problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Emails
&lt;/li&gt;
&lt;li&gt;PDFs
&lt;/li&gt;
&lt;li&gt;Webhooks
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this ran inside the request lifecycle.&lt;/p&gt;

&lt;p&gt;Fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;generate_invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rule:&lt;/strong&gt;&lt;br&gt;
If the user doesn’t need it before the &lt;code&gt;200 OK&lt;/code&gt;, move it out of the request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;br&gt;
Affected endpoints went from 800ms to 80ms.&lt;/p&gt;
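
&lt;p&gt;The &lt;code&gt;.delay(...)&lt;/code&gt; calls hand work to a Celery worker. The enqueue-and-return pattern behind them can be sketched with the standard library alone. This is an in-process stand-in, not Celery itself: the queue plays the broker, the thread plays the worker, and &lt;code&gt;create_order&lt;/code&gt; is a hypothetical handler.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import queue
import threading
import time

jobs = queue.Queue()  # stand-in for the broker
sent = []             # records finished jobs so the effect is visible


def send_email(order_id):
    time.sleep(0.05)  # pretend this is a slow SMTP call
    sent.append(order_id)


def worker():
    # Runs outside the request path, like a separate worker process would.
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def create_order(order_id):
    """Request handler: enqueue the slow work and respond immediately."""
    jobs.put((send_email, (order_id,)))
    return {"status": "ok", "order_id": order_id}


start = time.perf_counter()
response = create_order(42)
handler_seconds = time.perf_counter() - start

jobs.join()  # only for the demo: wait until the worker has drained the queue
print(f"handler returned in {handler_seconds * 1000:.2f}ms, emails sent: {sent}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The handler returns before the slow call runs. A real broker adds what this sketch lacks: persistence, retries, and workers on other machines.&lt;/p&gt;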




&lt;h2&gt;
  
  
  Fix 4: Fix the Database
&lt;/h2&gt;

&lt;h3&gt;
  
  
  N+1 Queries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;user_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchrow&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;

&lt;span class="c1"&gt;# After
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT ... WHERE id = ANY($1)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
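
&lt;p&gt;A runnable sketch of the same change, assuming an in-memory SQLite database as a stand-in for Postgres and a hypothetical &lt;code&gt;users&lt;/code&gt; table. SQLite has no &lt;code&gt;ANY($1)&lt;/code&gt;, so the batched query builds an &lt;code&gt;IN (...)&lt;/code&gt; list instead; the point is the query count, not the syntax.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "ana"), (2, "bo"), (3, "cy")],
)

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one query and count it as one database round trip."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


user_ids = [1, 2, 3]

# N+1 pattern: one query per id.
per_row = [
    run("SELECT id, name FROM users WHERE id = ?", (uid,))[0]
    for uid in user_ids
]
per_row_queries = stats["queries"]

# Batched: a single query for all ids.
stats["queries"] = 0
placeholders = ", ".join("?" for _ in user_ids)
batched = run(
    f"SELECT id, name FROM users WHERE id IN ({placeholders}) ORDER BY id",
    user_ids,
)
batched_queries = stats["queries"]

print(f"per-row: {per_row_queries} queries, batched: {batched_queries} query")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both versions return the same rows. The per-row version issues one query per id, so its cost grows with the list; the batched version stays at one.&lt;/p&gt;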



&lt;h3&gt;
  
  
  Missing Index
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_events_user_created&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
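
&lt;p&gt;It is worth confirming that the planner actually uses a new index. The sketch below does that against in-memory SQLite with &lt;code&gt;EXPLAIN QUERY PLAN&lt;/code&gt;; on Postgres the equivalent check is &lt;code&gt;EXPLAIN&lt;/code&gt;. The table layout and the query shape are assumptions, since the original query isn't shown here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)

# Hypothetical query: the latest events for one user.
QUERY = "SELECT id FROM events WHERE user_id = ? ORDER BY created_at DESC LIMIT 20"


def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


plan_before = plan(QUERY, (1,))

conn.execute(
    "CREATE INDEX idx_events_user_created ON events(user_id, created_at DESC)"
)

plan_after = plan(QUERY, (1,))

print("before:", plan_before)
print("after: ", plan_after)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Before the index, the plan is a full table scan plus a sort. Afterwards it names &lt;code&gt;idx_events_user_created&lt;/code&gt;, which serves both the filter and the ordering.&lt;/p&gt;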



&lt;h3&gt;
  
  
  Overfetching
&lt;/h3&gt;

&lt;p&gt;Queries were trimmed to select only the columns each endpoint needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query time down 60–70%
&lt;/li&gt;
&lt;li&gt;The database handled ~4x the load
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Fix 5: Cache What Doesn't Change
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;

&lt;span class="c1"&gt;# Cache miss: load value from the database here, then cache it for 300 seconds
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;br&gt;
~90% reduction in DB hits&lt;/p&gt;
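
&lt;p&gt;The full cache-aside flow, sketched so it runs without a Redis server. &lt;code&gt;FakeRedis&lt;/code&gt; is a hypothetical in-memory stand-in that mimics only async &lt;code&gt;get&lt;/code&gt; and &lt;code&gt;setex&lt;/code&gt;, and &lt;code&gt;load_plan_from_db&lt;/code&gt; is an assumed slow query; neither comes from the original codebase.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import time


class FakeRedis:
    """In-memory stand-in exposing only async get() and setex()."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    async def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


redis = FakeRedis()
stats = {"db_hits": 0}


async def load_plan_from_db(plan_id):
    # Hypothetical slow query; the counter makes cache hits observable.
    stats["db_hits"] += 1
    await asyncio.sleep(0.01)
    return f"plan-{plan_id}"


async def get_plan(plan_id):
    key = f"plan:{plan_id}"
    cached = await redis.get(key)
    if cached is not None:
        return cached                         # cache hit: no database work
    value = await load_plan_from_db(plan_id)  # cache miss: go to the database
    await redis.setex(key, 300, value)        # keep it for 300 seconds
    return value


async def main():
    return [await get_plan(7) for _ in range(3)]


results = asyncio.run(main())
print(results, "database hits:", stats["db_hits"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checking &lt;code&gt;is not None&lt;/code&gt; rather than truthiness means legitimately empty values (an empty string, for example) still count as hits.&lt;/p&gt;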




&lt;h2&gt;
  
  
  Fix 6: Runtime Tuning (Last)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;uvloop&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;httptools&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;worker tuning&lt;/li&gt;
&lt;/ul&gt;
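
&lt;p&gt;One way to wire these up is through &lt;code&gt;uvicorn.run&lt;/code&gt;. Treat this as a sketch: the &lt;code&gt;main:app&lt;/code&gt; import path, host, and port are placeholders, and one worker per vCPU is an assumption to validate under load, not a measured optimum.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "main:app",        # hypothetical import path to the FastAPI app
        host="0.0.0.0",
        port=8000,
        loop="uvloop",     # requires the uvloop package
        http="httptools",  # requires the httptools package
        workers=4,         # assumption: one worker per vCPU on a 4 vCPU box
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;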

&lt;p&gt;Impact: ~10–15%&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture fixes gave ~85% of gains.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Numbers
&lt;/h2&gt;

&lt;p&gt;(4 vCPU / 8GB, k6 load test)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RPS&lt;/td&gt;
&lt;td&gt;~180&lt;/td&gt;
&lt;td&gt;~1300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p95 latency&lt;/td&gt;
&lt;td&gt;~4200ms&lt;/td&gt;
&lt;td&gt;~180ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DB queries&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Services&lt;/td&gt;
&lt;td&gt;14+&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Production traffic:&lt;br&gt;
&lt;strong&gt;~900–1400 req/sec depending on load&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Breaks Next
&lt;/h2&gt;

&lt;p&gt;At ~1500 RPS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DB connection pool saturation
&lt;/li&gt;
&lt;li&gt;Celery backlog
&lt;/li&gt;
&lt;li&gt;Redis CPU spikes
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read replicas
&lt;/li&gt;
&lt;li&gt;queue sharding
&lt;/li&gt;
&lt;li&gt;rate limiting
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Actually Matters
&lt;/h2&gt;

&lt;p&gt;Order matters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Architecture
&lt;/li&gt;
&lt;li&gt;Async correctness
&lt;/li&gt;
&lt;li&gt;Background work
&lt;/li&gt;
&lt;li&gt;Database
&lt;/li&gt;
&lt;li&gt;Caching
&lt;/li&gt;
&lt;li&gt;Runtime tuning
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most scaling problems aren’t framework problems.&lt;/p&gt;

&lt;p&gt;They’re architecture and DB problems.&lt;/p&gt;





</description>
      <category>distributedsystems</category>
      <category>systemdesign</category>
      <category>backend</category>
      <category>python</category>
    </item>
  </channel>
</rss>
