Winson GR
Scaling FastAPI from 180 to 1300 Requests/sec: What Actually Worked

Most FastAPI performance issues aren't caused by the framework - they're caused by architecture, blocking I/O, and database query patterns.

I refactored a FastAPI backend that was stuck at ~180 requests/sec with p95 latency over 4 seconds. After a series of changes, it handled ~1300 requests/sec at under 200ms p95 - on the same hardware.

No vertical scaling. No extra cloud spend. Just removing bottlenecks.


The Starting Point

The system had grown fast. Speed was prioritized over structure - until it wasn’t.

By the time performance became a problem, the backend had 14+ microservices.

In practice:

  • Auth logic duplicated across 6 services
  • Each service maintained its own DB connection pool
  • A single request triggered 4–5 internal API hops
  • Middleware inconsistently applied

The latency wasn’t coming from slow code. It was coming from the architecture.


Fix 1: Kill the Service Fragmentation

14+ repos → 4 domain-focused services:

| Before | After |
|---|---|
| auth, token, session | identity-service |
| report, export, pdf | jobs-service |
| user, profile, prefs | user-service |
| scattered | core-api |

Before:

```
Client → core-api → auth → user → report → export
```

After:

```
Client → core-api → identity / user / jobs
```

Result: internal hops dropped from ~4 to ~1

→ ~35% latency reduction


Fix 2: The Stack Wasn't Actually Async

```python
@app.get("/users/{user_id}")
async def get_user(user_id: int):
    result = db.execute(...)  # synchronous driver call - blocks the event loop
```

Async endpoint ≠ async execution.

Fix:

  • asyncpg instead of psycopg2
  • httpx instead of requests
```python
async with httpx.AsyncClient() as client:
    result = await client.get(...)
```

Result: ~3x worker concurrency
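Swapping drivers takes time, and until then blocking calls can at least be pushed off the event loop. A stdlib-only sketch of why this matters (`blocking_query` is a hypothetical stand-in for a synchronous psycopg2 call; FastAPI applies the same thread offloading to plain `def` endpoints):

```python
import asyncio
import time

def blocking_query(user_id: int) -> dict:
    # Hypothetical stand-in for a synchronous driver call (e.g. psycopg2).
    time.sleep(0.05)  # simulates a DB round-trip
    return {"id": user_id}

async def get_user(user_id: int) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_query, user_id)

async def main():
    start = time.perf_counter()
    # Ten concurrent "requests" overlap their waits instead of running serially
    # (~0.5s serial vs a fraction of that concurrently).
    results = await asyncio.gather(*(get_user(i) for i in range(10)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
```

Calling `blocking_query` directly inside the `async def` would serialize all ten requests on the event loop; the thread hand-off is what restores concurrency.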


Fix 3: Remove Heavy Work from Requests

Problem:

  • Emails
  • PDFs
  • Webhooks

All inside request lifecycle.

Fix:

```python
send_email.delay(order_id)
generate_invoice.delay(order_id)
```

Rule:
If the user doesn’t need it before the 200 OK → move it out.

Result:

800ms → 80ms endpoints
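The production setup here is Celery, but the hand-off itself can be sketched with a stdlib queue and a worker thread - an illustration of the pattern, not the real infrastructure (`handle_order` and the email string are hypothetical):

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
sent: list = []

def worker() -> None:
    # Background worker drains the queue, like a Celery worker process would.
    while True:
        order_id = jobs.get()
        if order_id is None:  # shutdown sentinel (unused in this demo)
            break
        sent.append(f"email for order {order_id}")  # stand-in for send_email
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_order(order_id: int) -> dict:
    # The "endpoint" enqueues and returns immediately - no email/PDF/webhook
    # work inside the request lifecycle.
    jobs.put(order_id)
    return {"status": "accepted", "order_id": order_id}

response = handle_order(42)
jobs.join()  # wait for the background work (only needed for this demo)
```

The request path now does one cheap enqueue; everything slow happens after the response has already gone out.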


Fix 4: Fix the Database

N+1 Queries

```python
# Before: one query per id (N+1)
for user_id in user_ids:
    await db.fetchrow(...)

# After: one batched query
await db.fetch("SELECT ... WHERE id = ANY($1)", user_ids)
```
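A toy sketch of the difference in query volume, with an in-memory dict standing in for the users table (the `fetch_*` helpers are hypothetical stand-ins for the asyncpg calls above):

```python
FAKE_DB = {1: "alice", 2: "bob", 3: "carol"}  # stand-in for the users table
query_log: list = []

def fetch_row(user_id: int) -> str:
    # Stand-in for `await db.fetchrow("... WHERE id = $1", user_id)`
    query_log.append("WHERE id = $1")
    return FAKE_DB[user_id]

def fetch_batch(user_ids: list) -> list:
    # Stand-in for `await db.fetch("... WHERE id = ANY($1)", user_ids)`
    query_log.append("WHERE id = ANY($1)")
    return [FAKE_DB[u] for u in user_ids]

names_n_plus_1 = [fetch_row(u) for u in [1, 2, 3]]  # one query per id
n_plus_1_queries = len(query_log)

query_log.clear()
names_batched = fetch_batch([1, 2, 3])              # one query total
batched_queries = len(query_log)
```

Same rows back, but 3 round-trips collapse to 1 - and the gap grows linearly with the number of ids.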

Missing Index

```sql
CREATE INDEX idx_events_user_created
ON events(user_id, created_at DESC);
```

Overfetching

Selected only the required columns instead of `SELECT *`.

Result:

  • Query time ↓ 60–70%
  • DB handled ~4x load

Fix 5: Cache What Doesn't Change

```python
cached = await redis.get(key)
if cached:
    return cached

value = ...  # cache miss: fetch or compute the value
await redis.setex(key, 300, value)  # keep it for 5 minutes
return value
```

Result:
~90% reduction in DB hits
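The snippet above shows only the hit path. A fuller cache-aside sketch, with a plain dict standing in for Redis and a hypothetical `load_user` as the DB call:

```python
import json

cache: dict = {}  # stand-in for Redis (setex would also attach a TTL)
db_hits = 0

def load_user(user_id: int) -> dict:
    # Hypothetical DB query; counts hits so the cache's effect is visible.
    global db_hits
    db_hits += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user_cached(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)           # redis.get(key)
    if cached is not None:
        return json.loads(cached)     # hit: no DB query
    value = load_user(user_id)        # miss: query the DB once
    cache[key] = json.dumps(value)    # redis.setex(key, 300, ...)
    return value

first = get_user_cached(7)
second = get_user_cached(7)
```

Two reads, one DB hit - repeated over real traffic, that's where the ~90% reduction comes from.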


Fix 6: Runtime Tuning (Last)

  • uvloop
  • httptools
  • worker tuning

Impact: ~10–15%
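These knobs are set at launch. One plausible invocation (the `app.main:app` import path is a placeholder, and worker count is workload-dependent - a common starting point is one to two per CPU core):

```shell
pip install "uvicorn[standard]"   # pulls in uvloop and httptools
uvicorn app.main:app --loop uvloop --http httptools --workers 8
```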

Architecture fixes gave ~85% of gains.


Final Numbers

(4 vCPU / 8GB, k6 load test)

| Metric | Before | After |
|---|---|---|
| RPS | ~180 | ~1300 |
| p95 latency | ~4200ms | ~180ms |
| DB queries | 14 | 2 |
| Services | 14+ | 4 |

Production traffic:
~900–1400 req/sec depending on load


What Breaks Next

At ~1500 RPS:

  • DB connection pool saturation
  • Celery backlog
  • Redis CPU spikes

Next steps:

  • read replicas
  • queue sharding
  • rate limiting

What Actually Matters

Order matters:

  1. Architecture
  2. Async correctness
  3. Background work
  4. Database
  5. Caching
  6. Runtime tuning

Most scaling problems aren’t framework problems.

They’re architecture and DB problems.


Before You Go

If this helped, share it with one engineer hitting the same bottleneck.

🔗 LinkedIn: https://www.linkedin.com/in/winsongr/

🐦 X: https://x.com/winsongr

💻 GitHub: https://github.com/winsongr
