Winson GR
Scaling FastAPI from 180 to 1300 Requests/sec: What Actually Worked

Most FastAPI performance issues aren't caused by the framework - they're caused by architecture, blocking I/O, and database query patterns.

I refactored a FastAPI backend that was stuck at ~180 requests/sec with p95 latency over 4 seconds. After a series of changes, it handled ~1300 requests/sec at under 200ms p95 - on the same hardware.

No vertical scaling. No extra cloud spend. Just removing bottlenecks.


The Starting Point

The system had grown fast. Speed was prioritized over structure - until it wasn’t.

By the time performance became a problem, the backend had 14+ microservices.

In practice:

  • Auth logic duplicated across 6 services
  • Each service maintained its own DB connection pool
  • A single request triggered 4–5 internal API hops
  • Middleware inconsistently applied

The latency wasn’t coming from slow code. It was coming from the architecture.


Fix 1: Kill the Service Fragmentation

14+ repos → 4 domain-focused services:

| Before | After |
|---|---|
| auth, token, session | identity-service |
| report, export, pdf | jobs-service |
| user, profile, prefs | user-service |
| scattered | core-api |

Before:

```
Client → core-api → auth → user → report → export
```

After:

```
Client → core-api → identity / user / jobs
```

Result: internal hops dropped from ~4 to ~1

→ ~35% latency reduction


Fix 2: The Stack Wasn't Actually Async

```python
@app.get("/users/{user_id}")
async def get_user(user_id: int):
    result = db.execute(...)  # synchronous driver call - blocks the event loop
```

Async endpoint ≠ async execution.

Fix:

  • asyncpg instead of psycopg2
  • httpx instead of requests
```python
async with httpx.AsyncClient() as client:
    result = await client.get(...)
```

Result: ~3x worker concurrency
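Swapping drivers takes time, and until then blocking calls can at least be pushed off the event loop. A stdlib-only sketch of why this matters (`blocking_query` is a hypothetical stand-in for a synchronous psycopg2 call; FastAPI applies the same thread offloading to plain `def` endpoints):

```python
import asyncio
import time

def blocking_query(user_id: int) -> dict:
    # Hypothetical stand-in for a synchronous driver call (e.g. psycopg2).
    time.sleep(0.05)  # simulates a DB round-trip
    return {"id": user_id}

async def get_user(user_id: int) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_query, user_id)

async def main():
    start = time.perf_counter()
    # Ten concurrent "requests" overlap their waits instead of running serially
    # (~0.5s serial vs a fraction of that concurrently).
    results = await asyncio.gather(*(get_user(i) for i in range(10)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
```

Calling `blocking_query` directly inside the `async def` would serialize all ten requests on the event loop; the thread hand-off is what restores concurrency.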


Fix 3: Remove Heavy Work from Requests

Problem:

  • Emails
  • PDFs
  • Webhooks

All inside request lifecycle.

Fix:

```python
send_email.delay(order_id)
generate_invoice.delay(order_id)
```

Rule:
If the user doesn’t need it before the 200 OK → move it out.

Result:

800ms → 80ms endpoints
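The production setup here is Celery, but the hand-off itself can be sketched with a stdlib queue and a worker thread - an illustration of the pattern, not the real infrastructure (`handle_order` and the email string are hypothetical):

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
sent: list = []

def worker() -> None:
    # Background worker drains the queue, like a Celery worker process would.
    while True:
        order_id = jobs.get()
        if order_id is None:  # shutdown sentinel (unused in this demo)
            break
        sent.append(f"email for order {order_id}")  # stand-in for send_email
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_order(order_id: int) -> dict:
    # The "endpoint" enqueues and returns immediately - no email/PDF/webhook
    # work inside the request lifecycle.
    jobs.put(order_id)
    return {"status": "accepted", "order_id": order_id}

response = handle_order(42)
jobs.join()  # wait for the background work (only needed for this demo)
```

The request path now does one cheap enqueue; everything slow happens after the response has already gone out.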


Fix 4: Fix the Database

N+1 Queries

```python
# Before: one query per id (N+1)
for user_id in user_ids:
    await db.fetchrow(...)

# After: one batched query
await db.fetch("SELECT ... WHERE id = ANY($1)", user_ids)
```
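A toy sketch of the difference in query volume, with an in-memory dict standing in for the users table (the `fetch_*` helpers are hypothetical stand-ins for the asyncpg calls above):

```python
FAKE_DB = {1: "alice", 2: "bob", 3: "carol"}  # stand-in for the users table
query_log: list = []

def fetch_row(user_id: int) -> str:
    # Stand-in for `await db.fetchrow("... WHERE id = $1", user_id)`
    query_log.append("WHERE id = $1")
    return FAKE_DB[user_id]

def fetch_batch(user_ids: list) -> list:
    # Stand-in for `await db.fetch("... WHERE id = ANY($1)", user_ids)`
    query_log.append("WHERE id = ANY($1)")
    return [FAKE_DB[u] for u in user_ids]

names_n_plus_1 = [fetch_row(u) for u in [1, 2, 3]]  # one query per id
n_plus_1_queries = len(query_log)

query_log.clear()
names_batched = fetch_batch([1, 2, 3])              # one query total
batched_queries = len(query_log)
```

Same rows back, but 3 round-trips collapse to 1 - and the gap grows linearly with the number of ids.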

Missing Index

```sql
CREATE INDEX idx_events_user_created
ON events(user_id, created_at DESC);
```

Overfetching

Selected only the required columns instead of `SELECT *`.

Result:

  • Query time ↓ 60–70%
  • DB handled ~4x load

Fix 5: Cache What Doesn't Change

```python
cached = await redis.get(key)
if cached:
    return cached

value = ...  # cache miss: fetch or compute the value
await redis.setex(key, 300, value)  # keep it for 5 minutes
return value
```

Result:
~90% reduction in DB hits
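The snippet above shows only the hit path. A fuller cache-aside sketch, with a plain dict standing in for Redis and a hypothetical `load_user` as the DB call:

```python
import json

cache: dict = {}  # stand-in for Redis (setex would also attach a TTL)
db_hits = 0

def load_user(user_id: int) -> dict:
    # Hypothetical DB query; counts hits so the cache's effect is visible.
    global db_hits
    db_hits += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user_cached(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)           # redis.get(key)
    if cached is not None:
        return json.loads(cached)     # hit: no DB query
    value = load_user(user_id)        # miss: query the DB once
    cache[key] = json.dumps(value)    # redis.setex(key, 300, ...)
    return value

first = get_user_cached(7)
second = get_user_cached(7)
```

Two reads, one DB hit - repeated over real traffic, that's where the ~90% reduction comes from.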


Fix 6: Runtime Tuning (Last)

  • uvloop
  • httptools
  • worker tuning

Impact: ~10–15%
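These knobs are set at launch. One plausible invocation (the `app.main:app` import path is a placeholder, and worker count is workload-dependent - a common starting point is one to two per CPU core):

```shell
pip install "uvicorn[standard]"   # pulls in uvloop and httptools
uvicorn app.main:app --loop uvloop --http httptools --workers 8
```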

Architecture fixes gave ~85% of gains.


Final Numbers

(4 vCPU / 8GB, k6 load test)

| Metric | Before | After |
|---|---|---|
| RPS | ~180 | ~1300 |
| p95 latency | ~4200ms | ~180ms |
| DB queries | 14 | 2 |
| Services | 14+ | 4 |

Production traffic:
~900–1400 req/sec depending on load


What Breaks Next

At ~1500 RPS:

  • DB connection pool saturation
  • Celery backlog
  • Redis CPU spikes

Next steps:

  • read replicas
  • queue sharding
  • rate limiting

What Actually Matters

Order matters:

  1. Architecture
  2. Async correctness
  3. Background work
  4. Database
  5. Caching
  6. Runtime tuning

Most scaling problems aren’t framework problems.

They’re architecture and DB problems.


Before You Go

If this helped, share it with one engineer hitting the same bottleneck.

🔗 LinkedIn: https://www.linkedin.com/in/winsongr/

🐦 X: https://x.com/winsongr

💻 GitHub: https://github.com/winsongr
