Tufail Khan

Originally published at tufail.dev

FastAPI at 1M+ users: the patterns that actually matter

FastAPI is the default Python web framework in 2026 — 38% of Python teams ship on it, up from 29% a year ago. That means a lot of greenfield projects are making the same early mistakes.

This post is what I wish I'd known before scaling Savyour (Pakistan's first cashback platform, 1M+ users, 300+ merchant integrations) from 50 RPS to 3,000+ RPS on FastAPI.

Everything below is drawn from production. No "hello world" demos.

1. Know your async boundaries

FastAPI supports both def and async def endpoints. The framework is smart enough to run sync routes in a threadpool — but your code may not be.

The failure mode: an async def endpoint that calls a blocking library (say, requests instead of httpx). The sync call holds the event loop, everything queues behind it, and your p99 latency goes vertical.

Rule: if the function is async def, every IO operation inside it must be awaitable. Use httpx.AsyncClient, asyncpg, aioboto3, redis.asyncio.

When you must call a sync library, wrap it:

from fastapi.concurrency import run_in_threadpool

@app.get("/report")
async def generate_report():
    # sync pandas code — don't block the loop
    result = await run_in_threadpool(expensive_sync_function)
    return result

2. Connection pools are not optional

Naive async code opens a new database connection per request. At 500 RPS that's 500 connection handshakes per second — each one forking a new Postgres backend — against an instance that typically caps out at 200-500 concurrent connections.

Fix: use a single pool per worker, with tuned sizing:

# database.py
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,           # steady-state per worker
    max_overflow=10,        # burst tolerance
    pool_pre_ping=True,     # detect dead connections
    pool_recycle=1800,      # rotate every 30 min
)

AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False)

async def get_db():
    async with AsyncSessionLocal() as session:
        yield session

For multi-worker deployments (Uvicorn --workers 4), multiply by worker count. If your Postgres caps at 200 connections, 4 workers × 30 max = 120 is safe. Monitor pg_stat_activity in prod.

3. Push heavy work to background queues

The endpoint that took Savyour down in month two: a synchronous product sync that iterated through 50K affiliate offers per merchant. Five merchants syncing at once meant 250K records processed in-request — and cascading timeouts.

The fix was simple but non-obvious to a team new to async: never do heavy work in the request cycle.

from fastapi import Depends

@app.post("/sync/{merchant_id}")
async def trigger_sync(merchant_id: int, pool=Depends(get_arq_pool)):
    # get_arq_pool yields an ArqRedis built once at startup via arq.create_pool
    job = await pool.enqueue_job("sync_merchant", merchant_id)
    return {"job_id": job.job_id, "status": "queued"}

ARQ, Celery, or Dramatiq — pick one. The worker fleet scales independently of the API fleet. Requests return in milliseconds. Monitoring stays sane.

4. Pydantic v2 is 5-50× faster — use it

If you're still on Pydantic v1, migrate. The v2 rewrite in Rust dropped our request validation overhead from ~8ms to ~0.5ms per request. At 3,000 RPS that's a full CPU core back.

Gotchas we hit:

  • Config → model_config (the nested class becomes a dict)
  • .dict() → .model_dump()
  • validator → field_validator, root_validator → model_validator

Use bump-pydantic for the mechanical parts. The semantic changes (validator signatures) need human review.
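As a before/after sketch — the model and field names are invented, with the v1 spellings shown in comments:

```python
from pydantic import BaseModel, ConfigDict, field_validator

class Offer(BaseModel):
    # v1: nested `class Config: anystr_strip_whitespace = True`
    model_config = ConfigDict(str_strip_whitespace=True)

    merchant_id: int
    cashback_pct: float

    # v1: `@validator("cashback_pct")` — v2 validators are classmethods
    @field_validator("cashback_pct")
    @classmethod
    def pct_in_range(cls, v: float) -> float:
        if not 0 <= v <= 100:
            raise ValueError("cashback_pct must be between 0 and 100")
        return v

offer = Offer(merchant_id=1, cashback_pct=5.0)
payload = offer.model_dump()  # v1: offer.dict()
```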

5. Middleware for observability, not magic

We run three middleware layers in production. In order:

# main.py
app = FastAPI()

# 1. Request ID — every log line traces back
app.add_middleware(RequestIDMiddleware)

# 2. Timing — p50/p95/p99 per route
app.add_middleware(TimingMiddleware)

# 3. Structured logging — JSON out to CloudWatch
app.add_middleware(LoggingMiddleware)

# CORS goes OUTERMOST so OPTIONS requests skip everything
app.add_middleware(CORSMiddleware, allow_origins=FRONTEND_ORIGINS)

Avoid: auto-magic middleware that wraps your handlers with decorators you can't inspect. When things break at 3 AM, you need to grep the source and understand what's happening. Explicit > clever.

6. Health checks, liveness, readiness

Three distinct endpoints. Don't collapse them.

@app.get("/healthz")  # is the process up?
async def health():
    return {"status": "ok"}

@app.get("/readyz")  # can we serve traffic?
async def ready(db=Depends(get_db), redis=Depends(get_redis)):
    await db.execute("SELECT 1")
    await redis.ping()
    return {"status": "ready"}

@app.get("/livez")  # should kubelet restart us?
async def live():
    return {"status": "alive"}

Kubernetes (or ECS, or Fargate) uses these to make restart decisions. A failing dependency should make readyz fail so the LB stops sending traffic — but shouldn't make livez fail and trigger a restart loop.

7. One project structure to rule them all

After shipping a dozen FastAPI services, this is the structure I reach for:

app/
├── main.py            # FastAPI app, middleware, lifespan
├── config.py          # pydantic-settings, env-driven
├── db.py              # engine + session factory
├── dependencies.py    # shared Depends() providers
├── routers/
│   ├── customers.py
│   ├── orders.py
│   └── webhooks.py
├── schemas/           # pydantic request/response models
├── models/            # SQLAlchemy ORM
├── services/          # business logic, pure-ish
├── workers/           # ARQ/Celery task definitions
└── tests/

The key discipline: routers call services, services call models, models don't reach back up. Break that rule and tests get painful fast.

What I'd skip

Things I used to reach for that I don't anymore:

  • Starlette middleware for auth. Use FastAPI Depends() for auth — it composes cleanly with route permissions.
  • Custom exception handlers for every error. One global handler that maps exceptions → HTTP codes is enough for 95% of services.
  • Over-engineered response models for internal APIs. dict returns are fine for handlers only your own code calls.

The meta-point

FastAPI's documentation is aggressively good — better than most frameworks' books. Read it twice before inventing patterns. Most of the hard-won lessons above are implicit in the docs; I just didn't slow down enough to absorb them the first time.
