FastAPI is the default Python web framework in 2026 — 38% of Python teams ship on it, up from 29% a year ago. That means a lot of greenfield projects are making the same early mistakes.
This post is what I wish I'd known before scaling Savyour (Pakistan's first cashback platform, 1M+ users, 300+ merchant integrations) from 50 RPS to 3,000+ RPS on FastAPI.
Everything below is drawn from production. No "hello world" demos.
1. Know your async boundaries
FastAPI supports both def and async def endpoints. The framework is smart enough to run sync routes in a threadpool — but your code may not be.
The failure mode: an async def endpoint that calls a blocking library (say, requests instead of httpx). The sync call holds the event loop, everything queues behind it, and your p99 latency goes vertical.
Rule: if the function is async def, every IO operation inside it must be awaitable. Use httpx.AsyncClient, asyncpg, aioboto3, redis.asyncio.
When you must call a sync library, wrap it:
```python
from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()

@app.get("/report")
async def generate_report():
    # sync pandas code — don't block the loop
    result = await run_in_threadpool(expensive_sync_function)
    return result
```
2. Connection pools are not optional
Naive async code opens a new database connection per request. At 500 RPS with a 50ms query, Little's law says only ~25 queries are in flight at once, but you're paying 500 full TCP + auth handshakes every second, and the moment anything slows down the open-connection count blows past Postgres's cap (100 by default, 200-500 when tuned).
Fix: use a single pool per worker, with tuned sizing:
```python
# database.py
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

# DATABASE_URL comes from config.py (pydantic-settings)
engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,        # steady-state per worker
    max_overflow=10,     # burst tolerance
    pool_pre_ping=True,  # detect dead connections
    pool_recycle=1800,   # rotate every 30min
)

# async_sessionmaker replaces sessionmaker(class_=AsyncSession) in SQLAlchemy 2.0
AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False)
```
```python
async def get_db():
    async with AsyncSessionLocal() as session:
        yield session
```
For multi-worker deployments (Uvicorn --workers 4), multiply by worker count. If your Postgres caps at 200 connections, 4 workers × 30 max = 120 is safe. Monitor pg_stat_activity in prod.
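That budget arithmetic is worth writing down explicitly. A minimal sketch using the numbers from this post (your caps will differ):

```python
# Every Uvicorn worker gets its own pool, so Postgres sees
# workers * (pool_size + max_overflow) connections at peak.
POOL_SIZE = 20            # steady-state connections per worker
MAX_OVERFLOW = 10         # burst tolerance per worker
WORKERS = 4               # uvicorn --workers 4
PG_MAX_CONNECTIONS = 200  # server cap: SHOW max_connections;

peak = WORKERS * (POOL_SIZE + MAX_OVERFLOW)
assert peak < PG_MAX_CONNECTIONS, f"pool budget {peak} exceeds the Postgres cap"
print(peak)  # 120, leaving headroom for migrations, cron jobs, psql sessions
```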
3. Push heavy work to background queues
The endpoint that took Savyour down in month two: a synchronous product sync that iterated through 50K affiliate offers per merchant. Five merchants syncing at once meant 250K records processed in-request, and the timeouts cascaded from there.
The fix was simple but non-obvious to a team new to async: never do heavy work in the request cycle.
```python
from arq import create_pool
from arq.connections import RedisSettings
from fastapi import Depends, FastAPI

app = FastAPI()

async def get_arq_pool():
    return await create_pool(RedisSettings())  # in prod: create once in lifespan, reuse

@app.post("/sync/{merchant_id}")
async def trigger_sync(merchant_id: int, pool=Depends(get_arq_pool)):
    job = await pool.enqueue_job("sync_merchant", merchant_id)
    return {"job_id": job.job_id, "status": "queued"}
```
ARQ, Celery, or Dramatiq — pick one. The worker fleet scales independently of the API fleet. Requests return in milliseconds. Monitoring stays sane.
4. Pydantic v2 is 5-50× faster — use it
If you're still on Pydantic v1, migrate. The v2 rewrite in Rust dropped our request validation overhead from ~8ms to ~0.5ms per request. At 3,000 RPS that's a full CPU core back.
Gotchas we hit:
- `Config` nested class → `model_config` dict
- `.dict()` → `.model_dump()`
- `validator` → `field_validator`, `root_validator` → `model_validator`
Use bump-pydantic for the mechanical parts. The semantic changes (validator signatures) need human review.
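Put together, the v2 idioms look like this (a minimal sketch; the model and validator are illustrative):

```python
from pydantic import BaseModel, field_validator

class Offer(BaseModel):
    model_config = {"str_strip_whitespace": True}  # replaces the nested Config class

    merchant_id: int
    cashback_pct: float

    @field_validator("cashback_pct")  # replaces @validator
    @classmethod
    def pct_in_range(cls, v: float) -> float:
        if not 0 <= v <= 100:
            raise ValueError("cashback_pct must be between 0 and 100")
        return v

offer = Offer(merchant_id=1, cashback_pct=5.5)
print(offer.model_dump())  # replaces .dict()
```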
5. Middleware for observability, not magic
We run three middleware layers in production. In order:
```python
# main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Starlette wraps middleware in reverse order: the last one added runs outermost.

# 1. Request ID — every log line traces back
app.add_middleware(RequestIDMiddleware)
# 2. Timing — p50/p95/p99 per route
app.add_middleware(TimingMiddleware)
# 3. Structured logging — JSON out to CloudWatch
app.add_middleware(LoggingMiddleware)
# CORS goes OUTERMOST (added last) so OPTIONS preflights skip everything
app.add_middleware(CORSMiddleware, allow_origins=FRONTEND_ORIGINS)
```
Avoid: auto-magic middleware that wraps your handlers with decorators you can't inspect. When things break at 3 AM, you need to grep the source and understand what's happening. Explicit > clever.
6. Health checks, liveness, readiness
Three distinct endpoints. Don't collapse them.
@app.get("/healthz") # is the process up?
async def health():
return {"status": "ok"}
@app.get("/readyz") # can we serve traffic?
async def ready(db=Depends(get_db), redis=Depends(get_redis)):
await db.execute("SELECT 1")
await redis.ping()
return {"status": "ready"}
@app.get("/livez") # should kubelet restart us?
async def live():
return {"status": "alive"}
Kubernetes (or ECS, or Fargate) uses these to make restart decisions. A failing dependency should make readyz fail so the LB stops sending traffic — but shouldn't make livez fail and trigger a restart loop.
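In Kubernetes terms, the split maps onto probes roughly like this (a sketch; the port and thresholds are illustrative):

```yaml
livenessProbe:           # restart the container only if the process is wedged
  httpGet: {path: /livez, port: 8000}
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:          # pull the pod from the Service on dependency failure
  httpGet: {path: /readyz, port: 8000}
  periodSeconds: 5
  failureThreshold: 2
```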
7. One project structure to rule them all
After shipping a dozen FastAPI services, this is the structure I reach for:
```
app/
├── main.py          # FastAPI app, middleware, lifespan
├── config.py        # pydantic-settings, env-driven
├── db.py            # engine + session factory
├── dependencies.py  # shared Depends() providers
├── routers/
│   ├── customers.py
│   ├── orders.py
│   └── webhooks.py
├── schemas/         # pydantic request/response models
├── models/          # SQLAlchemy ORM
├── services/        # business logic, pure-ish
├── workers/         # ARQ/Celery task definitions
└── tests/
```
The key discipline: routers call services, services call models, models don't reach back up. Break that rule and tests get painful fast.
What I'd skip
Things I used to reach for that I don't anymore:
- Starlette middleware for auth. Use FastAPI `Depends()` for auth — it composes cleanly with route permissions.
- Custom exception handlers for every error. One global handler that maps exceptions → HTTP codes is enough for 95% of services.
- Over-engineered response models for internal APIs. Plain `dict` returns are fine for handlers only your own code calls.
The meta-point
FastAPI's documentation is aggressively good — better than most frameworks' books. Read it twice before inventing patterns. Most of the hard-won lessons above are implicit in the docs; I just didn't slow down enough to absorb them the first time.