FastAPI at Lightning Speed ⚡: 10 Full-Stack Optimization Tips

10 FastAPI Performance Optimization Tips: End-to-End Speedup from Code to Deployment

FastAPI has become one of the preferred frameworks for Python API development, thanks to its support for asynchronous operations, automatic documentation, and strong type validation. However, in high-concurrency scenarios, unoptimized services may suffer from increased latency and decreased throughput. This article compiles 10 practical optimization solutions, each including implementation steps and design principles, to help you maximize FastAPI's performance potential.

1. Prioritize async/await to Avoid Wasting Asynchronous Advantages

How to implement: Use asynchronous syntax for view functions, dependencies, and database operations, and pair them with asynchronous libraries such as aiohttp (for HTTP requests) and sqlalchemy.ext.asyncio (for databases):

from fastapi import FastAPI
import aiohttp

app = FastAPI()

@app.get("/async-data")
async def get_async_data():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com/data") as resp:
            return await resp.json()  # Asynchronous suspension without blocking the event loop

Design principle: FastAPI is built on the ASGI protocol, with an event loop at its core. Plain def view functions are pushed to a thread pool, which caps concurrency at the pool size and adds scheduling overhead, while a blocking call inside an async def view monopolizes the event loop thread: while it waits for a database response, the CPU sits idle yet no other request can be processed. With async/await, the event loop schedules other tasks whenever an I/O operation is suspended, increasing CPU utilization by 3 to 5 times.
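
To make the contrast concrete, here is a minimal sketch reusing the app from the snippet above; the endpoint paths and the 2-second delay are invented for illustration. The first handler freezes the entire event loop while it sleeps, so every concurrent request waits; the second merely suspends its own task:

import asyncio
import time

@app.get("/blocking")
async def blocking_demo():
    time.sleep(2)  # Blocking call inside an async view: the whole event loop stalls for 2 seconds
    return {"ok": True}

@app.get("/non-blocking")
async def non_blocking_demo():
    await asyncio.sleep(2)  # Suspends only this task; the loop keeps serving other requests
    return {"ok": True}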

2. Reuse Dependency Instances to Reduce Reinitialization Overhead

How to implement: For stateless dependencies like database engines and configuration objects, cache instances using lru_cache or the singleton pattern:

from fastapi import Depends
from functools import lru_cache
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine

@lru_cache(maxsize=1)  # Create only 1 engine instance for global reuse
def get_engine():
    return create_async_engine("postgresql+asyncpg://user:pass@db:5432/db")

async def get_db(engine=Depends(get_engine)):
    async with AsyncSession(engine) as session:
        yield session

Design principle: By default, FastAPI creates a new dependency instance for each request. However, initializing components like database engines and HTTP clients (e.g., establishing connection pools) consumes time and resources. Caching instances can reduce initialization overhead by over 90% while preventing excessive database pressure caused by the over-creation of connection pools.
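
The same reuse idea applies to the HTTP clients mentioned above. A minimal sketch, assuming a shared aiohttp.ClientSession stored on app.state (the attribute name and the /proxied-data path are illustrative):

import aiohttp

@app.on_event("startup")
async def create_http_session():
    # One shared client session (and its connection pool) for the whole process
    app.state.http = aiohttp.ClientSession()

@app.on_event("shutdown")
async def close_http_session():
    await app.state.http.close()

@app.get("/proxied-data")
async def proxied_data():
    async with app.state.http.get("https://api.example.com/data") as resp:
        return await resp.json()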

3. Simplify Pydantic Models to Reduce Validation Costs

How to implement:

  1. Retain only the fields the API actually needs;
  2. Use response_model_exclude_unset to skip unset/default values during serialization;
  3. Use plain typing instead of a Pydantic model for simple scenarios (example after the design principle below):
from pydantic import BaseModel, ConfigDict

class UserResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)  # Allow building the response from ORM objects

    id: int
    name: str  # Keep only fields the frontend needs; drop unused ones like "created_at_timestamp"

@app.get(
    "/users/{user_id}",
    response_model=UserResponse,
    response_model_exclude_unset=True,  # Serialize only explicitly set values
)
async def get_user(user_id: int, db=Depends(get_db)):
    return await db.get(User, user_id)  # FastAPI validates and trims the ORM object via UserResponse

Design principle: Pydantic validates and serializes every field of a model at runtime; the more fields a model has and the deeper the nesting, the greater that overhead. In high-concurrency scenarios, validation and serialization of complex models can account for 40% of request latency. Simplifying models directly reduces this work, improving response speed by 20% to 30%.
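
For point 3 above: a response that is just a couple of primitive values does not need a Pydantic model at all. A minimal sketch (the /health path and its payload are made up for the example); with response_model=None, FastAPI encodes the dict directly and skips response-model validation entirely:

@app.get("/health", response_model=None)
async def health_check():
    # No Pydantic model involved: the dict is JSON-encoded as-is
    return {"status": "ok"}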

4. Use Uvicorn + Gunicorn to Maximize Multi-Core CPU Utilization

How to implement: In production environments, use Gunicorn for process management and start Uvicorn worker processes equal to the number of CPU cores:

# Example for 4-core CPU: 4 Uvicorn processes bound to port 8000
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

Design principle: Python's Global Interpreter Lock (GIL) prevents a single process from utilizing multiple cores. Uvicorn is a purely asynchronous ASGI server, but a single Uvicorn process runs on only one core. Gunicorn manages multiple worker processes, so each Uvicorn worker can occupy its own core, and throughput scales roughly linearly with the number of cores.
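
If you prefer a config file to command-line flags, an equivalent setup can be sketched as a gunicorn_conf.py (the file name and bind address are conventions, not requirements), started with gunicorn main:app -c gunicorn_conf.py:

# gunicorn_conf.py
import multiprocessing

bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"  # Each worker is an asynchronous Uvicorn process
workers = multiprocessing.cpu_count()           # One worker per CPU core, as in the command above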

5. Cache High-Frequency Data to Reduce Repeated Queries/Calculations

How to implement: Use fastapi-cache2 + Redis to cache popular data (e.g., configurations, leaderboards) and set a reasonable expiration time:

from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache
from redis import asyncio as aioredis
from sqlalchemy import text

@app.on_event("startup")
async def init_cache():
    # fastapi-cache2 installs as the fastapi_cache module; initialize it once with a Redis backend
    FastAPICache.init(RedisBackend(aioredis.from_url("redis://redis:6379/0")), prefix="api-cache")

@app.get("/popular-products")
@cache(expire=300)  # Cache for 5 minutes to avoid re-running the expensive SQL on every request
async def get_popular_products(db=Depends(get_db)):
    rows = await db.execute(text("SELECT * FROM products ORDER BY sales DESC LIMIT 10"))
    return [dict(r) for r in rows.mappings()]

Design principle: API performance bottlenecks often arise from "repeated time-consuming operations" (e.g., scanning large tables, complex algorithms). Caching temporarily stores results, allowing subsequent requests to read data directly, reducing latency from hundreds of milliseconds to milliseconds. Distributed caching also supports sharing across multiple instances, making it suitable for cluster deployments.
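
If you would rather not depend on fastapi-cache2, the same idea can be sketched by hand with redis-py's asyncio client; the cache key, the 5-minute TTL, and the /popular-products-manual path below are arbitrary illustration choices, and the rows are assumed to be JSON-serializable:

import json
from redis import asyncio as aioredis
from sqlalchemy import text

redis_client = aioredis.from_url("redis://redis:6379/0", decode_responses=True)

@app.get("/popular-products-manual")
async def popular_products_manual(db=Depends(get_db)):
    cached = await redis_client.get("popular_products")
    if cached is not None:
        return json.loads(cached)  # Cache hit: skip the database entirely
    rows = await db.execute(text("SELECT * FROM products ORDER BY sales DESC LIMIT 10"))
    data = [dict(r) for r in rows.mappings()]
    await redis_client.setex("popular_products", 300, json.dumps(data))  # Store for 5 minutes
    return data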

6. Database Optimization: Connection Pools + Indexes + N+1 Prevention

How to implement:

  1. Use asynchronous connection pools to control the number of connections;
  2. Create indexes for query fields;
  3. Use eager loading (e.g., SQLAlchemy's selectinload) to avoid N+1 queries:
from sqlalchemy import select
from sqlalchemy.orm import selectinload  # SQLAlchemy's eager-loading option (select_related is Django's API)

# Eagerly load the user's orders together with the user, avoiding the N+1 pattern of
# "1 query for the users + 1 extra query per user's orders"
async def get_user_with_orders(user_id: int, db: AsyncSession = Depends(get_db)):
    result = await db.execute(
        select(User).options(selectinload(User.orders)).where(User.id == user_id)
    )
    return result.scalar_one_or_none()

Design principle: Databases are the performance bottleneck for most APIs:

  • Connection establishment is time-consuming (connection pools reuse connections);
  • Full-table scans are slow (indexes reduce query complexity from O(n) to O(log n));
  • N+1 queries trigger one round of I/O per record (a single eager-loading query resolves this).

Together, these three optimizations can reduce database latency by over 60%; a sketch of the pool and index settings from points 1 and 2 follows below.
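
A minimal sketch of points 1 and 2; the pool sizes and the index on User.email are illustrative values, not benchmarked recommendations:

from sqlalchemy import Index
from sqlalchemy.ext.asyncio import create_async_engine

# 1. Bounded async connection pool: reuse connections instead of reconnecting on every request
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@db:5432/db",
    pool_size=10,        # Connections kept open in steady state
    max_overflow=5,      # Extra connections allowed during bursts
    pool_pre_ping=True,  # Discard dead connections before handing them out
)

# 2. Index a frequently filtered column so lookups avoid full-table scans
Index("ix_users_email", User.email, unique=True)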

7. Delegate Static Files to Nginx/CDN—Don’t Overburden FastAPI

How to implement: Use Nginx as a reverse proxy for static resources, and pair it with a CDN for large-scale projects:

server {
    listen 80;
    server_name api.example.com;

    # Nginx handles static files with a 1-day cache
    location /static/ {
        root /path/to/app;
        expires 1d;
    }

    # Forward API requests to FastAPI
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
    }
}

Design principle: FastAPI is an application server; Nginx serves static files roughly an order of magnitude more efficiently. Nginx uses an asynchronous, non-blocking model specifically optimized for static file transmission, and CDNs distribute content through edge nodes to further reduce user latency.
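
For completeness, FastAPI can still serve the same directory itself during local development via Starlette's StaticFiles; the point of this tip is simply not to rely on that in production:

from fastapi.staticfiles import StaticFiles

# Local development only: in production, the Nginx block above serves /static/ instead
app.mount("/static", StaticFiles(directory="static"), name="static")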

8. Streamline Middleware to Reduce Request Interception Overhead

How to implement: Retain only core middleware (e.g., CORS, authentication) and remove debugging middleware:

from fastapi.middleware.cors import CORSMiddleware

# Retain only CORS middleware and specify allowed origins and methods
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://example.com"],  # Avoid wildcard * to reduce security risks and performance loss
    allow_credentials=True,
    allow_methods=["GET", "POST"],  # Open only necessary methods
)

Design principle: Middleware intercepts every request/response. Each additional middleware adds an extra layer of processing to the request. If middleware contains I/O operations (e.g., logging), it can also block the event loop. Streamlining middleware can reduce request chain latency by 15% to 20%.
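
To see what each layer costs, note that even a trivial middleware wraps every single request. A minimal sketch of a timing middleware (the X-Process-Time header name is illustrative); anything heavier than this body, such as synchronous logging I/O, is paid on every call:

import time

@app.middleware("http")
async def add_process_time_header(request, call_next):
    start = time.perf_counter()
    response = await call_next(request)  # Every request and response passes through here
    response.headers["X-Process-Time"] = str(time.perf_counter() - start)
    return response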

9. Avoid Calling Synchronous Functions in Asynchronous Views to Prevent Blocking

How to implement:

  1. Prioritize asynchronous libraries (use aiohttp instead of requests);
  2. If synchronous functions are unavoidable, wrap them with asyncio.to_thread:
import asyncio
import requests  # Synchronous library; calling it directly would block the event loop

@app.get("/sync-data")
async def get_sync_data():
    # Execute synchronous functions in a thread pool without blocking the event loop
    resp = await asyncio.to_thread(requests.get, "https://api.example.com/sync-data")
    return resp.json()

Design principle: Synchronous functions occupy the event loop thread, causing other asynchronous tasks to queue. asyncio.to_thread offloads synchronous functions to a thread pool, allowing the event loop to continue processing other requests and balancing the use of synchronous libraries with performance.
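
asyncio.to_thread runs the function in the event loop's default thread pool. As a variation on the same idea (the pool size of 8 and the endpoint path are arbitrary choices), a dedicated executor keeps heavy synchronous work from exhausting that shared pool:

import asyncio
from concurrent.futures import ThreadPoolExecutor

import requests

blocking_pool = ThreadPoolExecutor(max_workers=8)  # Separate, bounded pool for blocking calls

@app.get("/sync-data-pooled")
async def get_sync_data_pooled():
    loop = asyncio.get_running_loop()
    resp = await loop.run_in_executor(blocking_pool, requests.get, "https://api.example.com/sync-data")
    return resp.json()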

10. Use Profiling Tools to Identify Bottlenecks—Avoid Blind Optimization

How to implement:

  1. Use cProfile to analyze slow requests;
  2. Use Prometheus + Grafana for metric monitoring:
import cProfile

@app.get("/profile-me")
async def profile_me():
    pr = cProfile.Profile()
    pr.enable()
    result = await some_expensive_operation()  # Business logic to be analyzed
    pr.disable()
    pr.print_stats(sort="cumulative")  # Sort by cumulative time to identify bottlenecks
    return result

Design principle: The premise of optimization is identifying bottlenecks—adding caching to non-time-consuming functions is meaningless. Profiling tools accurately locate time-consuming points (e.g., a SQL query accounting for 80% of latency), while monitoring tools detect online issues (e.g., sudden latency spikes during peak hours), ensuring targeted optimization.
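
For the monitoring half of this tip, here is a minimal sketch using the official prometheus_client package (the metric name and label are illustrative). It records per-path latency and exposes a /metrics endpoint that Prometheus can scrape and Grafana can chart:

import time

from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, Histogram, generate_latest

REQUEST_LATENCY = Histogram("http_request_duration_seconds", "Request latency in seconds", ["path"])

@app.middleware("http")
async def record_latency(request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    REQUEST_LATENCY.labels(path=request.url.path).observe(time.perf_counter() - start)
    return response

@app.get("/metrics")
async def metrics():
    # Prometheus scrapes this endpoint; Grafana dashboards then read from Prometheus
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)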

Summary

The core logic of FastAPI performance optimization is to "reduce blocking, reuse resources, and avoid redundant work". From code-level optimizations like async/await and simplified models, to deployment-level improvements such as server combinations and CDNs, and data-level enhancements like caching and database optimization—implementing these tips end-to-end will enable your FastAPI service to maintain low latency and high throughput even under high concurrency.

Leapcell: The Best of Serverless Web Hosting

Finally, I recommend Leapcell, the ideal platform for deploying Python services:

🚀 Build with Your Favorite Language

Develop effortlessly in JavaScript, Python, Go, or Rust.

🌍 Deploy Unlimited Projects for Free

Only pay for what you use—no requests, no charges.

⚡ Pay-as-You-Go, No Hidden Costs

No idle fees, just seamless scalability.

📖 Explore Our Documentation

🔹 Follow us on Twitter: @LeapcellHQ
