Roughly 30 % of production Python web services still run on Flask despite its synchronous request handling model. The choice between FastAPI and Flask directly influences how well a service scales under asynchronous I/O workloads.
📑 Table of Contents
- 🚀 Architecture — Why Async Matters
- ⚙️ FastAPI Basics — How Async Is Implemented
- 🔧 Example Endpoint
- 🚀 Running the Service
- 🐍 Flask Basics — When Sync Limits Appear
- 🔧 Example Endpoint
- 🚀 Running with Gunicorn
- 📊 Comparison — FastAPI vs Flask for Async Microservices
- 🛠 Deployment — Running Async Services in Production
- 🔧 Dockerfile
- 🔧 Kubernetes Deployment
- 🟩 Final Thoughts
- ❓ Frequently Asked Questions
- Can Flask handle async code?
- Do I need to change my database driver for async?
- Is FastAPI production‑ready?
- 📚 References & Further Reading
🚀 Architecture — Why Async Matters
Async I/O lets a single OS thread manage many concurrent network operations by yielding control whenever an operation would otherwise block. In an async server, the event loop (not each request handler) calls a readiness API such as select() or epoll() to learn which sockets can be read or written. While a coroutine awaits I/O it is suspended, control returns to the event loop, and another coroutine can run without creating a new thread.
This differs from a synchronous model where each request blocks a thread until I/O completes, causing the thread count, and therefore memory usage, to grow linearly with concurrency. Creating an OS thread (pthread_create, which wraps the clone system call on Linux) can consume tens of microseconds, and each context switch adds overhead that scales with the number of workers.
What this does:
- event loop: central scheduler that multiplexes coroutines based on I/O readiness.
- non‑blocking sockets: system calls return immediately, allowing the loop to continue.
- coroutine suspension: Python's await keyword marks points where execution yields.
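Key point: Async I/O raises throughput by letting one thread serve other requests while external resources are pending. Individual calls are not faster, but requests spend less time queued behind blocked workers, a crucial advantage for microservices that depend on databases or third‑party APIs.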
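As a minimal sketch of the event loop described above, the following stdlib‑only example runs three simulated I/O waits on one thread. The `fake_io` helper and its delays are hypothetical stand‑ins for real network calls, with `asyncio.sleep` playing the role of a non‑blocking socket wait.

```python
import asyncio
import threading
import time


async def fake_io(name: str, delay: float) -> tuple[str, int]:
    # Hypothetical stand-in for a network call: awaiting suspends this
    # coroutine and hands control back to the event loop.
    await asyncio.sleep(delay)
    # Report which OS thread resumed the coroutine.
    return name, threading.get_ident()


async def run_demo() -> tuple[list[tuple[str, int]], float]:
    start = time.perf_counter()
    # gather() schedules all three coroutines on the same event loop.
    results = await asyncio.gather(
        fake_io("db", 0.2),
        fake_io("cache", 0.2),
        fake_io("api", 0.2),
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_demo())
    print(results, f"{elapsed:.2f}s")
```

All three coroutines report the same thread identifier, and the total wall time should be close to the longest single wait rather than the sum of the three.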
⚙️ FastAPI Basics — How Async Is Implemented
FastAPI is a modern Python web framework built on Starlette and Pydantic, providing first‑class support for async def endpoints. The framework detects coroutine functions and routes them through the underlying ASGI server (e.g., Uvicorn). The ASGI specification defines a callable that receives a scope, receive, and send trio, enabling non‑blocking request handling at the server level.
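To make the scope, receive, and send trio concrete, here is a minimal sketch of a raw ASGI application with no framework at all. The `call_app` harness is a hypothetical helper that plays the server's role so the example runs without Uvicorn; a real server would construct the scope and callables itself.

```python
import asyncio


async def app(scope, receive, send):
    # Minimal ASGI application: handles HTTP scopes only.
    if scope["type"] != "http":
        return
    body = b"hello from raw ASGI"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call_app(path: str) -> list[dict]:
    # Hypothetical harness standing in for the ASGI server.
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(call_app("/")):
        print(message["type"])
```

FastAPI and Starlette build routing, validation, and response objects on top of this same callable shape, which is why any ASGI server can host them.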
🔧 Example Endpoint
# app/main.py
from fastapi import FastAPI
import httpx

app = FastAPI()


@app.get("/weather")
async def get_weather(city: str):
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"https://api.example.com/weather/{city}")
    return {"city": city, "temp_c": resp.json()["temp_c"]}
What this does:
- async def: marks the endpoint as a coroutine, allowing the event loop to pause while awaiting the external HTTP call.
- httpx.AsyncClient: performs non‑blocking HTTP I/O using await.
- FastAPI routing: registers the coroutine with the ASGI server without additional boilerplate.
🚀 Running the Service
$ uvicorn app.main:app --host 0.0.0.0 --port 8000
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Running under Uvicorn keeps a single process with an event loop, whereas a traditional WSGI server would need multiple worker processes to achieve comparable concurrency.
Key point: FastAPI’s async endpoint model maps directly to the underlying ASGI server, delivering true non‑blocking behavior without extra configuration.
🐍 Flask Basics — When Sync Limits Appear
Flask is a WSGI‑based microframework that processes each request in a synchronous call stack. When a Flask view performs blocking I/O, the entire worker thread stalls until the operation finishes. To handle concurrent requests, Flask deployments typically run a process manager such as Gunicorn with multiple workers, each holding its own Python interpreter and memory footprint.
🔧 Example Endpoint
# app/__init__.py
from flask import Flask, jsonify, request
import requests

app = Flask(__name__)


@app.route("/weather")
def get_weather():
    city = request.args.get("city", "London")
    resp = requests.get(f"https://api.example.com/weather/{city}")
    data = resp.json()
    return jsonify({"city": city, "temp_c": data["temp_c"]})
What this does:
- def: a regular function executed synchronously.
- requests.get: a blocking HTTP call that holds the worker thread.
- Flask routing: registers the view with the WSGI server.
🚀 Running with Gunicorn
$ gunicorn -w 4 -b 0.0.0.0:8000 app:app
[-07-01 12:00:00 +0000] [12346] [INFO] Starting gunicorn 20.1.0
[-07-01 12:00:00 +0000] [12346] [INFO] Listening at: http://0.0.0.0:8000 (12346)
[-07-01 12:00:00 +0000] [12346] [INFO] Using worker: sync
[-07-01 12:00:00 +0000] [12347] [INFO] Booting worker with pid: 12347
[-07-01 12:00:00 +0000] [12348] [INFO] Booting worker with pid: 12348
[-07-01 12:00:00 +0000] [12349] [INFO] Booting worker with pid: 12349
[-07-01 12:00:00 +0000] [12350] [INFO] Booting worker with pid: 12350
Memory use grows linearly with the number of workers because each worker is a separate process with its own Python interpreter; context‑switch overhead also grows, becoming a bottleneck under heavy request rates.
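The effect of a fixed worker count on blocking requests can be sketched with the standard library alone. This is an illustration under stated assumptions, not a model of Gunicorn itself: pool threads stand in for sync worker processes, and `blocking_view` is a hypothetical view that waits 0.2 s on an upstream call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_view(request_id: int) -> int:
    # Hypothetical view: holds its worker for the whole upstream wait.
    time.sleep(0.2)
    return request_id


def serve(request_count: int, workers: int) -> float:
    # Each pool thread plays the role of one sync worker.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(blocking_view, range(request_count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"8 requests, 4 workers: {serve(8, 4):.2f}s")
    print(f"8 requests, 8 workers: {serve(8, 8):.2f}s")
```

With four workers, eight simultaneous requests are served in two rounds, so the second batch waits for the first to finish. Doubling the workers removes the queueing, at the cost of doubling the number of resident workers.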
Key point: Flask’s synchronous model requires multiple processes to achieve concurrency, increasing resource consumption compared to FastAPI’s single‑process async design.
📊 Comparison — FastAPI vs Flask for Async Microservices
This table isolates the attributes that directly affect async microservice performance and operability. The memory and latency figures are rough orders of magnitude rather than benchmark results.
| Attribute | FastAPI | Flask |
|---|---|---|
| Concurrency model | Native async/await, single‑process event loop | WSGI sync, requires multiple worker processes |
| Server protocol | ASGI (Uvicorn, Hypercorn) | WSGI (Gunicorn, uWSGI) |
| Memory per request | ~10 KB (coroutine stack) | ~2 MB (full interpreter per worker) |
| Typical latency overhead | ≈ 1 ms for I/O‑bound calls | ≈ 5–10 ms due to thread scheduling |
| Built‑in validation | Pydantic models, automatic OpenAPI | Manual, optional extensions |
The table shows that FastAPI’s async design reduces both CPU and memory footprints, which is decisive for microservices that must scale horizontally on limited resources.
Key point: For async‑heavy workloads, FastAPI provides lower per‑request overhead and built‑in tooling that Flask lacks without additional extensions.
🛠 Deployment — Running Async Services in Production
Deploying async microservices requires a container image that runs an ASGI server and exposes the correct runtime environment.
🔧 Dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
What this does:
- FROM python:3.11-slim: lightweight base image that ships CPython 3.11.
- pip install: installs FastAPI, Uvicorn, and dependencies.
- CMD: launches the ASGI server in a single process.
🔧 Kubernetes Deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weather-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: weather
  template:
    metadata:
      labels:
        app: weather
    spec:
      containers:
        - name: fastapi
          image: myrepo/weather-service:latest
          ports:
            - containerPort: 8000
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
What this does:
- replicas: 3 ensures three pod instances for high availability.
- resources.limits caps CPU and memory per pod, aligning with the lower footprint of async services.
- containerPort: 8000 matches the port Uvicorn listens on.
Running Uvicorn directly as the container's single process avoids an extra process‑manager layer (e.g., Gunicorn) and keeps health checks simple, because a Kubernetes readiness probe only has to target one HTTP endpoint served by one process.
Async frameworks let a single process serve thousands of concurrent I/O‑bound requests without spawning threads.
Key point: Containerizing FastAPI with an ASGI server yields a lean, scalable deployment artifact, while Flask deployments typically require additional process‑management layers.
🟩 Final Thoughts
When building async microservices, the framework choice determines the fundamental concurrency model. FastAPI’s native async support translates directly into lower memory consumption, reduced latency, and simpler deployment pipelines. Flask can serve synchronous workloads efficiently, but scaling to high‑concurrency scenarios inevitably adds process overhead.
Adopting FastAPI for new async services aligns with current Python ecosystem practices and leverages built‑in OpenAPI generation, which accelerates API documentation and client creation. Existing Flask codebases can be migrated incrementally by introducing async routes where I/O dominates, but the full benefit is realized only when the entire stack embraces ASGI.
❓ Frequently Asked Questions
Can Flask handle async code?
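Partly. Flask is WSGI‑based, but since version 2.0 it accepts async def views when installed with the async extra (pip install "flask[async]"). Each such view still occupies one worker for the full duration of the request, so awaiting inside it does not free the worker to serve other requests. Concurrency across requests still requires multiple workers.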
Do I need to change my database driver for async?
Yes. To keep the event loop non‑blocking, use an async‑compatible driver such as asyncpg for PostgreSQL or databases for generic SQL access. Synchronous drivers will block the loop just like requests does.
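The sketch below illustrates why a synchronous driver stalls the loop. It is a stdlib‑only demonstration under assumptions: `sync_query` is a hypothetical blocking driver call, and `heartbeat` is a helper that records how often the loop gets to run. The `asyncio.to_thread` variant (Python 3.9+) is shown only as a stopgap for code that cannot yet move to an async driver.

```python
import asyncio
import time


def sync_query() -> str:
    # Hypothetical blocking driver call.
    time.sleep(0.2)
    return "row"


async def heartbeat(ticks: list[float]) -> None:
    # Records each moment the event loop resumes this coroutine.
    for _ in range(4):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def measure(offload: bool) -> float:
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if offload:
        await asyncio.to_thread(sync_query)  # loop keeps running
    else:
        sync_query()  # blocks the loop for the full 0.2 s
    await beat
    # Largest gap between consecutive heartbeat ticks.
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print(f"blocking call:  max gap {asyncio.run(measure(False)):.2f}s")
    print(f"offloaded call: max gap {asyncio.run(measure(True)):.2f}s")
```

When the call runs directly on the loop, the heartbeat stalls for the full duration of the query. When the call is offloaded, the heartbeat keeps ticking at its normal interval. A native async driver avoids the extra thread altogether.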
Is FastAPI production‑ready?
FastAPI is widely adopted in production; its core components (Starlette, Pydantic) are mature, and the official documentation recommends Uvicorn or Hypercorn as production ASGI servers.
📚 References & Further Reading
- FastAPI documentation — comprehensive guide to async endpoints and dependency injection: fastapi.tiangolo.com
- Flask documentation — official reference for WSGI request handling: flask.palletsprojects.com