FastAPI makes building backend services easier than ever - look at this endpoint masquerading as a simple function.
```python
@app.get("/hello")
def hello(name: str = "world") -> dict[str, str]:
    return {"message": f"Hello, {name}!"}
```
Compare that to the boilerplate in this Java servlet ChatGPT wrote up for me:
```java
import java.io.IOException;
import javax.servlet.http.*;

public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String name = req.getParameter("name");
        if (name == null) name = "world";
        resp.setStatus(200);
        resp.setContentType("application/json; charset=utf-8");
        resp.getWriter().write("{\"message\":\"Hello, " + name + "!\"}");
    }
}
```
To be fair, this simplicity is not strictly FastAPI-specific. Between the incredible level of abstraction provided by modern frameworks (Java frameworks included) and the fact that AI agents now write so much of our code, it can be difficult to grasp exactly what's going on under the hood of a framework like FastAPI. When I was building simple CRUD apps, that was honestly fine. But after being burnt a few too many times by AI (more on that below), I've been leveling up my Python skills. I have no intention of replacing AI, but it's very clear to me that an advanced, well-crafted backend service still needs human expertise. And that means really understanding what FastAPI is doing behind the scenes (and, by extension, most other async frameworks).
Let me illustrate with an example. When I asked ChatGPT how to sandbox an under-tested simulation pipeline, it suggested spinning it up in a separate process - a good instinct. The problem was where it told me to do it: right inside the endpoint.
```python
import multiprocessing

@app.post("/simulate")
def run_simulation():
    # spawns a brand-new process on every request!
    process = multiprocessing.Process(target=run_pipeline)
    process.start()
    return {"status": "started"}
```
Looks fine, right? Maybe not to a seasoned developer, but at first glance it could seem fair enough. Buried deep in a codebase, this would have been a dangerous flaw: a brand-new process spawned on every request. And it's the kind of problem you might never notice under the low traffic of your local environment.
Luckily, I've learnt (after being caught out) never to trust code you don't understand. And it meant I finally had a use for that pesky lifespan() hook in main.py!
When you're writing regular-looking functions, it's easy to forget that FastAPI is executing them in a highly concurrent, event-driven environment.
Or whatever that means. It's all well and good to throw these words around, but they never actually helped me wrap my head around what's going on. So here's my explanation, grounded in concrete examples you can picture.
When you start a FastAPI service, you might write something like this:

```bash
uvicorn main:app --workers 4
```
Specifying 4 workers just means you're running the service four times (it could just as easily be eight, or whatever number you choose). So for the purposes of this discussion, we're talking about a single FastAPI worker/instance - you can simply imagine everything happening "times 4".
Let's start with processes and threads.
Your first "hello world" script probably executed synchronously (i.e., one line after the other). But this won't cut it if you want to serve millions of requests.
Before I discuss the nature of asynchronous frameworks, it's important to understand the concepts of threads and processes. I was taught about these at university, but they always seemed a bit magical until I really understood how a framework like FastAPI uses them in practice.
When you run a simple Python script with `python script.py`, the OS starts one process containing one thread, and the thread runs the script. Threads seem invisible in "normal" Python because the thread simply executes one line after another. Internally, the thread has its own call stack (i.e., the lines of code it has to run). I like to think of the thread as a worker.
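You can actually watch this happen with a few lines of standard-library introspection (a minimal sketch - `os` and `threading` are only used here to peek at what the OS gave us):

```python
import os
import threading

# A plain script: one process (the factory) containing one thread (the worker).
print(f"process id: {os.getpid()}")
print(f"current thread: {threading.current_thread().name}")  # MainThread
print(f"threads alive: {threading.active_count()}")          # 1
```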
But this can be confusing. It's important to understand that threads, which I'm calling workers, are completely distinct from the FastAPI worker, which is actually a process. In this analogy, the FastAPI worker (or process) is really more like "the factory", as it owns the resources and completes jobs using workers.
Quick side note: from the perspective of Uvicorn/Gunicorn a FastAPI instance is more of a worker, but I digress! For our purposes and for the rest of this post, a thread is a worker and a process is a factory.
Now if your factory only does one thing at a time, it probably only needs one worker. But no factory - FastAPI included - only does one thing at a time. So the natural solution is to hire more workers. Now we're approaching the way FastAPI works, but there's one more layer of complexity.
If you've ever managed a group of workers, you'll know one thing you definitely don't want is for them to be idle.
But when you're serving requests, workers will often be idle - maybe for entire seconds - while waiting on external calls (to a database, for example). Seconds might not seem like much to us, but to these hard-working threads, a lot can get done in those precious moments.
So FastAPI does something unexpected (at least it was unexpected to me the first time I internalized it!). Each FastAPI instance mainly relies on a single worker, a lone thread doing unimaginable quantities of soul-crushing work. At least for asynchronous requests, but more on that ↓
Enter the coroutine and the event loop.
Your FastAPI process runs exactly one event loop, and the event loop manages coroutines. I like to think of coroutines as jobs - incoming work - and these jobs can be paused. So the event loop is the manager of our lone thread: whenever the worker has a spare moment, the manager gives him a new job to work on.
But how does the manager know when the worker has a moment to spare? Well, this is why we use await. Every time you use this keyword, you're flagging the worker as "not busy" and saying "give him more work - he's only waiting for the database to respond!" Instead of having a bunch of workers sitting around half the time, FastAPI works a single thread to the bone, never giving him an idle moment (again, with an important caveat).
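Here's a stripped-down sketch of that handover, in plain asyncio (with `asyncio.sleep` standing in for a slow database call):

```python
import asyncio
import threading

async def handle_request(job_id: int) -> None:
    print(f"job {job_id} started on {threading.current_thread().name}")
    # await = telling the manager "this job is waiting - give the worker another one"
    await asyncio.sleep(1)  # stand-in for a slow database call
    print(f"job {job_id} finished")

async def main() -> None:
    # Two jobs, one thread: the event loop switches between them at each await,
    # so both finish in about 1 second rather than 2.
    await asyncio.gather(handle_request(1), handle_request(2))

asyncio.run(main())
```

Both jobs report the same thread name: the lone worker handled everything.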
I don't know about other people, but I certainly rote-learned where to put the keywords async and await at first. But if you've made it this far in my post, you'll have a much better understanding of when to use them.
Whenever we use the keyword async to define a function or an endpoint, it becomes a coroutine (a job). And await can only be used inside a coroutine, because it doesn't make sense to await from a thread that's only doing one job!
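In FastAPI terms, it might look like this (a sketch - the route and URL are made up, and httpx is just one async-friendly client you could await):

```python
import httpx
from fastapi import FastAPI

app = FastAPI()

@app.get("/weather")
async def get_weather(city: str) -> dict:
    # async def makes this endpoint a coroutine - a pausable job for the event loop
    async with httpx.AsyncClient() as client:
        # await pauses the job here, freeing the lone worker until the response arrives
        resp = await client.get("https://api.example.com/weather", params={"city": city})
    return resp.json()
```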
So what happens if you don't use async to define a function?
Ok, remember the caveats about FastAPI depending on a single worker? Well, this is not strictly true - it's only true for asynchronous endpoints. If you choose not to use async when defining a function, the task will be assigned to the threadpool, not the event loop. The threadpool is a distinct mechanism: essentially a separate group of workers, ready to do work that could otherwise freeze the all-important event loop. You can explicitly offload work to the threadpool too, as the sketch below shows.
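A small sketch of both routes into the threadpool (`time.sleep` stands in for blocking work; `run_in_threadpool` comes via fastapi.concurrency):

```python
import time
from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()

@app.get("/blocking-report")
def blocking_report() -> dict:
    # plain def: FastAPI ships this to the threadpool automatically,
    # so the blocking sleep never freezes the event loop
    time.sleep(2)  # stand-in for heavy, blocking work
    return {"status": "done"}

@app.get("/explicit-offload")
async def explicit_offload() -> dict:
    # inside a coroutine, you can still hand blocking work to the threadpool
    await run_in_threadpool(time.sleep, 2)
    return {"status": "done"}
```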
This helped me un-rote-learn-and-logically-understand when to use async.
It's important not to be too liberal with your use of async. If you have an endpoint or function that could block our lone worker for long periods - because of CPU-heavy computation, say - don't use async. You don't want to burden your star worker with the grunt work at which the threadpool excels. Otherwise the event loop gets overwhelmed, spending more time organising all the incoming jobs than actually getting them done - a vicious cycle that spikes latency.
So in my example with the risky simulation, why not use a thread rather than a process?
Using the threadpool for my under-tested simulation certainly would have taken significant grunt work away from the event loop. However, my main concern was the possibility of the simulation crashing. The event loop and the threadpool live in the same process, meaning they share memory, resources, and a Python interpreter. They are essentially operating in the same factory.
And this is where the analogy mostly breaks down. If the thread computing the simulation gets stuck, you might think we could simply tell it to "start again". But that's not how a program runs. Unfortunately, you can't just kill a thread in Python - it's intertwined, memory-wise and interpreter-wise, with our event loop. At least, you can't do it easily. So if the simulation crashes hard (a segfault in a native extension, say) while running on a thread, it's bringing the whole factory down with it.
What's the solution? Give the simulation its own factory, its own process - a sandbox, if you like. The simulation now runs with its own Python interpreter, with its own allocation of memory and CPU that it can do with as it likes. And if the factory goes down, we can simply build a new one and get the simulation running in there. The event loop is protected, and our simulation runs safely and separately.
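In code, that escape hatch looks something like this (a sketch: the 60-second timeout is arbitrary, and run_pipeline is the simulation from earlier):

```python
import multiprocessing

def run_pipeline() -> None:
    ...  # the under-tested simulation from earlier

if __name__ == "__main__":  # required on spawn-based platforms (macOS/Windows)
    process = multiprocessing.Process(target=run_pipeline)
    process.start()

    process.join(timeout=60)   # give the simulation a minute
    if process.is_alive():     # still stuck?
        process.terminate()    # demolish the factory...
        process.join()
        process = multiprocessing.Process(target=run_pipeline)
        process.start()        # ...and build a new one
```

Unlike a thread, a process can be forcibly torn down and rebuilt without touching the event loop.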
So what was the issue with our AI suggestion at the beginning?
ChatGPT was recommending that we build a new factory (or process) every time a simulation was run. This is not only inefficient, but would eventually lead to resources being stretched thin across thousands of unused processes.
So instead, we build the process when the application starts up, and we only rebuild it if it times out. (The same logic applies if you start several processes at startup; I'll stick to one for simplicity.)
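Here's a minimal sketch of that fix using the lifespan() hook, under my own assumptions: one long-lived worker process fed by a multiprocessing.Queue, with the restart-on-timeout logic left out for brevity (run_pipeline is still the simulation from earlier):

```python
import multiprocessing
from contextlib import asynccontextmanager
from fastapi import FastAPI

def simulation_worker(jobs) -> None:
    # A long-lived process: wait for a request, then run the risky simulation.
    while True:
        jobs.get()        # blocks until a request arrives
        run_pipeline()    # crashes stay inside this process

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Build the factory once, at startup - not on every request.
    app.state.jobs = multiprocessing.Queue()
    app.state.worker = multiprocessing.Process(
        target=simulation_worker, args=(app.state.jobs,), daemon=True
    )
    app.state.worker.start()
    yield
    app.state.worker.terminate()  # tear the factory down on shutdown

app = FastAPI(lifespan=lifespan)

@app.post("/simulate")
def run_simulation() -> dict:
    # The endpoint just hands the job to the long-lived process.
    app.state.jobs.put("run")
    return {"status": "queued"}
```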
I hope this long-winded analogy is helpful! It certainly allowed me to understand what's going on underneath the innocent-looking functions used by FastAPI. And this is important, because the framework (and by extension, probably the AI agent you're using) doesn't know your intent, so it won't stop you doing something architecturally catastrophic - that part is still on you.
Feel free to share clarifications or your own insights in the comments!
Glossary if you get lost in the extended metaphor
- Threads: I think of them as workers, completing work.
- Processes: Sort of like a factory, with its own Python interpreter, memory and CPU allocation.
- FastAPI instance or FastAPI worker: This is an example of a process, a specific type of factory if you will.
- Coroutine: A job or work request made to the factory.
- Event loop: The manager of the factory, albeit only managing a single worker.
- Threadpool: A group of workers ready for grunt work.