These three concepts trip people up because their names sound similar, but they solve fundamentally different problems. This post walks through each one in order, starting from the problem they're designed to solve.
The Core Issue: The GIL
First, you need to understand the GIL (Global Interpreter Lock) — Python's internal lock that allows only one thread to execute Python bytecode at any given moment.
This is an intentional design choice to protect Python's internal memory from corruption when multiple threads access it simultaneously. The side effect: even if your machine has 8 cores, pure-Python code in a single process runs on just one of them at a time.
The three tools below are three different strategies for working around this constraint.
Threading
The Idea
Threading creates multiple threads inside a single process, sharing the same memory and the same GIL. That sounds limiting — and it is, with one important exception.
When a thread is waiting for I/O (reading a file, calling an API, querying a database), it releases the GIL. That's when other threads can run.
Why It Works for I/O-bound Tasks
Say you need to download 3 files from the internet, each taking 3 seconds:
- Sequential: 9 seconds total — while waiting for file 1, the CPU sits completely idle.
- Threading: ~3 seconds total — while thread 1 waits, threads 2 and 3 send their requests.
The CPU isn't doing extra work — it just stops sitting idle doing nothing.
```python
import threading
import requests

def fetch(url):
    r = requests.get(url)  # I/O → releases the GIL
    print(r.status_code)

urls = ["https://a.com", "https://b.com", "https://c.com"]
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
When NOT to Use Threading
If your task is CPU-bound (heavy computation, math loops, image processing), threading won't help — the GIL still restricts execution to one thread, so the result is no different from running sequentially.
Quick rule: Does your task mostly wait (network, disk, DB)? → Threading. Does it mostly compute (math, heavy loops)? → Not threading.
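You can see this for yourself with a small timing sketch (the busy-loop function and the count below are arbitrary; exact numbers vary by machine). On a standard CPython build, the threaded version is no faster than the sequential one, and often slightly slower due to GIL contention:

```python
import threading
import time

def count_down(n):
    # pure-Python busy loop: holds the GIL the whole time
    while n > 0:
        n -= 1

N = 2_000_000

# sequential: two loops back to back
start = time.perf_counter()
count_down(N)
count_down(N)
seq = time.perf_counter() - start

# threaded: two threads, but the GIL serializes their execution
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
thr = time.perf_counter() - start

print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```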
Multiprocessing
The Idea
Instead of multiple threads inside one process, multiprocessing creates multiple independent processes. Each process has its own Python interpreter, its own memory — and its own GIL.
The result: each process can run on a separate CPU core, achieving true parallelism.
Why Threading Fails for CPU-bound Tasks
Say you're processing 4 images, each taking 2 seconds:
- Threading: the GIL still restricts CPU access to one thread at a time. Other threads queue up and wait. Total time is still ~8 seconds.
- Multiprocessing: 4 processes run on 4 cores simultaneously. Total time ~2 seconds.
```python
from multiprocessing import Pool

def process_image(path):
    # heavy computation — resize, filter, compress
    result = ...  # placeholder for the processed image
    return result

if __name__ == "__main__":  # required: child processes re-import this module
    with Pool(processes=4) as pool:
        results = pool.map(process_image, image_paths)
```
The Trade-offs
- Slower startup: each new process takes ~100ms to spin up, whereas a thread takes just a few ms.
- Higher memory cost: each process has its own separate memory space, so memory usage multiplies — objects aren't shared by default.
- More complex communication: you need Queue, Pipe, or shared memory to pass data between processes.
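Here's what that communication looks like in practice — a minimal sketch using `multiprocessing.Queue`, where a child process computes a value and sends it back to the parent (the worker function and numbers are made up for illustration):

```python
from multiprocessing import Process, Queue

def worker(q, numbers):
    # runs in the child process, with its own interpreter and memory
    q.put(sum(n * n for n in numbers))

def run_demo():
    q = Queue()
    p = Process(target=worker, args=(q, [1, 2, 3, 4]))
    p.start()
    result = q.get()  # blocks until the child puts a value on the queue
    p.join()
    return result

if __name__ == "__main__":
    print(run_demo())  # 1 + 4 + 9 + 16 = 30
```

Compare this with threading, where `worker` could simply write to a shared variable — the extra ceremony is the price of separate memory spaces.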
Use multiprocessing when: image/video processing, heavy numerical computation, machine learning, file compression — anything where the CPU is the bottleneck.
Coroutines (asyncio)
The Idea
Coroutines are the third approach — and they're actually much lighter than threading. Instead of letting the OS decide when to switch threads, the code itself says: "I'm waiting, go run something else."
There's only one thread. An event loop continuously asks: "which coroutine is ready to run?" When a coroutine hits await, it yields control, and the event loop runs another coroutine.
```python
import asyncio

async def fetch(url):          # "async" → this is a coroutine
    print(f"Sending request to {url}")
    await asyncio.sleep(2)     # "await" → pause, yield control
    print(f"Done: {url}")

async def main():
    await asyncio.gather(      # run all 3 concurrently
        fetch("a.com"),
        fetch("b.com"),
        fetch("c.com"),
    )

asyncio.run(main())
```
Coroutines vs Threading: Same Result, Different Mechanism
Both work well for I/O-bound tasks. But:
| | Threading | Coroutine |
|---|---|---|
| Thread count | many | exactly 1 |
| Who schedules | the OS | your code (`await`) |
| Overhead | moderate | very low |
| Creating 10,000 | problematic | no problem |
Creating 10,000 threads → crash or severe slowdown. Creating 10,000 coroutines → perfectly fine. That's why modern web frameworks (FastAPI, aiohttp) use asyncio instead of threading.
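The 10,000 figure is easy to verify yourself — each coroutine is just a small Python object, not an OS thread, so spinning up ten thousand of them on a single thread is cheap (the `tick` coroutine below is a trivial stand-in for real I/O work):

```python
import asyncio

async def tick(i):
    await asyncio.sleep(0)  # yield to the event loop once
    return i

async def main():
    # 10,000 concurrent coroutines, all on one thread
    results = await asyncio.gather(*(tick(i) for i in range(10_000)))
    return len(results)

print(asyncio.run(main()))  # 10000
```

Trying the same with `threading.Thread` would exhaust OS resources long before reaching that count.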
The Chef Analogy
A single chef serving multiple tables: put table 1's order on the stove (await), go pour water at table 2, carry food to table 3 — no extra staff needed, just never stand and wait.
Summary: Which One Do You Choose?
| | Threading | Coroutine | Multiprocessing |
|---|---|---|---|
| Best for | simple I/O | high-concurrency I/O | heavy CPU work |
| Threads/processes | many threads | 1 thread | many processes |
| Overhead | moderate | very low | high |
| Shared memory | yes | yes | no |
| Module | `threading` | `asyncio` | `multiprocessing` |
| Modern API | `concurrent.futures` | `async`/`await` | `concurrent.futures` |
Quick Decision Tree
What does your task spend most of its time doing?
```text
├── Waiting on network / disk / DB (I/O-bound)
│   ├── Thousands of concurrent connections? → asyncio (coroutine)
│   └── A few dozen simple tasks? → threading
│
└── Continuous computation (CPU-bound) → multiprocessing
```
If you're unsure, concurrent.futures is the easiest entry point — it provides a unified API for both ThreadPoolExecutor and ProcessPoolExecutor:
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# I/O-bound
with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(fetch_url, urls))

# CPU-bound (wrap in `if __name__ == "__main__":` in a real script,
# since worker processes re-import the module)
with ProcessPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(heavy_compute, data))
```
Understanding these three tools covers the vast majority of Python concurrency. Everything else — advanced asyncio, GIL internals, uvloop — is just detail on top of this foundation.