
Lộc Trương

Posted on • Originally published at locionic.com

Threading, Multiprocessing, and Coroutines in Python: A Clear Explanation

These three concepts trip people up because their names sound similar, but they solve fundamentally different problems. This post walks through each one in order, starting from the problem they're designed to solve.

The Core Issue: The GIL

First, you need to understand the GIL (Global Interpreter Lock) — Python's internal lock that allows only one thread to run at any given moment.

This is an intentional design choice to protect Python's internal memory from corruption when multiple threads access it simultaneously. The side effect: even if your machine has 8 cores, a standard CPython process executes Python bytecode on only one of them at a time.

The three tools below are three different strategies for working around this constraint.


Threading

The Idea

Threading creates multiple threads inside a single process, sharing the same memory and the same GIL. That sounds limiting — and it is, with one important exception.

When a thread is waiting for I/O (reading a file, calling an API, querying a database), it releases the GIL. That's when other threads can run.

Why It Works for I/O-bound Tasks

Say you need to download 3 files from the internet, each taking 3 seconds:

  • Sequential: 9 seconds total — while waiting for file 1, the CPU sits completely idle.
  • Threading: ~3 seconds total — while thread 1 waits, threads 2 and 3 send their requests.

The CPU isn't doing extra work — it just stops sitting idle doing nothing.

import threading
import requests

def fetch(url):
    r = requests.get(url)   # I/O → releases the GIL
    print(r.status_code)

urls = ["https://a.com", "https://b.com", "https://c.com"]
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()

When NOT to Use Threading

If your task is CPU-bound (heavy computation, math loops, image processing), threading won't help — the GIL still restricts execution to one thread, so the result is no different from running sequentially.
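You can see this for yourself with a small timing sketch (the exact numbers depend on your machine; the loop body and `N` here are arbitrary stand-ins for "heavy computation"):

```python
import threading
import time

def count(n):
    # Pure CPU work — the GIL is never released mid-loop,
    # so two threads running this cannot overlap.
    while n > 0:
        n -= 1

N = 5_000_000

# Sequential: two calls, one after the other
start = time.perf_counter()
count(N)
count(N)
seq = time.perf_counter() - start

# Two threads: still serialized by the GIL
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
thr = time.perf_counter() - start

print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```

On a typical CPython build the two times come out roughly equal (the threaded version is sometimes slightly slower, due to lock contention).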

Quick rule: Does your task mostly wait (network, disk, DB)? → Threading. Does it mostly compute (math, heavy loops)? → Not threading.


Multiprocessing

The Idea

Instead of multiple threads inside one process, multiprocessing creates multiple independent processes. Each process has its own Python interpreter, its own memory — and its own GIL.

The result: each process can run on a separate CPU core, achieving true parallelism.

Why Threading Fails for CPU-bound Tasks

Say you're processing 4 images, each taking 2 seconds:

  • Threading: the GIL still restricts CPU access to one thread at a time. Other threads queue up and wait. Total time is still ~8 seconds.
  • Multiprocessing: 4 processes run on 4 cores simultaneously. Total time ~2 seconds.

from multiprocessing import Pool

def process_image(path):
    # heavy computation — resize, filter, compress
    return result

# The guard is required: on platforms that spawn new processes
# (Windows, macOS by default), each child re-imports this module.
if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(process_image, image_paths)

The Trade-offs

  • Slower startup: each new process takes ~100ms to spin up, whereas a thread takes just a few ms.
  • Higher memory cost: each process has its own separate memory space — no sharing.
  • More complex communication: you need Queue, Pipe, or shared memory to pass data between processes.

Use multiprocessing when: image/video processing, heavy numerical computation, machine learning, file compression — anything where the CPU is the bottleneck.


Coroutines (asyncio)

The Idea

Coroutines are the third approach — and they're actually much lighter than threading. Instead of letting the OS decide when to switch threads, the code itself says: "I'm waiting, go run something else."

There's only one thread. An event loop continuously asks: "which coroutine is ready to run?" When a coroutine hits await, it yields control, and the event loop runs another coroutine.

import asyncio

async def fetch(url):          # "async" → this is a coroutine
    print(f"Sending request to {url}")
    await asyncio.sleep(2)     # "await" → pause, yield control
    print(f"Done: {url}")

async def main():
    await asyncio.gather(       # run all 3 concurrently
        fetch("a.com"),
        fetch("b.com"),
        fetch("c.com"),
    )

asyncio.run(main())

Coroutines vs Threading: Same Result, Different Mechanism

Both work well for I/O-bound tasks. But:

|                  | Threading   | Coroutine         |
|------------------|-------------|-------------------|
| Thread count     | many        | exactly 1         |
| Who schedules    | the OS      | your code (await) |
| Overhead         | moderate    | very low          |
| Creating 10,000  | problematic | no problem        |

Creating 10,000 threads → crash or severe slowdown. Creating 10,000 coroutines → perfectly fine. That's why modern web frameworks (FastAPI, aiohttp) use asyncio instead of threading.
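The 10,000-coroutine claim is easy to verify with a sketch (using `asyncio.sleep` as a stand-in for real network I/O):

```python
import asyncio
import time

async def tick(i):
    await asyncio.sleep(0.1)   # simulated I/O wait
    return i

async def main():
    start = time.perf_counter()
    # 10,000 coroutines, all "waiting" concurrently on one thread
    results = await asyncio.gather(*(tick(i) for i in range(10_000)))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} coroutines finished in {elapsed:.2f}s")
    return results

results = asyncio.run(main())
```

Despite 10,000 × 0.1s of nominal waiting, the whole thing finishes in well under a second, because the waits overlap. Try the same with 10,000 OS threads and you'll hit resource limits long before that.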

The Chef Analogy

A single chef serving multiple tables: put table 1's order on the stove (await), go pour water at table 2, carry food to table 3 — no extra staff needed, just never stand and wait.


Summary: Which One Do You Choose?

|                   | Threading          | Coroutine            | Multiprocessing    |
|-------------------|--------------------|----------------------|--------------------|
| Best for          | simple I/O         | high-concurrency I/O | heavy CPU work     |
| Threads/processes | many threads       | 1 thread             | many processes     |
| Overhead          | moderate           | very low             | high               |
| Shared memory     | yes                | yes                  | no                 |
| Module            | threading          | asyncio              | multiprocessing    |
| Modern API        | concurrent.futures | async/await          | concurrent.futures |

Quick Decision Tree

What does your task spend most of its time doing?

├── Waiting on network / disk / DB (I/O-bound)
│   ├── Thousands of concurrent connections? → asyncio (coroutine)
│   └── A few dozen simple tasks?           → threading
│
└── Continuous computation (CPU-bound)      → multiprocessing

If you're unsure, concurrent.futures is the easiest entry point — it provides a unified API for both ThreadPoolExecutor and ProcessPoolExecutor:

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# I/O-bound
with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(fetch_url, urls))

# CPU-bound — the guard is required on platforms that
# spawn new processes (Windows, macOS by default)
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(heavy_compute, data))

Understanding these three tools covers the vast majority of Python concurrency. Everything else — advanced asyncio, GIL internals, uvloop — is just detail on top of this foundation.
