Last month, the data platform I maintain suddenly got a new requirement: run health checks against 100+ downstream services. Each endpoint averages 200ms, and the whole check had to finish within 5 seconds. Without thinking twice, I fired up 100 threads. The thread-switching overhead immediately maxed out the CPU, and the response time shot past 8 seconds. My ops teammate dropped three question marks in the group chat.
That moment forced me to seriously re‑examine asyncio. I used to think async programming had a steep learning curve and was a magnet for bugs, but after a thorough benchmark I can only say: for IO‑bound workloads, asyncio and multithreading aren’t even in the same league. Here’s the full breakdown of running the same task with three different strategies—synchronous, multithreaded, and asyncio—head‑to‑head.
Test scenario: 100 HTTP requests, each with 200 ms latency
We spun up a mock downstream service with FastAPI. The /health endpoint deliberately sleeps for 200 ms and then returns {"status": "ok"}. The client fires 100 concurrent requests using three different approaches, and we measure total elapsed time and resource usage.
Approach 1: Synchronous sequential — predictably slow
# sync_demo.py — synchronous requests, one after another
import time
import requests

URLS = ["http://localhost:8000/health" for _ in range(100)]

def check_sync():
    results = []
    for url in URLS:
        resp = requests.get(url, timeout=5)
        results.append(resp.json())
    return results

if __name__ == "__main__":
    start = time.perf_counter()
    check_sync()
    elapsed = time.perf_counter() - start
    print(f"Sync elapsed: {elapsed:.2f}s")  # around 20.3s
Unsurprisingly, 100 × 200 ms = 20 seconds. The thread spends all its time waiting for network I/O while the CPU sits nearly idle. That’s with only 100 requests; at 1 000 the system would be effectively frozen for three minutes with zero concurrency.
Approach 2: Multithreading — looks concurrent, full of traps
# thread_demo.py — 100 threads in parallel
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

URLS = ["http://localhost:8000/health" for _ in range(100)]

def fetch(url):
    return requests.get(url, timeout=5).json()

def check_thread():
    results = []
    with ThreadPoolExecutor(max_workers=100) as executor:
        futures = {executor.submit(fetch, url): url for url in URLS}
        for future in as_completed(futures):
            results.append(future.result())
    return results

if __name__ == "__main__":
    start = time.perf_counter()
    check_thread()
    elapsed = time.perf_counter() - start
    print(f"Threaded elapsed: {elapsed:.2f}s")  # 8.5s on the first run, then fluctuating between 3 and 6s
The first run took 8.5 seconds, with CPU usage instantly spiking to 90%. Python’s GIL is released during I/O, but creating 100 threads, the constant context switching, and lock contention add enormous overhead. When I dialled max_workers down to 30, the time dropped to 2.1 seconds and the CPU settled down—but that turns into “tuning by gut feeling,” and as soon as the thread count rises, the system becomes unstable again.
There’s an even sneakier trap: calling requests.get() at module level opens a fresh connection for every single request, so there’s no connection pooling at all, and while a shared requests.Session would reuse connections, it isn’t documented as thread‑safe. Under load this occasionally surfaces as a ConnectionResetError that’s a nightmare to debug.
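One workaround I've seen (a sketch, not the code from my incident) is to give each worker thread its own Session via threading.local, so you get connection pooling without sharing a Session across threads:

```python
import threading
import requests
from concurrent.futures import ThreadPoolExecutor

# One Session per worker thread: Session reuse gives us connection
# pooling, and thread-local storage sidesteps Session's unclear
# thread-safety guarantees.
_local = threading.local()

def get_session():
    if not hasattr(_local, "session"):
        _local.session = requests.Session()
    return _local.session

def fetch(url):
    # each thread transparently reuses its own pooled connections
    return get_session().get(url, timeout=5).json()

def check_thread(urls, workers=30):
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch, urls))
```

This removes the per-request connection churn, but it doesn't fix the fundamental cost of 100 OS threads; it just makes the threaded version less flaky.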
Approach 3: asyncio + aiohttp — so smooth it feels like cheating
# async_demo.py — concurrent requests with asyncio and aiohttp
import asyncio
import time
import aiohttp

URLS = ["http://localhost:8000/health" for _ in range(100)]

async def fetch(session, url):
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
            return await resp.json()
    except Exception as e:
        return {"error": str(e)}

async def check_async():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in URLS]
        results = await asyncio.gather(*tasks)
        return results

if __name__ == "__main__":
    start = time.perf_counter()
    asyncio.run(check_async())
    elapsed = time.perf_counter() - start
    print(f"asyncio elapsed: {elapsed:.2f}s")  # steady at 0.45~0.60s
All 100 tasks are scheduled inside a single event loop and dispatched asynchronously. The total elapsed time is determined only by the slowest I/O call, consistently coming in under 0.6 seconds. CPU usage never topped 15%, and memory usage stayed almost perfectly flat. When my boss saw the results on the monitoring dashboard he asked if I had secretly added more servers—turns out I just rewrote the code with async/await.
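One caveat worth adding: gather launches all 100 requests at once, which is fine against our mock but can overwhelm a fragile downstream service. A common pattern is to cap in‑flight requests with asyncio.Semaphore; here is a self-contained sketch where fake_fetch stands in for the real aiohttp call:

```python
import asyncio
import time

async def fake_fetch(url):
    # stand-in for an aiohttp request: 200 ms of simulated I/O
    await asyncio.sleep(0.2)
    return {"status": "ok", "url": url}

async def bounded_fetch(sem, url):
    # the semaphore caps how many requests are in flight at once
    async with sem:
        return await fake_fetch(url)

async def check_bounded(urls, limit=20):
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(bounded_fetch(sem, u) for u in urls))

if __name__ == "__main__":
    urls = ["http://localhost:8000/health"] * 100
    start = time.perf_counter()
    results = asyncio.run(check_bounded(urls))
    print(f"{len(results)} checks in {time.perf_counter() - start:.2f}s")
```

With limit=20, the 100 tasks run in five waves of 20, so total time is roughly 5 × 200 ms — a deliberate trade of raw speed for politeness to the downstream service.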
Pitfalls & takeaways: I fell into all three of these traps
- Mixing synchronous code into a coroutine instantly kills performance. At first I put requests.get() directly inside an async def and...
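For this first trap, the standard escape hatch (on Python 3.9+) is asyncio.to_thread, which pushes a blocking call onto a worker thread so the event loop stays free. A sketch, with time.sleep standing in for a blocking requests.get():

```python
import asyncio
import time

def blocking_check(url):
    # stands in for requests.get(url): blocks the calling thread for 200 ms
    time.sleep(0.2)
    return {"status": "ok", "url": url}

async def main():
    # Calling blocking_check(url) directly inside a coroutine would freeze
    # the whole event loop for 200 ms per call. asyncio.to_thread runs it
    # in the default thread pool instead, so the calls overlap.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_check, "http://localhost:8000/health")
          for _ in range(10))
    )

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    print(f"{len(results)} blocking calls in {time.perf_counter() - start:.2f}s")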