BAOFUFAN

The asyncio Mistake That Cost Me 3 Hours

It happened last year when I was adding a “batch domain liveness check” feature to our internal operations platform. The requirement was simple: periodically poll 1000+ domains, check HTTP status codes, and flag any domain as down if it didn’t respond within 5 seconds. I thought to myself — this is clearly an I/O‑bound task. asyncio to the rescue, it should finish in minutes. So I threw in a bunch of async def, await, and gather operations, ran it with full confidence… and the result? 1000 domains took over four minutes — almost indistinguishable from synchronous, sequential requests. I stared at the screen, feeling personally trolled by Python.

Over the next three hours, my understanding of asyncio was dismantled and rebuilt. If you’ve ever “accidentally” blocked the event loop inside an async function, or lost exceptions inside gather without even realizing it, this war story should save you more than three hours.


1. Reproducing the Issue: Concurrent, but Not Really

Here’s the “concurrent” code I initially wrote (see if you can spot the problem):

import asyncio
import time
import requests

async def check_domain(url: str) -> dict:
    """检测单个域名的状态码和耗时"""
    start = time.monotonic()
    try:
        # NOTE: requests is a synchronous, blocking library
        resp = requests.get(url, timeout=5, allow_redirects=True)
        status = resp.status_code
    except Exception as e:
        status = str(e)
    elapsed = time.monotonic() - start
    return {"url": url, "status": status, "elapsed": elapsed}

async def main():
    urls = [f"https://httpbin.org/delay/1?n={i}" for i in range(50)]  # 模拟慢速接口
    t_start = time.monotonic()
    # hoping these all run concurrently
    results = await asyncio.gather(*[check_domain(url) for url in urls])
    t_end = time.monotonic()
    print(f"总耗时 {t_end - t_start:.2f} 秒,完成 {len(results)} 个检测")
    # print the first 3 results
    for r in results[:3]:
        print(r)

if __name__ == "__main__":
    asyncio.run(main())

You might spot the issue immediately: calling the synchronous blocking requests.get inside an async coroutine. But back then, I was so fixated on “I defined it with async def, so it’s a coroutine, and gather will make it concurrent” that I completely ignored how the event loop actually works. Fifty URLs, each with a 1‑second delay — total time over 50 seconds. A textbook case of sequential requests hiding behind async syntax.

The root cause is that asyncio uses a single-threaded event loop. async def alone does not magically add concurrency; it just tells the interpreter “this function might yield control”. The real yield happens on await — but only if the object being awaited is a truly asynchronous implementation (e.g., an aiohttp request). The socket operations inside requests.get are synchronous and blocking. While a coroutine waits on them, the entire thread is stuck and the event loop never gets a chance to switch to other tasks. You can use gather all you want — it still runs them one after the other.
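A quick aside before the real fix: if you are truly stuck with a synchronous library, asyncio.to_thread (Python 3.9+) can push the blocking call onto a worker thread so the event loop keeps running. I did not go this route, but a minimal sketch looks like this:

import asyncio
import requests

async def check_domain_via_thread(url: str) -> int:
    # asyncio.to_thread hands the blocking requests.get to a worker
    # thread, so the event loop stays free to switch between tasks
    resp = await asyncio.to_thread(requests.get, url, timeout=5)
    return resp.status_code

It trades thread overhead for compatibility, so it is a stopgap rather than a real solution for 1000+ domains.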

The fix is straightforward: use an async HTTP library like aiohttp:

import aiohttp
import asyncio
import time

async def check_domain_async(session: aiohttp.ClientSession, url: str) -> dict:
    """真正的异步检测"""
    start = time.monotonic()
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
            status = resp.status
    except Exception as e:
        status = str(e)
    elapsed = time.monotonic() - start
    return {"url": url, "status": status, "elapsed": elapsed}

async def main_async():
    urls = [f"https://httpbin.org/delay/1?n={i}" for i in range(50)]
    t_start = time.monotonic()
    async with aiohttp.ClientSession() as session:
        tasks = [check_domain_async(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    t_end = time.monotonic()
    print(f"总耗时 {t_end - t_start:.2f} 秒,完成 {len(results)} 个检测")

With aiohttp, all 50 requests finish in about 1.5 seconds (assuming the server can keep up). The speedup is dramatic. This basic, almost embarrassing pitfall cost me at least an hour, precisely because it’s so counter‑intuitive: “I defined it with async, so I thought it was asynchronous” — a mistake that beginners and even engineers with a few years of experience can make.
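One caveat before scaling this to the real 1000+ domain list: gather with no limit fires every request at once. A common guard, and this is a sketch of mine rather than code from the original feature, is an asyncio.Semaphore that caps how many requests are in flight:

import asyncio
import aiohttp

async def check_bounded(sem: asyncio.Semaphore, session: aiohttp.ClientSession, url: str) -> dict:
    # at most `limit` coroutines get past the semaphore at once;
    # the rest wait their turn without blocking the event loop
    async with sem:
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
                return {"url": url, "status": resp.status}
        except Exception as e:
            return {"url": url, "status": str(e)}

async def run_checks(urls: list[str], limit: int = 100) -> list[dict]:
    sem = asyncio.Semaphore(limit)  # `limit` is a tunable guess, not a measured value
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(check_bounded(sem, session, u) for u in urls))

For what it's worth, aiohttp's default TCPConnector already limits a session to 100 concurrent connections; the explicit semaphore just makes that cap visible and tunable.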


2. A Deeper Pitfall: gather Swallows Your Exceptions

You might think that’s the end of the story. But a more subtle trap was waiting. In a later iteration, I added health‑check logic: if a domain was unreachable, trigger an alert. Yet I noticed that some domains were clearly down, but the code remained completely silent. After another hour of debugging, I discovered asyncio.gather’s exception‑handling behavior was the culprit.

By default, if one of the coroutines passed to gather raises an exception, gather does not immediately cancel the other tasks. Instead, it raises that exception at the point where you await it. My code at the time looked roughly like this:

try:
    results = await asyncio.gather(*tasks)
except Exception:
    logger.error("批量检测出错")
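The standard escape hatch is gather's return_exceptions=True flag: failed tasks come back as exception objects inside the results list instead of blowing up the single await, so every domain can be inspected individually. A minimal sketch (handle_result is a hypothetical success handler, not from my original code):

results = await asyncio.gather(*tasks, return_exceptions=True)
for url, result in zip(urls, results):
    if isinstance(result, Exception):
        # this task raised; the failure is now visible per domain
        logger.error("check failed for %s: %s", url, result)
    else:
        handle_result(result)  # hypothetical success handler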
