DEV Community

BAOFUFAN

3-Hour Debugging: How `time.sleep` in Async Functions Killed Our asyncio Concurrency

Here’s the situation: we have a data collection service that fetches data from a dozen upstream APIs. The synchronous version took a painful 30 minutes per run. I thought, “This is exactly the kind of problem asyncio was built for!” I spent an afternoon replacing requests with aiohttp, converting every function to async def with await, and ran the code — same 30 minutes. Not a single second faster. I was floored.

Eventually, I tracked the culprit to a stray time.sleep(0.5) buried deep inside a nested function. Every time that coroutine ran, the half-second sleep froze the entire event loop, turning our glorious “async concurrency” back into plain old serial execution.

The takeaway? Some of asyncio’s most counterintuitive landmines are impossible to fully appreciate until you step on one yourself. Here’s the full post-mortem: the debugging journey, the root cause, and how to avoid this trap for good.

Why a Single sleep Can Destroy Concurrency

Let’s recap how asyncio works: the event loop is essentially a single-threaded scheduler that manages a queue of tasks. Coroutines defined with async def yield control back to the event loop whenever they await, allowing the loop to switch to other ready coroutines and keep making progress.

But yielding must be explicit. await asyncio.sleep(n) registers a timer with the event loop and immediately hands back control — other tasks get their turn. In contrast, time.sleep(n) is a synchronous blocking call. It puts the entire thread to sleep, the event loop gets zero control, and every coroutine you wrote simply waits in line, no matter how many tasks you’ve created.

In plain terms:

  • await asyncio.sleep(): “I’ll set a timer with the event loop and kindly step aside so others can work.”
  • time.sleep(): “I’m going to take a nap, and no one — not even the event loop — can do anything until I wake up.”
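A minimal, self-contained demo makes the difference visible (the coroutine names are invented for illustration): a background ticker can only run while the worker is awaiting, never while the worker holds the thread with time.sleep:

```python
import asyncio
import time

async def ticker(results: list) -> None:
    # Appends a tick each time the event loop gives it control
    for _ in range(3):
        await asyncio.sleep(0.1)
        results.append("tick")

async def blocking_work(results: list) -> None:
    time.sleep(0.5)           # holds the thread: no ticks can run meanwhile
    results.append("blocking done")

async def yielding_work(results: list) -> None:
    await asyncio.sleep(0.5)  # yields: ticks interleave during the wait
    results.append("yielding done")

async def demo(work) -> list:
    results: list = []
    await asyncio.gather(ticker(results), work(results))
    return results

blocked = asyncio.run(demo(blocking_work))
yielded = asyncio.run(demo(yielding_work))
print(blocked)  # ['blocking done', 'tick', 'tick', 'tick']
print(yielded)  # ['tick', 'tick', 'tick', 'yielding done']
```

In the blocking run, "blocking done" lands first because the ticker never gets a turn until the thread wakes up; in the yielding run, the ticks happen during the 0.5-second wait.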

Bad Practice vs. The Right Way

Bad code (looks async but blocks the single thread):

import asyncio
import time

async def fetch_data(url: str):
    # Simulate pre-request processing
    print(f"Requesting {url}")
    time.sleep(0.5)           # ❌ synchronous block: the entire event loop stalls
    # The actual aiohttp request would go here
    print(f"Finished {url}")

async def main():
    urls = [f"https://api.example.com/data/{i}" for i in range(10)]
    # Looks like a concurrent launch
    tasks = [asyncio.create_task(fetch_data(url)) for url in urls]
    await asyncio.gather(*tasks)

asyncio.run(main())

When you run this, you’ll see the prints appear one by one in order. Ten tasks take over 5 seconds, and concurrency is a complete illusion.

Good code (using asyncio.sleep to yield control):

import asyncio
import aiohttp

async def fetch_data(session: aiohttp.ClientSession, url: str) -> dict:
    """
    A genuinely async request function: all IO is scheduled by the event loop
    """
    print(f"Requesting {url}")
    # Simulate a rate-limit wait with asyncio.sleep; other coroutines keep running
    await asyncio.sleep(0.5)

    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
        data = await resp.json()
        print(f"Finished {url}, status {resp.status}")
        return data

async def main():
    urls = [f"https://api.example.com/data/{i}" for i in range(10)]

    # Reuse TCP connections with a pool to cut per-request overhead
    connector = aiohttp.TCPConnector(limit=20)   # max concurrent connections
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [fetch_data(session, url) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)

    # Basic error handling
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            print(f"Request {urls[i]} failed: {result}")

asyncio.run(main())

With this fix, the asyncio.sleep(0.5) calls happen concurrently across all 10 tasks, compressing the total waiting time to roughly 0.5 seconds (plus the actual request time). Paired with aiohttp’s connection pooling, the efficiency gain is night and day.

Lessons Learned: Ignore These and You’ll Trip Again

  1. Check if your dependencies are truly async

    Simply sprinkling async/await into your code isn’t enough. Synchronous calls like time.sleep and sync-only libraries like requests or pymongo will block your event loop the moment they run inside a coroutine. Always switch to their async equivalents: aiohttp, httpx, motor (the async MongoDB driver), aiomysql, etc. If a library has no async version, offload it to a thread pool with await asyncio.to_thread(sync_func, *args) (Python 3.9+). It’s not perfect, but at least it won’t block the event loop.
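As a sketch of that thread-pool escape hatch (legacy_fetch is a made-up stand-in for a sync-only library call), asyncio.to_thread runs the blocking function off the event loop so other coroutines keep making progress:

```python
import asyncio
import time

def legacy_fetch(n: int) -> int:
    # Stand-in for a sync-only library call we can't replace
    time.sleep(0.2)
    return n * n

async def main() -> list:
    # Each blocking call runs in the default thread pool (Python 3.9+)
    return await asyncio.gather(
        *(asyncio.to_thread(legacy_fetch, i) for i in range(5))
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # [0, 1, 4, 9, 16]
print(elapsed < 1.0)  # True: ~0.2s of wall time, not 5 × 0.2s
```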

  2. The asyncio.gather exception trap

    By default, gather propagates the first exception raised by any task; the other tasks keep running in the background, but you never see their results. If you want every task to finish and to handle all outcomes together, set return_exceptions=True and then inspect each returned value to see whether it’s an Exception instance.
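A small sketch of both behaviors (the coroutine names ok and boom are invented):

```python
import asyncio

async def ok(n: int) -> int:
    await asyncio.sleep(0.01)
    return n

async def boom():
    await asyncio.sleep(0.01)
    raise ValueError("upstream failed")

async def run_default():
    # Default: the first exception propagates out of gather
    return await asyncio.gather(ok(1), boom(), ok(3))

async def run_collected():
    # return_exceptions=True: exceptions come back as values to inspect
    return await asyncio.gather(ok(1), boom(), ok(3), return_exceptions=True)

try:
    asyncio.run(run_default())
except ValueError as e:
    caught = e
    print(f"gather raised: {e}")

results = asyncio.run(run_collected())
good = [r for r in results if not isinstance(r, Exception)]
print(good)  # [1, 3]
```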

  3. Don’t abandon tasks created with create_task

    The event loop keeps only a weak reference to tasks. If you create one with asyncio.create_task but never await it or hold a reference, it can be garbage-collected before it finishes, and any exception it raises may never surface in a useful place. Keep a strong reference to every Task, and either collect it with gather or check its exception explicitly (e.g. in a done-callback).
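One common pattern (the names background_tasks and flaky_job here are illustrative) keeps a strong reference in a set and surfaces failures in a done-callback:

```python
import asyncio

background_tasks: set = set()
errors: list = []

def on_done(task: asyncio.Task) -> None:
    background_tasks.discard(task)       # drop the reference once finished
    if not task.cancelled() and task.exception() is not None:
        errors.append(task.exception())  # surface it instead of losing it

async def flaky_job():
    await asyncio.sleep(0.01)
    raise RuntimeError("oops")

async def main():
    task = asyncio.create_task(flaky_job())
    background_tasks.add(task)           # strong ref: the loop only keeps a weak one
    task.add_done_callback(on_done)
    await asyncio.sleep(0.1)             # give the task time to finish

asyncio.run(main())
print(errors)  # [RuntimeError('oops')]
```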

  4. Don’t spawn an unbounded number of coroutines

    Kicking off hundreds of concurrent connections when scraping hundreds of URLs can easily trip the target API’s rate limits. Always cap concurrency — for example, with an asyncio.Semaphore or by tuning your aiohttp connector pool — to avoid getting blocked or throttled.
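A sketch of the Semaphore approach (the 0.05-second sleep stands in for a real request; the peak counter just proves the cap holds):

```python
import asyncio

active = 0   # "requests" currently running
peak = 0     # highest concurrency observed

async def fetch(sem: asyncio.Semaphore, i: int) -> int:
    global active, peak
    async with sem:                 # waits here once all slots are taken
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.05)   # simulated request
        active -= 1
    return i

async def main() -> list:
    sem = asyncio.Semaphore(3)      # at most 3 requests in flight
    return await asyncio.gather(*(fetch(sem, i) for i in range(10)))

results = asyncio.run(main())
print(peak)  # 3: concurrency never exceeded the cap
```

Creating the Semaphore inside the coroutine (rather than at module level) avoids binding it to the wrong event loop on older Python versions.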
