DEV Community

BAOFUFAN

3 Asyncio Pitfalls That Took Me 3 Hours to Debug and Almost Crashed Production

Here’s the story: last week my lead asked me to optimize a data aggregation service that calls 20 downstream APIs. The serial version took around 18 seconds — users were ready to throw their keyboards. Obvious IO-bound job, right? I thought I’d slap on asyncio, ship it in half a day, and look like a hero. Instead, I spent three hours falling into every rabbit hole asyncio had to offer, and nearly took down production. This post walks through the three biggest pitfalls I hit and how to write async code that actually works in the real world.

Get Your Concepts Straight First

At its core, asyncio is a single-threaded event loop — a master scheduler that lines up coroutines. When one coroutine is waiting on IO, the loop politely tells it to step aside and runs whichever coroutine is ready instead. You only need two keywords: async def to define a coroutine function, and await to yield control, telling the event loop “I’ll be waiting here, go do something else.”

Most tutorials show you this perfect‑world example:

import asyncio

async def fetch(url):
    await asyncio.sleep(1)  # simulate network IO
    return f"data from {url}"

async def main():
    tasks = [fetch(f"api/{i}") for i in range(5)]
    results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())

Clean, elegant, 5 requests in 1 second. But the moment you drop this into a real project, things get messy.

Pitfall 1: await Inside a Sync Function — And Boom, Errors

I naively added await fetch() right inside an existing Flask route function. Immediate SyntaxError: 'await' outside async function. Alright, I’ll just change the route to async def. Request comes in — RuntimeError: There is no current event loop in thread 'Thread-1'.

Here’s why: Flask handles requests on a thread pool, and worker threads don’t get an event loop of their own (asyncio only creates one automatically in the main thread). Declaring the route async def doesn’t change that, and you can’t call asyncio.run() from a thread where a loop is already running. My view ended up invoking asyncio.run(main()) on code paths that already had a running loop, triggering a cascade of “event loop already running” errors.

What you should do: If you can, switch to an async‑native framework like Quart or FastAPI. If you’re stuck with Flask, create a global event loop at startup and schedule work with loop.run_until_complete(). Or, even simpler: spin up a background asyncio thread and communicate with the web thread via a queue.
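The background-thread option can be sketched like this (a minimal sketch; names such as `_run_loop`, `fetch_all`, and `sync_view` are mine, not from the real service): one long-lived event loop runs in a daemon thread, and sync Flask views submit coroutines to it with asyncio.run_coroutine_threadsafe, blocking on the returned future for the result:

```python
import asyncio
import threading

# One long-lived loop for the whole process, driven by a daemon thread.
_loop = asyncio.new_event_loop()

def _run_loop(loop: asyncio.AbstractEventLoop) -> None:
    asyncio.set_event_loop(loop)
    loop.run_forever()  # blocks this thread, driving all submitted coroutines

threading.Thread(target=_run_loop, args=(_loop,), daemon=True).start()

async def fetch_all(urls):
    await asyncio.sleep(0.01)  # stand-in for real async IO
    return [f"data from {u}" for u in urls]

def sync_view():
    # Called from a sync Flask worker thread: hand the coroutine to the
    # background loop and block on a concurrent.futures.Future.
    future = asyncio.run_coroutine_threadsafe(fetch_all(["api/1", "api/2"]), _loop)
    return future.result(timeout=5)
```

The key design point: the loop is created once at startup, so per-request code never has to create or find a loop, which is exactly what went wrong inside Flask’s worker threads.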

Pitfall 2: Blocking Calls Inside a Coroutine — Performance Tanks

Feeling clever, I wrote:

results = await asyncio.gather(*[call_api_blocking(i) for i in range(20)])

Total time? Still ~18 seconds. Logging showed each task finishing one after another, no concurrency at all. The culprit: call_api_blocking used requests.get(), a synchronous blocking call. await is useless here — while the first requests.get sits there, the whole thread is frozen and no other coroutine gets a chance to run.
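You can reproduce this freeze with no network at all. In this toy sketch (mine, with time.sleep standing in for requests.get), gather gains nothing because the work inside each coroutine is synchronous:

```python
import asyncio
import time

async def blocking_job(i: int) -> int:
    time.sleep(0.2)  # synchronous: freezes the entire event loop thread
    return i

async def main() -> float:
    start = time.perf_counter()
    # gather schedules 5 coroutines, but each one blocks the loop in turn
    await asyncio.gather(*[blocking_job(i) for i in range(5)])
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# elapsed comes out near 1.0s (5 x 0.2s, fully serialized), not 0.2s
```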

Asyncio only plays nice with its own async IO primitives. When you have a blocking call, you must ship it to a thread pool with loop.run_in_executor():

import asyncio
import requests

async def call_api_async(url):
    loop = asyncio.get_running_loop()
    # run the blocking call in the default thread pool executor
    return await loop.run_in_executor(None, requests.get, url)

Now the blocking happens in a separate thread and the event loop can immediately switch to another coroutine. Later I replaced requests with aiohttp entirely, and performance really took off. The golden rule: async is all-or-nothing. Don’t mix in blocking calls that hijack your thread.

Pitfall 3: Orphaned Tasks — Memory Climbs, Then OOM

After performance looked good, I rolled it out. Two days later, the pod was OOMKilled. Memory kept growing slowly, and the GC wasn’t collecting objects. After digging, I found the culprit. To “flexibly control concurrency” I had written something like this:

tasks = []
for url in urls:
    task = asyncio.create_task(process(url))
    tasks.append(task)
for t in tasks:
    await t

Looks fine, right? But inside process(url) some branches returned early and a few exceptions went unhandled, leaving tasks stuck in PENDING or CANCELLED state while the tasks list still held references to them. Those tasks in turn held onto large response payloads, so the reference chain was never broken and the GC could never reclaim the memory: a classic leak.

The fix: Use asyncio.TaskGroup (Python 3.11+) to manage lifetimes automatically. If any task fails, all others are cancelled and resources are cleaned up:

async def main():
    async with asyncio.TaskGroup() as tg:
        for url in urls:
            tg.create_task(process(url))

If you’re on an older Python version, be diligent about cancelling pending tasks in a finally block and clearing references.
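A rough pre-3.11 equivalent of that cleanup (a sketch; run_all is my own name, and process here is a trivial stand-in): cancel whatever is still pending in a finally block, wait the cancellations out, and drop the references:

```python
import asyncio

async def process(url: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for real work
    return f"done {url}"

async def run_all(urls):
    tasks = [asyncio.create_task(process(u)) for u in urls]
    try:
        return await asyncio.gather(*tasks)
    finally:
        for t in tasks:
            if not t.done():
                t.cancel()  # stop any orphaned work
        # let cancellations actually finish so nothing lingers
        await asyncio.gather(*tasks, return_exceptions=True)
        tasks.clear()  # break the reference chain so the GC can reclaim data

results = asyncio.run(run_all(["a", "b", "c"]))
```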

The Production‑Ready Version

Here’s the core skeleton I ended up with — concurrency controlled via semaphore, a reused aiohttp session, isolated exceptions, and timeouts:

import asyncio
import aiohttp
from typing import List

class AsyncFetcher:
    def __init__(self, concurrency: int = 10, timeout: int = 10):
        self.sem = asyncio.Semaphore(concurrency)  # limit concurrency to avoid hammering downstream
        self.timeout = aiohttp.ClientTimeout(total=timeout)

    async def fetch_one(self, session: aiohttp.ClientSession, url: str) -> dict:
        async with self.sem:
            try:
                async with session.get(url, timeout=self.timeout) as resp:
                    data = await resp.json()
                    return {"url": url, "ok": True, "data": data}
            except (aiohttp.ClientError, asyncio.TimeoutError) as e:
                # isolate failures so one bad URL doesn't kill the batch
                return {"url": url, "ok": False, "error": repr(e)}

    async def fetch_all(self, urls: List[str]) -> List[dict]:
        async with aiohttp.ClientSession() as session:  # one session reused for all requests
            return await asyncio.gather(*[self.fetch_one(session, u) for u in urls])
