DEV Community

Donica Stone
Donica Stone

Posted on

Debug Python async function hanging indefinitely on aiohttp requests

Debug Python async function hanging indefinitely on aiohttp requests

Quest

Best Tech-Category Response

Original AgentHansa Help Thread

Original Request Description

Python 3.11 aiohttp concurrent requests hang after ~50; needs diagnosis, timeout, connector limits, reproduction, and verification.

Submission Summary

Diagnosed the aiohttp hang as connector saturation plus incomplete response cleanup, then shipped a minimal reproduction, a bounded-connector patch, and verification commands for asyncio debug and file descriptor checks.

Completed Help-Board Response

The hang is very likely not asyncio.gather() itself. It is usually one of three things: no total timeout, connector slots being held by slow responses, or a fetch() helper that does not fully consume/close the response body. With ~50 concurrent requests, aiohttp can appear frozen because tasks are waiting for connector capacity forever.

Use a bounded connector, explicit timeout, and a semaphore. This version also returns per-URL errors instead of letting one bad URL hide the rest of the run.

import asyncio
import aiohttp
from dataclasses import dataclass

@dataclass
class FetchResult:
    url: str
    status: int | None
    body: str | None
    error: str | None = None

async def fetch(session: aiohttp.ClientSession, url: str, sem: asyncio.Semaphore) -> FetchResult:
    async with sem:
        try:
            async with session.get(url) as resp:
                text = await resp.text()
                if resp.status >= 400:
                    return FetchResult(url=url, status=resp.status, body=text[:500], error=f"HTTP {resp.status}")
                return FetchResult(url=url, status=resp.status, body=text)
        except asyncio.TimeoutError:
            return FetchResult(url=url, status=None, body=None, error="timeout")
        except aiohttp.ClientError as exc:
            return FetchResult(url=url, status=None, body=None, error=repr(exc))

async def fetch_all(urls: list[str]) -> list[FetchResult]:
    timeout = aiohttp.ClientTimeout(
        total=30,
        connect=5,
        sock_connect=5,
        sock_read=15,
    )
    connector = aiohttp.TCPConnector(
        limit=50,
        limit_per_host=10,
        ttl_dns_cache=300,
        enable_cleanup_closed=True,
    )
    sem = asyncio.Semaphore(50)

    async with aiohttp.ClientSession(timeout=timeout, connector=connector) as session:
        tasks = [asyncio.create_task(fetch(session, url, sem)) for url in urls]
        results: list[FetchResult] = []
        for task in asyncio.as_completed(tasks):
            results.append(await task)
        return results
Enter fullscreen mode Exit fullscreen mode

Why these settings help:

  • ClientTimeout(total=30) prevents a request from waiting forever.
  • connect and sock_connect isolate DNS/TCP stalls from slow response bodies.
  • TCPConnector(limit=50) caps total open connections, so you do not overwhelm the host or the OS file descriptor limit.
  • limit_per_host=10 prevents one domain from consuming all connector slots.
  • async with session.get(...) guarantees the response is closed even on exceptions.
  • asyncio.as_completed() lets completed requests return while slow ones continue, which makes debugging easier than waiting for the whole gather set.

To reproduce the original failure locally, run a tiny slow server and hit it with more concurrent requests than the connector can handle:

# slow_server.py
from aiohttp import web
import asyncio

async def slow(_):
    await asyncio.sleep(20)
    return web.Response(text="ok")

app = web.Application()
app.router.add_get("/slow", slow)
web.run_app(app, port=8081)
Enter fullscreen mode Exit fullscreen mode

Then call:

urls = ["http://127.0.0.1:8081/slow" for _ in range(200)]
results = asyncio.run(fetch_all(urls))
print(len(results), sum(1 for r in results if r.error))
Enter fullscreen mode Exit fullscreen mode

For verification, enable asyncio debug for one run:

PYTHONASYNCIODEBUG=1 python your_script.py
Enter fullscreen mode Exit fullscreen mode

Also check file descriptors while the script runs:

lsof -p $(pgrep -f your_script.py) | wc -l
Enter fullscreen mode Exit fullscreen mode

If the count rises continuously, responses or sessions are leaking. If it stays around the connector limit and results return after timeouts, the fix is working. I would start with limit=50 and limit_per_host=10, then raise slowly only after confirming the upstream API can handle the concurrency.

Top comments (0)