Ivan Korostenskij
What Modern Python Uses for Async API Calls: HTTPX & TaskGroups

Multiple API calls in Python are usually written in a way that makes them slow

You’ve probably written code like this before:

import requests
from requests import Response

urls: list[str] = ["https://api.example.com/user/1", 
        "https://api.example.com/user/2",
        "https://api.example.com/user/3"]

for url in urls:
    response: Response = requests.get(url)
    print(response.json())

You run it. Request 1 goes out. Wait. Response comes back. Request 2 goes out. Wait…

Since these requests process one at a time, it's like sending texts to a friend, then staring at your phone - unblinking, refusing to eat, breathe, or move until they reply 'lol'.

Let me show you a better way - one where 100 requests take only about as long as your single slowest request.

Figure 1: Side-by-side comparison: Sequential requests (left) vs concurrent async requests (right). Sequential processes 10 API calls one-at-a-time taking 20 seconds total. Concurrent fires all 10 simultaneously, completing in 3 seconds - as fast as the slowest request.

Use asynchronous programming for API calls

Most of that time is spent waiting on the network. The solution is making the calls asynchronous - fire them all off at once and do other work until the responses come back.
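To make the gain concrete, here's a stdlib-only simulation (no network, no extra libraries) where each fake API call just sleeps for 0.2 seconds to mimic latency:

```python
import asyncio
import time

async def fake_api_call() -> str:
    # Stand-in for a network request: ~0.2 s of pure waiting.
    await asyncio.sleep(0.2)
    return "response"

async def run_sequential(n: int) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_api_call()  # wait for each call before starting the next
    return time.perf_counter() - start

async def run_concurrent(n: int) -> float:
    start = time.perf_counter()
    # Fire all n calls at once; total time ~ the single slowest call.
    await asyncio.gather(*(fake_api_call() for _ in range(n)))
    return time.perf_counter() - start

async def main() -> None:
    print(f"sequential: {await run_sequential(5):.2f}s")  # ~1.0 s
    print(f"concurrent: {await run_concurrent(5):.2f}s")  # ~0.2 s

asyncio.run(main())
```

The exact timings will wobble, but the shape is always the same: sequential time grows with the number of calls, concurrent time tracks only the slowest one.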

API calls in Python are traditionally and ubiquitously done with the requests library.

import requests

def get_dad_joke() -> str:
    response = requests.get("https://icanhazdadjoke.com/")
    return response.text

for _ in range(3):
    joke: str = get_dad_joke()
    print(joke)

# Output:
# Why do pirates not know the alphabet? They always get stuck at "C".
# You can't run through a camp site. You can only ran, because it's past tents.
# Now you can talk about Botox and nobody raises an eyebrow.

Unfortunately, the requests library doesn’t support asynchronous calls at all.

The modern solution is a library called httpx.

So pull up your IDE and let’s walk through how to start from scratch or migrate an existing project, and tangibly see those performance gains.

From requests to httpx

For existing projects on requests that want to migrate, the good news is that httpx is a near drop-in replacement.

Let’s look at an example.

import requests
import httpx

def get_dad_joke_requests() -> str:
    response = requests.get("https://icanhazdadjoke.com/")
    return response.text

async def get_dad_joke_httpx() -> str:
    async with httpx.AsyncClient() as client:
        response = await client.get("https://icanhazdadjoke.com/")
        return response.text

💡 Fun fact

The default timeout for requests is infinity - your request will wait forever. In httpx, it's 5 seconds.

httpx also supports HTTP/2 (multiple requests over one connection) and has strong type hints that catch bugs before runtime.

These improvements come from decades of production lessons learned.

The core difference is the introduction of AsyncClient(). This is a session object: a dedicated state manager for the requests being made between your computer and the API server.

Then the async with block acts as an automatic door - it opens when you approach, closes when you leave, and you never have to think about it. In our case, it’s automatically handling the client’s setup and teardown (the equivalent of calling await client.aclose() yourself when you’re done).
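You can see what the automatic door does with a toy session object - a hypothetical Session class for illustration, not httpx's real implementation - that defines the two methods async with calls for us:

```python
import asyncio

class Session:
    """Toy stand-in for httpx.AsyncClient, illustrating `async with`."""

    def __init__(self) -> None:
        self.open = False

    async def __aenter__(self) -> "Session":
        self.open = True    # the "door opens": resources get set up
        return self

    async def __aexit__(self, *exc_info) -> None:
        self.open = False   # the "door closes": resources get released

async def main() -> None:
    async with Session() as session:
        print(f"inside the block, open: {session.open}")   # True
    print(f"after the block, open: {session.open}")        # False

asyncio.run(main())
```

The close happens even if the block raises an exception, which is exactly why you never have to think about it.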

One step up from this is passing the httpx.AsyncClient as a dependency:

async def get_dad_joke_httpx(client: httpx.AsyncClient) -> str:
    response = await client.get("https://icanhazdadjoke.com/")
    return response.text

This way, the calling function instantiates the client once, instead of us making it over and over again inside the API call function itself.

The resources a function uses should be handled at the highest possible level. In this case, that’s the client that we make requests through.

Use asyncio.TaskGroup, not asyncio.gather, to execute multiple API calls at once

We’re getting away from for loops. Think about a race. The starting official doesn’t go up to every individual racer and tell them to start - he uses a starting pistol, telling everyone at once.

This is the traditional way of batch-executing multiple async calls:

import asyncio

from httpx import Response
import httpx

async def get_dad_joke_httpx(client: httpx.AsyncClient) -> str:
    response: Response = await client.get("https://icanhazdadjoke.com/")
    response.raise_for_status()
    return response.text

async def main() -> None:
    dad_jokes_amount: int = 10

    async with httpx.AsyncClient() as client:
        results: list[str | Exception] = await asyncio.gather(
            *[get_dad_joke_httpx(client=client) for _ in range(dad_jokes_amount)],
            return_exceptions=True
        )

    # `results` is a mixed list of dad jokes and exceptions,
    #   so we have to manually check what each element is
    for result in results:
        if isinstance(result, Exception):
            print(f"Got an error: {result}")
        else:
            print(f"Successfully got dad joke: {result}")

asyncio.run(main())

The isinstance(result, Exception) is a major code smell - we shouldn’t have to check/guess what our return types are. This is exacerbated if we have custom exceptions:

if isinstance(result, RateLimitError):
    print("we got rate limited, slowing down...")
    ...
elif isinstance(result, APIDownError):
    print("API is down, retrying automatically in 10 minutes")
    ...
elif isinstance(result, Exception):
    print(f"Got an error: {result}")


Since Python 3.11, we can combine two features that handle this cleanly for us.

import asyncio

from httpx import Response
import httpx

async def get_dad_joke_httpx(client: httpx.AsyncClient) -> str:

    response: Response = await client.get("https://icanhazdadjoke.com/")

    response.raise_for_status()
    return response.text

async def main() -> None:
    dad_jokes_amount: int = 10

    async with httpx.AsyncClient() as client:
        try:
            async with asyncio.TaskGroup() as tg:
                tasks: list[asyncio.Task[str]] = [
                    tg.create_task(get_dad_joke_httpx(client=client))
                    for _ in range(dad_jokes_amount)
                ]

            # If we get here, we know NO errors have occurred
            for task in tasks:
                print(f"Successfully got dad joke: {task.result()}")

        # We hit this if ANY errors occurred - the whole batch fails as a group of errors
        except* Exception as eg:
            for error in eg.exceptions:
                print(f"Got an error: {error}")

asyncio.run(main())

Those two features are TaskGroups and exception groups.

Here, TaskGroup lets us treat our batched API calls as one running process. If anything fails, the entire batch fails - we catch an “Exception group” of those errors and can loop through them akin to successful results.

In the code above, the exception group is caught using a * (asterisk) next to the except keyword - except* matches against a whole group of errors rather than a single exception.

Why do we fail the group if just a single request failed?

It seems counter-intuitive, but we do it to “fail fast”.

Failing fast ensures your application state is binary: it either worked perfectly, or it didn't happen at all.

Here are two common scenarios:

  1. If request 1 fails with a 401 Unauthorized, the next 99 will too. This batch SHOULD fail so as not to hammer the server. We use this first failure as a canary in a coal mine.
  2. Most often, we’re working with operations that need to be atomic. In this case, partial failures lead to corrupted data that requires manual intervention.
    1. E.g., New employee onboarding:
      1. Create their email (fails)
      2. Create their Teams account (succeeds)
      3. Add them to payroll (succeeds)
    2. In the case above, now we have an employee that is getting paid but isn’t able to log in. It’s much better to fail the whole batch and retry with a clean slate than have to debug a half-onboarded user

Important nuance - we can’t let hundreds of API calls loose on the world at once

While we could now make 1000 API calls instantly, we’re going to run into the big blocker of batch operations on public-facing APIs: rate limiting.

This is the problem:

async def main():
    # This will make 1000 requests at once
    tasks = [get_joke(client) for _ in range(1000)]
    await asyncio.gather(*tasks)

Best case, you get rate limited after the first 100 and your requests fail with 429 Too Many Requests; worst case, you get banned because your batch looks like a DDoS attack.

These limits (e.g., “20 requests a second”, “100 max concurrent connections”) prevent server overload and malicious attacks exploiting resource limits.

The way we get around this is with a Semaphore: a counter that limits the number of tasks allowed to be active at a time.


import asyncio

from httpx import Response
import httpx

async def get_dad_joke_httpx(client: httpx.AsyncClient, semaphore: asyncio.Semaphore) -> str:

    async with semaphore:
        response: Response = await client.get("https://icanhazdadjoke.com/",
                                              headers={
                                                  "Accept": "text/plain"
                                              })

        response.raise_for_status()
        return response.text

async def main() -> None:
    dad_jokes_amount: int = 10
    sem = asyncio.Semaphore(5)

    async with httpx.AsyncClient() as client:
        try:
            async with asyncio.TaskGroup() as tg:
                tasks: list[asyncio.Task[str]] = [
                    tg.create_task(get_dad_joke_httpx(client=client,
                                                      semaphore=sem))
                    for _ in range(dad_jokes_amount)
                ]

            for task in tasks:
                print(f"Successfully got dad joke: {task.result()}")

        except* Exception as eg:
            for error in eg.exceptions:
                print(f"Got an error: {error}")

asyncio.run(main())

Now our code will be as fast as possible, just within the limits of the server we’re communicating with.

Conclusion

That's it. This is a common and extremely effective use of asynchronous programming. All that's left is to apply it. Even outside of Python, you're walking away with a seriously generalizable skill, and I highly encourage you to look for opportunities to create or refactor real code to cement it.

Common opportunities I see to refactor:

  1. Looping over API calls (usually batching calls)
  2. An API relying on retries or a time.sleep() to avoid rate limits - use a semaphore!
  3. Blocking, long, sync database queries - replace with async ones

Overall, start noticing blocking operations in your code - you'll often find that other things can be done while they're running.

One last thing:

Although I personally encourage asynchronous programming from the start of all new projects - you don't always need it. If you're making 3 API calls in a script you'll run once, requests is fine. Don't optimize for problems you don't have.

But when you do have that problem - watching requests crawl one by one, or noticing yourself sitting in front of a slow progress bar a bit too often - you know what to do.

Now go find a loop of API calls in your codebase. Convert it. Time it before and after, brag a little, maybe share my article :)...

Thanks for reading!


Questions about async patterns or other Python stuff? Drop a comment. Follow for more tutorials on writing better Python.
