Getting Started with HTTPX, Part 3: Building a Python REST Client (Asynchronous Version)

HTTPX is a modern HTTP client library for Python. Its interface is similar to the old standby Requests, but it supports asynchronous HTTP requests, using Python's asyncio library (or trio). In other words, while your program is waiting for an HTTP request to finish, other work does not need to be blocked.

In Part 1, we built a simple Wikipedia search tool using Python and HTTPX. Even though it used HTTPX, the tool was only synchronous. In other words, the HTTP requests were sent sequentially, and each request started only after the previous one had completed. A lot of waiting in line.

Now, let's do what HTTPX is good for: asynchronous HTTP requests.

`async` and `await`

Python's asyncio allows tasks to collaborate. When a task is busy waiting on input/output, it can give other tasks room to do their business.

To define a function that can cooperate this way (a coroutine function), precede its def with the async keyword. To call it from inside another coroutine, precede the call with the await keyword.
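As a minimal sketch of the two keywords, here is a toy coroutine. The names greet() and main() are hypothetical and do not appear in pypedia, and asyncio.sleep() stands in for real input/output. The asyncio.run() call at the end starts the event loop, which is covered later in this article.

```python
import asyncio


async def greet(name, delay=0.01):
    """A coroutine function: 'async def' marks it as awaitable."""
    # 'await' hands control back to the event loop while we wait.
    await asyncio.sleep(delay)
    return f"Hello, {name}"


async def main():
    # Inside another coroutine, 'await' runs greet() and gives us its result.
    return await greet("Wikipedia")


message = asyncio.run(main())
print(message)
```

Calling greet("Wikipedia") without await would not run the function body; it would only create a coroutine object.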

We can create another Python module (a file), src/pypedia/asynchronous.py, with the following code that uses async and await. It is nearly the same as the code from Part 1, with a few differences. Feel free to compare the two.

"""Proof-of-concept asynchronous Wikipedia search tool."""
import asyncio
import logging
import time

import httpx

EMAIL = "your_email@provider"  # or Github URL or other identifier
USER_AGENT = {"user-agent": f"pypedia/0.1.0 ({EMAIL})"}

logging.basicConfig(filename="asyncpedia.log", filemode="w", level=logging.INFO)
LOG = logging.getLogger("asyncio")


async def search(query, limit=100, client=None):
    """Search Wikipedia, returning the HTTP response for the query."""
    if client:
        close_client = False
    else:
        # No client supplied: create one, and remember to close it ourselves.
        client = httpx.AsyncClient(headers=USER_AGENT)
        close_client = True
    LOG.info(f"Start query '{query}': {time.strftime('%X')}")
    url = "https://en.wikipedia.org/w/rest.php/v1/search/page"
    params = {"q": query, "limit": limit}
    try:
        response = await client.get(url, params=params)
    finally:
        # Close a client we created, even if the request raises.
        if close_client:
            await client.aclose()
    LOG.info(f"End query '{query}': {time.strftime('%X')}")
    return response


async def list_articles(queries):
    """Execute several Wikipedia searches."""
    async with httpx.AsyncClient(headers=USER_AGENT) as client:
        tasks = [search(query, client=client) for query in queries]
        responses = await asyncio.gather(*tasks)
    results = (response.json()["pages"] for response in responses)
    return dict(zip(queries, results))


def run():
    queries = [
        "linksto:Python_(programming_language)",
        "incategory:Computer_programming",
        "incategory:Programming_languages",
        "incategory:Python_(programming_language)",
        "incategory:Python_web_frameworks",
        "incategory:Python_implementations",
        "incategory:Programming_languages_created_in_1991",
        "incategory:Computer_programming_stubs",
    ]
    results = asyncio.run(list_articles(queries))
    for query, articles in results.items():
        print(f"\n*** {query} ***")
        for article in articles:
            print(f"{article['title']}: {article['excerpt']}")

Note the use of httpx.AsyncClient rather than httpx.Client, in both list_articles() and in search().

In list_articles(), the client is used in a context manager. Because the client is asynchronous, the context manager uses async with, not just with.

In search(), if no client is passed in, one is instantiated by calling httpx.AsyncClient directly rather than through a context manager. In that case, the responsibility is on us to close the client with await client.aclose(). Bad news if we forget to do this: the client's connections can be left open.

Our two primary functions have been preceded by the async keyword to indicate that they are async-friendly. In other words, they are willing to share control of the event loop when twiddling their thumbs.

If we needed to call search() on its own, we could do so from inside another coroutine with await search().

However, in this case, we need to concurrently run several calls to search().

`asyncio.gather()`

The list_articles() function hands its calls to the awaitable search() function over to asyncio.gather(). This schedules each call as a task on the event loop and runs the tasks concurrently.

Conveniently, asyncio.gather() returns a list of each task's return values, in the exact order the functions were passed in.

Note: put await before asyncio.gather(), but do not put await before the functions passed to it. The awaiting of each call will be handled by asyncio.gather().
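To illustrate both points (the ordering of results and where await goes), here is a small sketch. The fetch() coroutine is a hypothetical stand-in for search(): it sleeps for a given delay instead of calling Wikipedia, so the calls finish in a different order than they were passed in.

```python
import asyncio


async def fetch(label, delay):
    """Hypothetical stand-in for search(): sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return label


async def main():
    # Pass the calls without 'await'; gather() does the awaiting for us.
    return await asyncio.gather(
        fetch("slow", 0.03),
        fetch("fast", 0.01),
        fetch("medium", 0.02),
    )


results = asyncio.run(main())
print(results)  # ['slow', 'fast', 'medium']: input order, not finish order
```

This is why list_articles() can safely zip the queries with the results: the two lists line up even though the responses arrive in whatever order Wikipedia answers.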

Event loop

I have already mentioned the event loop a couple times. I think of the event loop as the (there should be only one) task runner for asyncio applications. It handles the tasks.

Instantiating the event loop is done from the only non-awaitable function in our script. I named the function run(), coincidentally, and it calls the high level function asyncio.run().

Put another way, a synchronous function cannot await an asynchronous function. But it can run it with asyncio.run().

This creates a new event loop that then handles the various awaitable tasks, and returns the result of the called awaitable function.
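The hand-off from synchronous to asynchronous code can be sketched in a few lines. The names add() and run_sync() are hypothetical and used only for illustration; the point is that calling a coroutine function from ordinary code only creates a coroutine object, and asyncio.run() is what actually executes it and returns its result.

```python
import asyncio


async def add(a, b):
    """Toy coroutine; sleep(0) just yields to the event loop once."""
    await asyncio.sleep(0)
    return a + b


def run_sync():
    """An ordinary function: it cannot 'await', but it can start an event loop."""
    coro = add(2, 3)  # nothing has run yet; this is only a coroutine object
    is_coro = asyncio.iscoroutine(coro)
    total = asyncio.run(coro)  # creates a loop, runs the coroutine, closes the loop
    return is_coro, total


is_coro, total = run_sync()
print(is_coro, total)
```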

Enable the command runner

Our run() function executes whatever we want to have executed when called as a script. In this case, it creates a list of search terms, then sends the list to list_articles(), then parses and prints the result.

With Poetry, the entry point for a script is defined in pyproject.toml. So we add this to that file. Assuming you already had the synchronous syncpedia defined, that section should now look like this:

[tool.poetry.scripts]
asyncpedia = "pypedia.asynchronous:run"
syncpedia = "pypedia.synchronous:run"

So, the script asyncpedia will call the run function of the asynchronous submodule of the package pypedia. And, as already defined, the script syncpedia will call the run function of the synchronous submodule of the same package.

Try it out:

poetry run asyncpedia

Assuming all works well, titles and excerpts of many Wikipedia articles should scroll by.

Performance benefits of async

Unlike the script from Part 1, the calls to the Wikipedia API now happen asynchronously, sharing the event loop concurrently. One request, while waiting for Wikipedia to respond, can share control of the event loop with the others. This can be seen in the log file.

$ cat asyncpedia.log
INFO:asyncio:Start query 'linksto:Python_(programming_language)': 06:03:39
INFO:asyncio:Start query 'incategory:Computer_programming': 06:03:39
INFO:asyncio:Start query 'incategory:Programming_languages': 06:03:39
INFO:asyncio:Start query 'incategory:Python_(programming_language)': 06:03:39
INFO:asyncio:Start query 'incategory:Python_web_frameworks': 06:03:39
INFO:asyncio:Start query 'incategory:Python_implementations': 06:03:39
INFO:asyncio:Start query 'incategory:Programming_languages_created_in_1991': 06:03:39
INFO:asyncio:Start query 'incategory:Computer_programming_stubs': 06:03:39
INFO:asyncio:End query 'incategory:Python_implementations': 06:03:39
INFO:asyncio:End query 'incategory:Python_(programming_language)': 06:03:39
INFO:asyncio:End query 'incategory:Programming_languages_created_in_1991': 06:03:39
INFO:asyncio:End query 'incategory:Python_web_frameworks': 06:03:39
INFO:asyncio:End query 'incategory:Computer_programming_stubs': 06:03:39
INFO:asyncio:End query 'incategory:Computer_programming': 06:03:40
INFO:asyncio:End query 'linksto:Python_(programming_language)': 06:03:40
INFO:asyncio:End query 'incategory:Programming_languages': 06:03:40
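
The log above shows all eight requests starting within the same second. To get a rough feel for what that overlap buys, here is a sketch that needs no network access. It assumes each request takes about 0.1 seconds and simulates it with asyncio.sleep(); fake_request(), sequential(), concurrent(), and timed() are hypothetical helpers, not part of pypedia, and real timings against Wikipedia will differ.

```python
import asyncio
import time


async def fake_request(delay=0.1):
    """Simulated HTTP request: waits without blocking the event loop."""
    await asyncio.sleep(delay)


async def sequential(n):
    # One at a time, as in the synchronous tool from Part 1.
    for _ in range(n):
        await fake_request()


async def concurrent(n):
    # All at once: the waits overlap on the event loop.
    await asyncio.gather(*(fake_request() for _ in range(n)))


def timed(coro):
    """Run a coroutine to completion and return the elapsed seconds."""
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start


sequential_seconds = timed(sequential(5))
concurrent_seconds = timed(concurrent(5))
print(f"sequential: {sequential_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

With five simulated requests, the sequential version should take roughly five times the single-request delay, while the concurrent version should take roughly one delay in total.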