Jonathan Bowman

Posted on Mar 7, 2021 • Edited on Feb 9 • Originally published at bowmanjd.com

HTTP Calls in Python Without Requests or Other External Dependencies

#python #webdev #programming #tutorial

In addition to great Python HTTP client tools such as Requests and HTTPX, the standard library itself supplies the necessary ingredients to make a working HTTP client for API calls. This tutorial shares how to construct and customize such a tool for your own scripts.

Consider installing a library

Before proceeding, I should note that in many cases, the approach in this article is not best practice. Instead, I highly recommend using a third-party Python library for features, security, and reliability.

Some suggested libraries:

urllib3 is the dependency for many other tools, including requests. By itself, urllib3 is quite usable. It may be all you need.
requests is ubiquitous and well documented.
HTTPX has an interface almost identical to requests, but with the added benefit of asyncio support. You may be interested in a series of articles I wrote on using HTTPX both synchronously and asynchronously.
pycurl is less popular as a Python library, but interfaces with the well-known libcurl.
aiohttp has an asyncio-based HTTP client that is well-documented and well-liked.

If, however, you find yourself needing a solution that does not require external dependencies other than what is already available in the Python standard library, then you may wish to read on.

Summary code

import json
import typing
import urllib.error
import urllib.parse
import urllib.request
from email.message import Message


class Response(typing.NamedTuple):
    body: str
    headers: Message
    status: int
    error_count: int = 0

    def json(self) -> typing.Any:
        """
        Decode body's JSON.

        Returns:
            Pythonic representation of the JSON object
        """
        try:
            output = json.loads(self.body)
        except json.JSONDecodeError:
            output = ""
        return output


def request(
    url: str,
    data: typing.Optional[dict] = None,
    params: typing.Optional[dict] = None,
    headers: typing.Optional[dict] = None,
    method: str = "GET",
    data_as_json: bool = True,
    error_count: int = 0,
) -> Response:
    if not url.casefold().startswith("http"):
        raise urllib.error.URLError("Incorrect and possibly insecure protocol in url")
    method = method.upper()
    request_data = None
    headers = headers or {}
    data = data or {}
    params = params or {}
    headers = {"Accept": "application/json", **headers}

    if method == "GET":
        params = {**params, **data}
        data = None

    if params:
        url += "?" + urllib.parse.urlencode(params, doseq=True, safe="/")

    if data:
        if data_as_json:
            request_data = json.dumps(data).encode()
            headers["Content-Type"] = "application/json; charset=UTF-8"
        else:
            request_data = urllib.parse.urlencode(data).encode()

    httprequest = urllib.request.Request(
        url, data=request_data, headers=headers, method=method
    )

    try:
        with urllib.request.urlopen(httprequest) as httpresponse:
            response = Response(
                headers=httpresponse.headers,
                status=httpresponse.status,
                body=httpresponse.read().decode(
                    httpresponse.headers.get_content_charset("utf-8")
                ),
            )
    except urllib.error.HTTPError as e:
        response = Response(
            body=str(e.reason),
            headers=e.headers,
            status=e.code,
            error_count=error_count + 1,
        )

    return response

You are certainly welcome to copy and use the above function, or browse or clone the GitHub repo.

If, however, you are reading this article for a do-it-yourself approach, I encourage you to build your own function that suits your needs. It may grow simpler or more flexible than the above.

Let's discuss the building blocks.

An introduction to `urllib.request.urlopen()`

The recommended high-level function for HTTP requests is urlopen(), available in the standard urllib.request module.

Unlike the lower-level http.client module, urlopen() provides error handling, follows redirects, and provides convenience around headers and data.

An example:

from urllib.request import urlopen, Request

url = "https://jsonplaceholder.typicode.com/posts/1"

if not url.startswith("http"):
    raise RuntimeError("Incorrect and possibly insecure protocol in url")

httprequest = Request(url, headers={"Accept": "application/json"})

with urlopen(httprequest) as response:
    print(response.status)
    print(response.read().decode())

I highly appreciate and recommend JSONPlaceHolder's free fake API, used above. Useful precisely for what we are doing here: testing HTTP clients intended for API work.

Please note the security measures in the above code. Before passing a URL to urlopen(), make sure that it is a web URL and not a local file ("file:///"). If you want a wake-up call, try urlopen("file:///etc/passwd").read() on a Linux system (not in production code, though!) and see what happens. Of course, this protocol check is only necessary if the URL comes from user input. If you control the URL string and can ensure that it does not start with "file:" then the check is redundant. You may also be interested in another approach to hardening urlopen() by redefining the list of protocol handlers.
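A prefix test such as startswith("http") would also let through an odd scheme like "httpfoo://". If that matters to you, one option is to parse the URL and compare the scheme exactly. The following is only a sketch under that assumption: the helper name require_http_url and the ALLOWED_SCHEMES constant are hypothetical, not part of the function above.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; extend it only if you really need other schemes
ALLOWED_SCHEMES = {"http", "https"}


def require_http_url(url: str) -> str:
    """Return url unchanged if it is an http(s) URL with a host, else raise."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme, so "HTTPS://..." passes as well
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported URL scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("URL has no host")
    return url
```

Calling require_http_url(url) just before building the Request keeps the check in one place. Whether to raise ValueError or urllib.error.URLError is a matter of taste.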

Protocol checking aside, the urlopen() call is fairly simple, as you can see in the above example. I recommend calling it in a with statement, so that it acts as a context manager and closing the response is handled automatically.

We passed a Request object to the urlopen() function. While we could simply pass a URL string, the Request object offers much more flexibility: we can specify HTTP method (GET, POST, PUT, HEAD, DELETE), request headers, and request data.

The response returned by urlopen has 4 useful attributes:

It has a file-like interface that can be read(), returning bytes
url returns the URL of the resource retrieved, which may differ from the requested URL after a redirect
status returns the HTTP status code
headers returns an http.client.HTTPMessage object, a subclass of email.message.Message. This functions somewhat like a dict but with case-insensitive keys. It also has some helpful methods such as get_content_type() and get_content_charset(). The get_all() method is another useful one, for when there may be multiple key/value pairs for the same header name. See the helpful Wikipedia article for a list of possible response headers.
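To get a feel for the headers object without making a network call, you can build a plain email.message.Message by hand and try the same methods on it. This is just an illustration: the header values below are made up, and a real response would supply its own.

```python
from email.message import Message

# Stand-in for response.headers; a real one comes back from urlopen()
headers = Message()
headers["Content-Type"] = "application/json; charset=utf-8"
headers["Set-Cookie"] = "a=1"
headers["Set-Cookie"] = "b=2"  # assigning again appends, it does not replace

content_type = headers.get_content_type()       # media type without parameters
charset = headers.get_content_charset("utf-8")  # falls back to "utf-8" if absent
lookup = headers["content-type"]                # keys are case-insensitive
cookies = headers.get_all("Set-Cookie")         # every value for a repeated header
missing = headers.get("X-Not-There")            # None, as with dict.get

print(content_type, charset, cookies)
```

The charset lookup with a fallback is the same call the function at the top of this article uses to decode the response body.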

HTTP errors

Out of the box, urlopen handles redirects (status codes 301, 302, 303, or 307). Other than these codes, though, if the status code is not between 200 and 299 (HTTP "OK" codes according to RFC 2616) then an HTTPError exception is raised.

The HTTPError can be captured and analyzed with the appropriate try... except... block, such as this:

from urllib.error import HTTPError
from urllib.request import urlopen

try:
    urlopen("https://github.com/404")
except HTTPError as e:
    print(e.status)
    print(e.reason)
    print(e.headers.get_content_type())

The error (assigned to the "e" variable in the above) has the following useful properties:

status (also available as code, which the function at the top of this article uses) to get the error code (such as 404)
headers as an HTTPMessage object. Again, this can be treated like a case-insensitive dict.
reason with the text of the error

In the function at the top of this article, I catch and silence every HTTPError, to make the response uniform, and pass error-handling responsibility downstream. (Other failures, such as a URLError raised when the connection itself fails, are not caught.) However, this may not be desirable. Perhaps, instead of continuing no matter the error, you want to fail on anything other than a 401 or 429 error:

except urllib.error.HTTPError as e:
    if e.code in (401, 429):
        response = Response(
            body=str(e.reason),
            headers=e.headers,
            status=e.code,
            error_count=error_count + 1,
        )
    else:
        raise e

Of course, logic could be added to deal with errors as appropriate, depending on the status code.

Note the entirely optional auto-incrementing error_count attribute in my code. Sometimes, I wish to call the HTTP request function recursively. This allows the number of calls to be tracked, and dealt with downstream, hopefully preventing infinite recursion. For instance, I may want to catch 401 errors, parse the "WWW-Authenticate" header for a token, then retry the request with the token. But if this fails repeatedly (say, 5 tries), it should stop. I could test for error_count >= 5 and raise an exception if so, meanwhile making sure to pass the current error_count back to the request function as a parameter, so it continues to be incremented appropriately.
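As a rough sketch of that retry idea, assuming a fetch callable that accepts the current error_count and returns something with status and error_count attributes: the names Reply, MAX_ERRORS, and call_with_retries are hypothetical, and Reply merely stands in for the Response class above so the example runs on its own.

```python
import typing


class Reply(typing.NamedTuple):
    # Hypothetical stand-in for the article's Response
    status: int
    error_count: int = 0


MAX_ERRORS = 5


def call_with_retries(
    fetch: typing.Callable[[int], Reply], error_count: int = 0
) -> Reply:
    """Call fetch(error_count), retrying on 401 until MAX_ERRORS is reached."""
    reply = fetch(error_count)
    if reply.status != 401:
        return reply
    if reply.error_count >= MAX_ERRORS:
        raise RuntimeError(f"Giving up after {reply.error_count} failed attempts")
    # A real client would obtain or refresh its token here before retrying
    return call_with_retries(fetch, error_count=reply.error_count)
```

With the request function from this article, fetch might be something like lambda n: request(url, error_count=n), so that each failed attempt comes back with the count already incremented.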

An alternative way to customize error handling is to construct your own subclasses of BaseHandler, then build an OpenerDirector chain of handlers as appropriate. For instance, you could subclass BaseHandler and add a method http_error_401 to handle authorization as desired, then pass an instance of that custom class to build_opener(). Obviously, this requires a deeper dive into the opener innards.
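Here is a minimal sketch of the handler approach, assuming all you want on a 401 is to record the challenge the server sent. The class name Record401Handler is hypothetical, and a fuller version would likely build a new Request with credentials and return self.parent.open(...) instead of returning None.

```python
import urllib.request


class Record401Handler(urllib.request.BaseHandler):
    """Hypothetical handler that notes the challenge sent with a 401 response."""

    def __init__(self) -> None:
        self.challenges = []

    def http_error_401(self, req, fp, code, msg, hdrs):
        # Remember what kind of authentication the server asked for
        self.challenges.append(hdrs.get("WWW-Authenticate"))
        # Returning None lets the remaining handlers run, so the default
        # error handler still raises HTTPError
        return None


handler = Record401Handler()
opener = urllib.request.build_opener(handler)
# opener.open(url_or_request) can now be used in place of urlopen(...)
```

build_opener() keeps the default handlers and adds yours, so redirects and ordinary error handling continue to work as before.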

A versatile `Response` object

I find it helpful to create a Python class that can contain the bits of the HTTP response that I care about. This could be a dict, but I like to add a method or two, such as a JSON decoder.

If using Python 3.7 or later, consider using a dataclass. An example:

import json
import typing
from dataclasses import dataclass
from email.message import Message


@dataclass(frozen=True)
class Response:
    body: str
    headers: Message
    status: int
    error_count: int = 0

    def json(self) -> typing.Any:
        """
        Decode body's JSON.

        Returns:
            Pythonic representation of the JSON object
        """
        try:
            output = json.loads(self.body)
        except json.JSONDecodeError:
            output = ""
        return output

Enabling frozen is strictly optional, and reflects my preference for this object being immutable (attributes can't be changed after initialization).

Another option, as demonstrated in the code at the beginning of the article, is a typed NamedTuple. I chose this for its immutability, ease of setup, and backwards compatibility.

Of course, a custom class will work, or attrs, or whatever container works for you.

Requests with data

Depending on the API with which you are interfacing, you may encounter various scenarios for accepting data. In each scenario, we can start with a Python dict and convert it into the required format.

The query string

Sometimes, data is passed in through the query string. To encode a Python dict as a query string that can be appended to the URL (after the "?"), use urllib.parse.urlencode:

from urllib.parse import urlencode
from urllib.request import urlopen

url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1, "_limit": 3}
url += "?" + urlencode(params, doseq=True, safe="/")

with urlopen(url) as response:
    print(response.read().decode())

While not relevant to the above request, I did pass two parameters to urlencode that I have found helpful:

doseq will allow lists to be encoded as multiple parameters. For instance, if we passed in {"usernames": ["John Doe", "Jane Doe"]}, then the end result would be "usernames=John+Doe&usernames=Jane+Doe".
safe defines the characters that will not be url-encoded. In some APIs I encounter, such as the Docker API, it is better to leave slashes unencoded, so I added that to the safe string. Adapt as you see fit.
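The effect of both parameters is easy to check without touching the network. In this small demonstration the fromImage parameter is only an example of a value containing a slash:

```python
from urllib.parse import urlencode

# doseq=True expands a list into repeated parameters
repeated = urlencode({"usernames": ["John Doe", "Jane Doe"]}, doseq=True)
print(repeated)  # usernames=John+Doe&usernames=Jane+Doe

# safe="/" leaves slashes alone; by default they are percent-encoded
image = {"fromImage": "library/alpine"}
with_safe = urlencode(image, safe="/")
without_safe = urlencode(image)
print(with_safe)     # fromImage=library/alpine
print(without_safe)  # fromImage=library%2Falpine
```

Without doseq=True, the list would be converted with str() and encoded as a single value, which is rarely what an API expects.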

Sending data in the request body

Similarly, data can be encoded with urllib.parse.urlencode and then passed into the Request object via the data parameter:

from urllib.parse import urlencode
from urllib.request import Request, urlopen

url = "https://api.funtranslations.com/translate/yoda.json"
data = {"text": "HTTP POST calls are remarkably easy"}

postdata = urlencode(data).encode()

httprequest = Request(url, data=postdata, method="POST")

with urlopen(httprequest) as response:
    print(response.read().decode())

Sending JSON in the request body

Many APIs accept and even require the request parameters to be sent as JSON. In these cases, it is important to first encode the Python dict (or other object) as JSON, then set the "Content-Type" request header appropriately:

import json
from urllib.request import Request, urlopen

url = "https://jsonplaceholder.typicode.com/posts"
data = {
    "userid": "1001",
    "title": "POSTing JSON for Fun and Profit",
    "body": "JSON in the request body! Don't forget the content type.",
}

postdata = json.dumps(data).encode()

headers = {"Content-Type": "application/json; charset=UTF-8"}

httprequest = Request(url, data=postdata, method="POST", headers=headers)

with urlopen(httprequest) as response:
    print(response.read().decode())

In the above, we used Python's built-in JSON module to dump the data dict into a string, then encoded it to bytes so it could be handled as POST data.

We set the content type header to application/json. In addition, we specified the character encoding as UTF-8. Given that UTF-8 is the required JSON character encoding, this is redundant and probably unnecessary, but it never hurts to be explicit.
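If you build JSON requests often, it may be worth wrapping these steps in a small helper and inspecting the resulting Request before sending it. This is a sketch, not a required pattern: json_request is a hypothetical name, and nothing here is sent over the network.

```python
import json
from urllib.request import Request


def json_request(url: str, payload: dict, method: str = "POST") -> Request:
    """Build (but do not send) a Request carrying payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=UTF-8",
        "Accept": "application/json",
    }
    return Request(url, data=body, headers=headers, method=method)


req = json_request("https://jsonplaceholder.typicode.com/posts", {"title": "hello"})
print(req.get_method())
print(req.data)
# Request stores header names capitalized, hence "Content-type"
print(req.get_header("Content-type"))
```

Passing the returned object to urlopen() sends it. Checking get_method() and data first can be a handy way to debug a request that the server rejects.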

Parsing JSON in the response body

Because most APIs I use return JSON, and some return other formats such as XML unless JSON is specified, I typically set the Accept header in the request to application/json. If you are pulling other types of data, such as text/csv, you would want to tweak that header. Set it to */* if you don't care.

In the Response object we created earlier, there is an example of a JSON decoder, the result of which will likely be a Python dict or list, but could conceivably be a string or boolean, depending on what the server returns. Here is another example of similar functionality, but loading the JSON directly from the file-like response:

import json
from urllib.request import urlopen

url = "https://jsonplaceholder.typicode.com/posts?_limit=3"

with urlopen(url) as response:
    try:
        jsonbody = json.load(response)
    except json.JSONDecodeError:
        jsonbody = ""

print(jsonbody)

In the above example I decided to fail silently, offering an empty string when JSON decoding fails. If this is not desired, just use json.load() (or json.loads() from a string) and let any exceptions float up as they occur.
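One caveat with an empty-string fallback is that "" is itself a valid JSON value, so the caller cannot tell a failed decode from a server that really returned an empty string. A possible middle ground, sketched here with the hypothetical names decode_json and FAILED, is to let the caller choose the fallback value:

```python
import json
import typing


def decode_json(raw: typing.Union[str, bytes], default: typing.Any = None) -> typing.Any:
    """Return the decoded JSON value, or default if raw is not valid JSON."""
    try:
        # json.loads accepts both str and bytes
        return json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return default


# A unique sentinel makes failure distinguishable from any real JSON value
FAILED = object()

result = decode_json("this is not JSON", default=FAILED)
if result is FAILED:
    print("Could not decode the response body")
```

Inside the with block, decode_json(response.read()) would take the place of json.load(response).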

Other tricks or suggestions?

I am very curious if you use urlopen and how. Are there optimizations to the above that I am missing? Does this raise any questions or confusion? Feel free to post in the comments.

Top comments (4)

Amal Shaji • Mar 8 '21

Great article!!

Can we have def json(self) -> Optional[Dict] instead of def json(self) -> typing.Any in Response class?

Jonathan Bowman • Mar 8 '21

You certainly can, as long as you are 100% sure that all your API calls will only return a JSON object, not a number, string, boolean, or null. Since JSON supports all those types, one cannot be assured that only a JSON object (the equivalent of a Python dict) will be returned.

I know of several APIs that may return an array instead of just an object, so I would be somewhat concerned with this approach. But if you know or control the endpoints and return values, then you can define this how you would like.