DEV Community

Ashwin Shenoy

Using the Python urllib module for making web requests

The urllib module is part of the Python standard library and lets you work with URLs (Uniform Resource Locators). You can do a lot of neat stuff with it, e.g. access API resources and make different types of HTTP requests such as GET, POST, PUT and DELETE.

In this tutorial, let's look at how we can use the urllib module to access API resources. We will be using JsonPlaceholder which is a really neat tool if you want to make dummy REST API calls for the purposes of learning.

Let's import it. You don't need to install anything; it works out of the box!

import urllib.request

Let's make a request to google.com to see what we get.

import urllib.request

with urllib.request.urlopen("https://www.google.com/") as url:
    print(url.read(300))

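Here we are using a context manager, which handles opening and closing the connection gracefully. The urlopen() function returns a response object (an http.client.HTTPResponse for HTTP and HTTPS URLs). Its read() method returns a bytes object, and here we print the first 300 bytes of it.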

b'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-IN"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="j2RUvuN1fVmpA'
[Finished in 7.3s]

We can print the result as a string if we know the encoding of the bytes object. The meta tag in the output above shows that the encoding is UTF-8.

import urllib.request

with urllib.request.urlopen("https://www.google.com/") as url:
    print(url.read(300).decode("utf-8"))

The above code now decodes the bytes as UTF-8 and prints the resulting string, which we can later parse if we need to do something with it.

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-IN"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="MnpsG49CZsHqd
[Finished in 2.0s]
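
Reading the encoding off the meta tag works, but servers usually also announce it in the Content-Type response header. The sketch below is an illustration rather than part of the original walkthrough: it parses such a header value using only the standard library. The helper name charset_from_content_type is hypothetical, and falling back to UTF-8 is an assumption.

```python
from email.message import Message


def charset_from_content_type(value, default="utf-8"):
    """Extract the charset from a Content-Type header value.

    Hypothetical helper for illustration. Falls back to `default`
    (an assumption) when the header names no charset.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset() or default


print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("application/json"))
```

On a live response, the headers attribute is a subclass of email.message.Message, so resp.headers.get_content_charset() should give the same answer without a helper.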

GET Request

The GET request is used to retrieve a specific resource from a URL endpoint. Let's make a GET request to the JsonPlaceholder API to fetch a todo.

import urllib.request

req = urllib.request.Request(url = 'https://jsonplaceholder.typicode.com/todos/1')

with urllib.request.urlopen(req) as resp:
    print(resp.read().decode('utf-8'))

Here we pass the url parameter to the Request class defined in urllib.request, then read the response object and decode it. If you inspect the code closely, we never say that we intend to make a GET request. The Request constructor takes several arguments. One of them is method, a string that names the HTTP request method to use. Another is data (which we will talk about when we see POST requests). Since we didn't specify data (so it is None) or method, the request defaults to GET.

{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}
[Finished in 4.5s]
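
As an aside, the method default described earlier can be checked without sending anything over the network, because a Request object reports the method it would use through get_method(). A minimal sketch (the body b"{}" is only a placeholder, and the variable names are made up for this example):

```python
import urllib.request

url = "https://jsonplaceholder.typicode.com/todos/1"

# No data and no method: the request defaults to GET.
no_data = urllib.request.Request(url)

# Data but no method: the default switches to POST.
with_data = urllib.request.Request(url, data=b"{}")

# An explicit method always wins over the default.
explicit = urllib.request.Request(url, data=b"{}", method="PUT")

print(no_data.get_method())
print(with_data.get_method())
print(explicit.get_method())
```

Nothing is sent here; a request only goes out when it is passed to urlopen().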

The response from the todos endpoint looks like a Python dictionary with key-value pairs, but in reality it is just a string. You can verify this by wrapping the decoded response in type() and seeing that it is of type str.

If we need to access the individual key-value pairs of this response, we have to parse it with the json.loads() function from the json module (another useful module of the standard library).

import json
import urllib.request

# We can now control which todo is returned by changing its ID.
todo_id = 1

req = urllib.request.Request(url=f'https://jsonplaceholder.typicode.com/todos/{todo_id}')

with urllib.request.urlopen(req) as resp:
    data = json.loads(resp.read().decode("utf-8"))
    print(data)
    print(type(data))
    print(data["title"])

I have now refactored the code a bit so that we have some control over which todo we request: the f-string inserts the todo's ID into the URL.

Now the data we get back is indeed a Python dictionary, and we can access its key-value pairs as we normally do.

{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}
<class 'dict'>
delectus aut autem
[Finished in 1.7s]
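
As a small aside, json.loads() has accepted bytes directly since Python 3.6, provided they are encoded as UTF-8, UTF-16 or UTF-32, so the explicit decode() step is optional for JSON responses. The sketch below uses a hard-coded byte string standing in for resp.read(), an assumption made so that it runs offline:

```python
import json

# Stand-in for resp.read(): the raw bytes of the todo returned earlier.
raw = b'{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'

# Parse the bytes directly, and parse the decoded string for comparison.
from_bytes = json.loads(raw)
from_text = json.loads(raw.decode("utf-8"))

print(from_bytes == from_text)
print(from_bytes["title"])
```

Decoding explicitly is still a reasonable habit when the server may use an encoding other than these three.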

POST Request

We use the POST method when we want to send some data to the server to create or update a resource. Let's see how to POST a todo to the API.

import json
import urllib.request

# This is our Todo
data = {
    "userId": 101,
    "id": 100,
    "title": "This is a POST request",
    "completed": True
}

# Dump the todo object as a JSON string and encode it to bytes
data = json.dumps(data).encode("utf-8")

req = urllib.request.Request(url='https://jsonplaceholder.typicode.com/todos/', data=data, method="POST")

# Add the appropriate header.
req.add_header("Content-type", "application/json; charset=UTF-8")

with urllib.request.urlopen(req) as resp:
    response_data = json.loads(resp.read().decode("utf-8"))
    print(response_data)

There is a lot going on here, but let's digest it line by line. In a POST request, we need some data to send. In our case it is a todo, so we make a Python dictionary with the necessary key-value pairs. We already know what a todo looks like. The next step is to dump it as a JSON string and encode it to bytes, because the data parameter of the Request class expects a bytes object. We then add the Content-type header as application/json, which indicates that the data we wish to post is a JSON object. Finally, we read and decode the response object and print it.

{'userId': 101, 'id': 201, 'title': 'This is a POST request', 'completed': True}
[Finished in 2.0s]

We get back the data we sent, except for the id: the server assigned 201 instead of the 100 we supplied. One thing to note is that since JsonPlaceholder is a fake REST API, the todo doesn't actually get created on the server but is "faked". So if you try to access this todo with a GET request as explained in the previous section, the server responds with a 404 error.

PUT Request

We use a PUT request if we need to modify a certain resource on the server. Let's see how to do that.

import json
import urllib.request

# This is our Todo
data = {
    "userId": 1,
    "id": 1,
    "title": "This is a PUT request",
    "completed": False
}

# Dump the todo object as a JSON string and encode it to bytes
data = json.dumps(data).encode("utf-8")

req = urllib.request.Request(url='https://jsonplaceholder.typicode.com/todos/1', data=data, method="PUT")

# Add the appropriate header.
req.add_header("Content-type", "application/json; charset=UTF-8")

with urllib.request.urlopen(req) as resp:
    response_data = json.loads(resp.read().decode("utf-8"))
    print(response_data)

Much of the code remains the same. The only differences are that we now specify the method as PUT and say which todo we want to modify (here, /todos/1).

{'userId': 1, 'id': 1, 'title': 'This is a PUT request', 'completed': False}
[Finished in 2.0s]
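
Since the POST and PUT examples differ only in the URL, the payload and the method, the shared steps could be folded into one function. The following is a sketch under that assumption; build_json_request is a hypothetical helper, not something urllib provides.

```python
import json
import urllib.request


def build_json_request(url, payload, method):
    """Build a Request carrying `payload` as a UTF-8 encoded JSON body.

    Hypothetical helper for illustration, not part of urllib.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json; charset=UTF-8"},
    )


todo = {"userId": 1, "id": 1, "title": "This is a PUT request", "completed": False}
req = build_json_request("https://jsonplaceholder.typicode.com/todos/1", todo, "PUT")

# Nothing is sent until the request is passed to urllib.request.urlopen().
print(req.get_method())
# Request stores header names via str.capitalize(), hence "Content-type".
print(req.get_header("Content-type"))
```

The returned object can be handed to urlopen() exactly as in the examples above, and passing "POST" with the /todos/ URL would cover the earlier section too.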

DELETE Request

We use the DELETE request if we want to delete a specific resource on the server.

import json
import urllib.request

req = urllib.request.Request(url = 'https://jsonplaceholder.typicode.com/todos/1', method = "DELETE")

with urllib.request.urlopen(req) as resp:
    response_data = json.loads(resp.read().decode("utf-8"))
    print(response_data)

Here, we just specify the method as DELETE and the todo which we want to delete (which is /todos/1).

{}
[Finished in 2.6s]

The response is an empty object, which means we have successfully deleted the todo. (Again remember this is all "faked").

We have only scratched the surface of what the urllib module can do, but hopefully you now have some idea of how to access API resources with it. There are more powerful libraries like requests and urllib3, which I would definitely like to cover in a future article. For the sake of simplicity, I wanted to keep this article lean and free of third-party installs, and to give you an idea of how much you can do with the Python standard library alone.

See you in another article soon 👋
