DEV Community: Leonardo Giordani

Mau: a lightweight markup language based on Jinja

Leonardo Giordani — Wed, 17 Aug 2022 14:37:34 +0000

Mau is a lightweight markup language heavily inspired by AsciiDoc that makes it a breeze to write blog posts or books. If you already know Markdown or AsciiDoc you already know which type of software Mau is, and you will quickly learn its syntax.

The main goal of Mau, however, is to provide a customisable markup language. While Mau's syntax is fixed by its implementation, its output is created through user-provided templates. This strategy gives the user great flexibility with no added complexity.

I currently use Mau to write posts for my blog The Digital Cat and to write my book Clean Architectures in Python. As you can see, the system is production-ready, and you can start using it today.

The story so far

Markdown is a great format, and I have used it for all my blog posts since I started writing. Pelican, the static site generator that I use, supports Markdown out of the box, so it was extremely easy to start using it, and overall I had an enjoyable experience.

When the idea of a book about the clean architecture began to take shape in my mind, a quick survey of the platforms for self-publishing led me to LeanPub, which provides a good tool chain based on their Markdown dialect called Markua. Being so similar to Markdown, the transition was seamless for me, and I could publish the first edition of the book without any issues.

In the meantime, my activity on the blog increased, and I started to feel the need to add features to my articles that weren't easily created with Markdown, such as adding a file name and callouts to code blocks, or adding admonitions. Sure, such things can be added using raw HTML, but that popped the bubble of the simple markup syntax, so I wasn't happy with that solution.

The same problems arose when I started working on the second version of the book, with some additional concerns. Since the book is freely available, I wanted to use the same source code to generate a website and be able to reuse the same features both in the resulting HTML and in the PDF.

I couldn't find a good way to create tips and warnings using Markdown. Recently, Python Markdown added a feature that allows specifying the file name for the source code, but the resulting HTML cannot easily be changed, making it difficult to achieve the graphical output I wanted through CSS. So, I started looking into other projects.

I tried Pandoc, and a week spent trying to learn again that black magic called TeX was enough for me to decide that the system wasn't what I needed. My relationship with TeX/LaTeX has always been stormy: while I admire the system, the ingenuity, and the one-man show effort behind TeX, the final result is a convoluted beast that is difficult to tame. It is also terribly undocumented!

The third system that I found was AsciiDoc, which started as a Python project, was abandoned for a while, and was eventually resurrected by Dan Allen with Asciidoctor. AsciiDoc has a lot of features and I consider it superior to Markdown, but Asciidoctor is a Ruby program, and this made it difficult for me to use. In addition, the standard output of Asciidoctor is a nice single HTML page, but again, customising it is a pain. I eventually created the site of the book using it, but adding my Google Analytics code and a sitemap.xml to the HTML wasn't trivial, not to mention customising the look of elements such as admonitions.

In the end, I wasn't completely happy with Asciidoctor, and once again I started looking around to see if there was something that matched my requirements.

What I was looking for

In a nutshell, this is what I was hoping to find:

A simple markup syntax [Markdown, Markua, Asciidoctor]
A stand-alone implementation that I can run locally [Markdown, Asciidoctor]
A Python implementation that can be used from Pelican [Markdown]
Support for admonitions and callouts [Asciidoctor]
PDF output [Asciidoctor]
Highly configurable HTML output []

As you can see, none of the systems could tick all the boxes, and all of them are missing a way to easily change the output of the rendering.

What I did

Since no existing tool matched my requirements, I did what people like me do when they lack a tool. I wrote it myself!

I have been studying compilers all my life, even though I can by no means be called an expert. I have a series of posts on my blog where I write an interpreter in Python, based on the amazing work of Ruslan Spivak, so I thought that I might try to create a Python interpreter for Asciidoctor's syntax since the original AsciiDoc code was left unmaintained (apparently development started again later).

After one month I had a working tool that I successfully connected with Pelican and used to render some posts that I had already written in Markdown. I don't consider the project revolutionary, but I can honestly say that the day I saw Mau working for the first time is one of the best days of my career as a software developer.

At that point, Mau had already slightly diverged from the original idea, though.

While initially I was aiming for an implementation of Asciidoctor's syntax, and retained a great deal of it, I took the opportunity to try a different path when it came to rendering. Having already successfully used Jinja in other contexts, I had this idea of using Jinja templates to render Mau's output, so that the user could either use the standard templates or provide their own and thus easily customise the final result.
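The idea of standard templates that the user can override can be sketched in a few lines. This is only an illustration: it uses string.Template from the Python standard library as a stand-in for Jinja, and the node types, the template strings, and the function render are hypothetical, not Mau's actual API.

```python
from string import Template

# Hypothetical default templates, one per node type (not Mau's real ones).
DEFAULT_TEMPLATES = {
    "paragraph": "<p>$content</p>",
    "header": "<h$level>$content</h$level>",
}


def render(node, user_templates=None):
    """Render a node with the user's template if provided, else the default."""
    templates = {**DEFAULT_TEMPLATES, **(user_templates or {})}
    return Template(templates[node["type"]]).substitute(node)


node = {"type": "header", "level": 2, "content": "The story so far"}

# Same parsed node, two different outputs depending on the templates in use.
default_html = render(node)
custom_html = render(node, {"header": '<h$level class="title">$content</h$level>'})
print(default_html)
print(custom_html)
```

The parsing step stays the same in both cases; only the template attached to the node type changes, which is what makes the output customisable without touching the syntax.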

I later wrote a visitor (a rendering class) that converts Mau's input into Asciidoctor or Markua, and even though it doesn't cover all the features of the two languages, it allowed me to use Mau to rewrite my book and publish it online while using the Markua output to feed Leanpub's processing chain that produces the PDF.

Where we are now

The short story is that Mau works and, as I already mentioned, is used for both my blog and my books. Mau's features are:

A simple markup syntax
A stand-alone implementation that you can run locally on any system that supports Python 3
A plugin for Pelican that allows you to use Mau to write blog posts and website pages
Full support for a good range of standard HTML features (paragraphs, lists, headers, ...) and for some advanced ones such as admonitions, code callouts, includes, and footnotes.
Extremely configurable output using Jinja templates

It's missing stand-alone PDF creation, but this might come in the future.

I learned a lot writing Mau, and I'm happy that the whole idea proved worth the time I invested. I'd love to know that other people found it useful!

You can learn how to install and use Mau by reading the Mau book (which is clearly written in Mau), and by having a look at the source code.

Thanks for giving Mau a try!

Public key cryptography: OpenSSH private keys

Leonardo Giordani — Mon, 11 Apr 2022 12:13:33 +0000

When you create standard RSA keys with ssh-keygen you end up with a private key in PEM format, and a public key in OpenSSH format. Both have been described in detail in my post Public key cryptography: RSA keys. In 2014, OpenSSH introduced a custom format for private keys that is apparently similar to PEM but is internally completely different. This format is used by default when you create ed25519 keys and it is expected to be the default format for all keys in the future, so it is worth having a look.

While investigating this topic I found a lot of misconceptions and wrong or partially wrong statements on Stack Overflow, so I hope this might be a comprehensive view of what this format is, its relationship with PEM, and the tools that you can use to manipulate it.

I'm not the first programmer to look into this, clearly, and I have to mention two posts that I read before writing this one: OpenSSH ed25519 private key file format written in December 2017 by Peter Lyons and The OpenSSH private key binary format, written in August 2020 by Marin Atanasov Nikolov. I'm sure many others have done this research but these are the resources that I found and I want to say a big thanks to both authors for sharing their findings. I will shamelessly use their results in the following explanation, as I hope others will do with what I'm writing here. Sharing knowledge is one of the best ways to help others.

Please note that all the private keys shown in this post have been trashed after I published it.

Note: as the word "key" can identify several different components of the systems I will describe, I will use the words "private key" and "encryption key" as much as possible. The first is the key that we generate to be used in SSH, while the second is a parameter of a (symmetric) encryption algorithm.

KDFs and protection at rest

Describing the introduction of the new format, the OpenSSH changelog says

Add a new private key format that uses a bcrypt KDF to better
protect keys at rest. This format is used unconditionally for
Ed25519 keys, but may be requested when generating or saving
existing keys of other types via the -o ssh-keygen(1) option.
We intend to make the new format the default in the near future.
Details of the new format are in the PROTOCOL.key file.

Before we start dissecting the format, then, it is worth briefly discussing what a KDF is, what bcrypt is, and what it means to protect keys at rest.

Key Derivation Functions

Whenever a system is protected by a password you need to store the latter somewhere. This is clearly necessary to check the validity of the passwords that the user inputs and decide if you should grant access, but you shouldn't store the password in clear text, as a breach in the storage might compromise the whole system. The idea behind storing passwords securely is to run them through a hash function and store the hash: whenever someone inputs a password we can run the hash function again and compare the two hashes. However, we also want to prevent the attacker from reconstructing the password from the hash, so we need a cryptographic hash function, which is a hash function with added requirements to prevent an easy inversion of the process.

The same strategy can be applied when it comes to encryption. An encryption system needs a key (a sequence of bits used to encrypt the message) and we need to derive it from the password given by the user. Encryption keys are required to have a specific length dictated by the encryption algorithm that we use, so hashing looks like a good solution, as all hashes generated by a given algorithm are by definition of the same size. AES, for example, one of the most widespread symmetric block ciphers, uses a key of exactly 128, 192, or 256 bits. Converting the password into a key of predetermined size is called stretching.
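A minimal sketch of why hashing looks attractive here: whatever the length of the password, the digest always has the same size. The function name naive_key is made up for this example, and a single fast hash like this is not a safe way to derive a key, for the reasons discussed in the following paragraphs.

```python
import hashlib


def naive_key(password: str) -> bytes:
    """Turn a password of any length into 32 bytes (256 bits) with SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).digest()


short_key = naive_key("foobar")
long_key = naive_key("a much longer passphrase made of several words")

# Both keys have the size required by a cipher such as AES-256.
print(len(short_key) * 8, len(long_key) * 8)
```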

Any cryptographic system can be broken using a brute-force attack, as you can always test all possible inputs. In the case of login, we can just input all possible passwords until we get access to the system, while in the case of encryption we can try to decrypt using all possible keys until we obtain a meaningful result. This means that the most important thing we can do to protect such systems is to make brute-force attacks infeasible. This can be done by increasing the key size (using more bits) but also by using a slow stretching algorithm.

While hash functions created for things like digital signatures should be fast, then, hash functions that we use to obfuscate the password (for storage) or to create the key (for encryption/decryption) have to be very slow. The slowness of the processing can frustrate brute-force attacks and make them less effective, if not infeasible. An example: at the current state of technology, you can easily hash 1 trillion passwords a second with a trivial expense, but if each one of those hashes takes 1 second you end up having to wait more than 31,000 years before you have tested all of them.
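The figure of more than 31,000 years comes from a simple division, sketched here assuming years of 365 days and hashes computed one after the other:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

passwords = 10**12        # one trillion candidate passwords
seconds_per_hash = 1      # a deliberately slow hash function

# Total time needed to try every candidate sequentially, in years.
years = passwords * seconds_per_hash / SECONDS_PER_YEAR
print(f"{years:,.0f} years")
```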

The process that converts a password into a key is called a Key Derivation Function (KDF) and, despite the name, it is usually a complex algorithm and not a single mathematical function. PBKDF2 is an important KDF, defined as part of the specification PKCS #5, and it can use any pseudorandom function as part of the key stretching. An important feature of PBKDF2 is that it accepts an iteration count as input, which makes it possible to slow down the process. As we just saw, this is the key to making the algorithm slower in order to adapt to the increasing computing power available to attackers.
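Python's standard library ships PBKDF2 as hashlib.pbkdf2_hmac, which makes it easy to see the role of the iteration count. The following is a sketch: the password, the salt size, and the two iteration counts are arbitrary values chosen for the example, and derive_key is a made-up helper.

```python
import hashlib
import os

password = b"foobar"
salt = os.urandom(16)  # random salt, stored alongside the derived data


def derive_key(password: bytes, salt: bytes, iterations: int) -> bytes:
    """Stretch a password into a 256-bit key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)


# Same password and salt, different iteration counts: the second call
# performs 100 times more work and produces a different key.
fast_key = derive_key(password, salt, 1_000)
slow_key = derive_key(password, salt, 100_000)
print(fast_key.hex())
print(slow_key.hex())
```

Since the iteration count changes the output, it has to be stored together with the salt so that the same key can be derived again later.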

bcrypt

The password-hashing function known as bcrypt was created in 1999 and is based on the Blowfish cipher created in 1993. Bcrypt is well known to be an extremely good choice thanks to the simple fact that its slowness can be increased by tuning one of the parameters of the algorithm, called "cost factor". This represents the number of iterations done in the setup of the underlying cipher, and its logarithmic nature makes it easy to adapt the whole process to the increasing computational power available to attackers. This post attempts to estimate the time to hash a password of 15 characters with a cost of 30 (the maximum is actually 31) with a decent 2017 laptop (2.8 GHz Intel Core i7, 16 GB RAM). The result turns out to be around 500 days, which makes you understand that bcrypt won't die easily.
It is important to note here that bcrypt is not a KDF, but a hash function. As such, it might be part of a KDF, but it cannot replace the whole process.

Protection at rest

Protection at rest refers to the scheme that ensures data is secure when it is stored. Practically speaking, when it comes to SSH keys, we refer to the fact that an attacker who can physically access a key, for example by stealing a laptop, actually owns an encrypted version of the key, which can't be used without first decrypting it. As the attacker is not supposed to know the password used to encrypt the key, the only strategy they can use is to brute-force the key, and here is where the concept of protection at rest comes into play. Actually, the other strategy they can employ is to kidnap you and force you to reveal the password, but this somehow falls outside the sphere of cryptographic security.

PEM format and protection at rest

Now that I clarified some terminology, let's have a look at what the standard PEM format does to store encrypted private keys. As I explained in my post Public key cryptography: RSA keys, a PEM file contains a text header, a text footer, and some content. The content is always an ASN.1 structure created using DER and encoded using base64.

For encrypted private keys, the ASN.1 structure is created following a standard called PKCS #8. This standard uses an encryption scheme called PBES2 described in the specification PKCS #5, which uses a symmetric cipher and a password, previously converted into an encryption key using the KDF called PBKDF2. I hope at this point some if not all of these names ring a bell.

We can roughly sketch the process with the following steps:

Create the private key using the requested asymmetric algorithm (e.g. RSA or Ed25519)
Encrypt the private key following PBES2
- Stretch the password into an encryption key using PBKDF2 with one of the possible hash functions and a random salt value
- Encrypt the private key using the newly created encryption key
Represent the encrypted key and the parameters used for PBKDF2 using ASN.1/DER
Encode the result with base64
Add a header and a footer that specify the nature of the content

Let's create an encrypted key with OpenSSL and analyse it. The command I used is

$ openssl genpkey -aes-256-cbc -algorithm RSA\
    -pkeyopt rsa_keygen_bits:4096 -pass pass:foobar\
    -out key_rsa_4096_openssl_pw

which creates a 4096-bit RSA key and encrypts it with AES using foobar as the password. What I get is a file in the aforementioned PEM format.

We can dump the ASN.1 content directly from the PEM format using openssl asn1parse

$ openssl asn1parse -inform pem -in key_rsa_4096_openssl_pw
    0:d=0  hl=4 l=2477 cons: SEQUENCE          
    4:d=1  hl=2 l=  87 cons: SEQUENCE          
    6:d=2  hl=2 l=   9 prim: OBJECT            :PBES2
   17:d=2  hl=2 l=  74 cons: SEQUENCE
   19:d=3  hl=2 l=  41 cons: SEQUENCE
   21:d=4  hl=2 l=   9 prim: OBJECT            :PBKDF2
   32:d=4  hl=2 l=  28 cons: SEQUENCE
   34:d=5  hl=2 l=   8 prim: OCTET STRING      [HEX DUMP]:5BE04AE9442D08F0
   44:d=5  hl=2 l=   2 prim: INTEGER           :0800
   48:d=5  hl=2 l=  12 cons: SEQUENCE
   50:d=6  hl=2 l=   8 prim: OBJECT            :hmacWithSHA256
   60:d=6  hl=2 l=   0 prim: NULL
   62:d=3  hl=2 l=  29 cons: SEQUENCE
   64:d=4  hl=2 l=   9 prim: OBJECT            :aes-256-cbc
   75:d=4  hl=2 l=  16 prim: OCTET STRING      [HEX DUMP]:88BD4E050F7D6691847BEAE813121BB0
   93:d=1  hl=4 l=2384 prim: OCTET STRING      [HEX DUMP]:93C719E39B382D[...]

Please note that I truncated the final OCTET STRING that contains the encrypted key as it is pretty long.

You can clearly see that this key is encrypted using PBES2 and PBKDF2. The algorithm used to encrypt the key is aes-256-cbc, as I asked. Specifically, this is AES with a key of 256 bits in CBC mode.

According to the PKCS #5 specification, the PBES2 block contains

PBES2-params ::= SEQUENCE {
       keyDerivationFunc AlgorithmIdentifier {{PBES2-KDFs}},
       encryptionScheme AlgorithmIdentifier {{PBES2-Encs}} }

and indeed we have PBKDF2 for keyDerivationFunc, and aes-256-cbc for encryptionScheme. The sequence PBKDF2 is specified in the same document as

PBKDF2-params ::= SEQUENCE {
       salt CHOICE {
           specified OCTET STRING,
           otherSource AlgorithmIdentifier {{PBKDF2-SaltSources}}
       },
       iterationCount INTEGER (1..MAX),
       keyLength INTEGER (1..MAX) OPTIONAL,
       prf AlgorithmIdentifier {{PBKDF2-PRFs}} DEFAULT
       algid-hmacWithSHA1 }

As you can see in the ASN.1 dump the salt is 5BE04AE9442D08F0, the iteration count is 2048 (0x800), and the hash function (prf, pseudorandom function) is hmacWithSHA256 without any additional parameters. The value 2048 for the iterations is a default value in OpenSSL (see the definition of PKCS5_DEFAULT_ITER).
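The parameters found in the dump, together with the password, are all that is needed to derive the encryption key again. The following sketch does it with hashlib.pbkdf2_hmac; it assumes a key length of 32 bytes, since the dump contains no keyLength field and aes-256-cbc requires a 256-bit key.

```python
import hashlib

# Values copied from the ASN.1 dump above.
salt = bytes.fromhex("5BE04AE9442D08F0")
iterations = int("0800", 16)  # the INTEGER field, 0x0800 = 2048

# Password given on the command line with -pass pass:foobar.
password = b"foobar"

# hmacWithSHA256 is the PRF; 32 bytes is an assumption based on aes-256-cbc.
encryption_key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
print(iterations, len(encryption_key) * 8)
```

This is the key that, together with the 16-byte initialisation vector stored next to aes-256-cbc, decrypts the final OCTET STRING.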

OpenSSH's private key format

As we saw at the beginning of the post, the OpenSSH team came up with a custom format to store the private keys, so now that we are familiar with the nomenclature and with the way PEM stores encrypted keys, let's see what this new format can do.

The best starting point for our investigation is the tool ssh-keygen which we can use to create private keys. The source can be found in the OpenSSH repository in the file ssh-keygen.c. This file uses two different functions, sshkey_private_to_blob2 (source code) for the new format and sshkey_private_to_blob_pem_pkcs8 (source code) for keys in PKCS #8 format. The former calls bcrypt_pbkdf which comes from OpenBSD (source code).

This function contains a modified implementation of PBKDF2 that uses bcrypt as the core hash function. The comment that you can find at the top of the file bcrypt_pbkdf.c says

/*
 * pkcs #5 pbkdf2 implementation using the "bcrypt" hash
 *
 * The bcrypt hash function is derived from the bcrypt password hashing
 * function with the following modifications:
 * 1. The input password and salt are preprocessed with SHA512.
 * 2. The output length is expanded to 256 bits.
 * 3. Subsequently the magic string to be encrypted is lengthened and modified
 *    to "OxychromaticBlowfishSwatDynamite"
 * 4. The hash function is defined to perform 64 rounds of initial state
 *    expansion. (More rounds are performed by iterating the hash.)
 *
 * Note that this implementation pulls the SHA512 operations into the caller
 * as a performance optimization.
 *
 * One modification from official pbkdf2. Instead of outputting key material
 * linearly, we mix it. pbkdf2 has a known weakness where if one uses it to
 * generate (e.g.) 512 bits of key material for use as two 256 bit keys, an
 * attacker can merely run once through the outer loop, but the user
 * always runs it twice. Shuffling output bytes requires computing the
 * entirety of the key material to assemble any subkey. This is something a
 * wise caller could do; we just do it for you.
 */

As you can see, this is intended to be a pkcs #5 pbkdf2 implementation that uses bcrypt as its underlying hash function. It also mentions some modifications, and it's worth noting that when you modify a standard you are not following the standard any more. I won't run through all the details of the implementation, though, as it's beyond the scope of the post.

So, the OpenSSH private key format ultimately contains a private key encrypted with a non-standard version of PBKDF2 that uses bcrypt as its core hash function. The structure that contains the key is not ASN.1, even though it's base64 encoded and wrapped between header and footer that are similar to the PEM ones. A description of the structure can be found in PROTOCOL.key.
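To give an idea of what a non-ASN.1 structure looks like, here is a sketch that reads the first fields of the base64-decoded content as I understand them from PROTOCOL.key: a magic string, followed by the cipher name, the KDF name, and the KDF options, each stored as a length-prefixed string. The blob parsed below is synthetic, built in the example itself with made-up values, and the helper names are hypothetical.

```python
import struct

AUTH_MAGIC = b"openssh-key-v1\x00"


def read_string(buf, offset):
    """Read a big-endian uint32 length followed by that many bytes."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length


def parse_header(blob):
    """Extract cipher, KDF name, and bcrypt options from a decoded key blob."""
    if not blob.startswith(AUTH_MAGIC):
        raise ValueError("not an OpenSSH private key blob")
    offset = len(AUTH_MAGIC)
    cipher, offset = read_string(blob, offset)
    kdf, offset = read_string(blob, offset)
    kdf_options, offset = read_string(blob, offset)
    header = {"cipher": cipher.decode(), "kdf": kdf.decode()}
    if kdf == b"bcrypt":
        # For bcrypt the options are a salt string and a uint32 rounds value.
        salt, pos = read_string(kdf_options, 0)
        (rounds,) = struct.unpack_from(">I", kdf_options, pos)
        header.update(salt=salt, rounds=rounds)
    return header


def ssh_string(data):
    """Encode bytes as a length-prefixed string (used to build the fake blob)."""
    return struct.pack(">I", len(data)) + data


# Synthetic header with made-up values: a zeroed salt and 16 rounds.
fake_options = ssh_string(b"\x00" * 16) + struct.pack(">I", 16)
fake_blob = (
    AUTH_MAGIC
    + ssh_string(b"aes256-ctr")
    + ssh_string(b"bcrypt")
    + ssh_string(fake_options)
)

header = parse_header(fake_blob)
print(header["cipher"], header["kdf"], header["rounds"])
```

To try it on a real key you would first strip the header and footer lines and base64-decode what is left; the sketch stops before the public key and the encrypted section.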

Cost factor and rounds

PBKDF2 uses the concept of rounds to make the key stretching slower. This is the number of times the hash function is called internally (using as salt the output of the previous iteration), so in PBKDF2 the number of rounds or iterations is directly proportional to the slowness of the stretching operation.

Bcrypt implements a similar mechanism with its cost factor. The cost factor in the standard bcrypt implementation is defined as the binary logarithm of the number of iterations of a specific part of the process (the repeated expansion of the password and the salt). Using the binary logarithm means that a cost factor of 4 (the minimum) corresponds to 16 iterations, while 31 (the maximum) corresponds to 2,147,483,648 (more than 2 billion) iterations.
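The relationship between the cost factor and the number of iterations is a plain power of two, as this small sketch shows (the function name is made up for the example):

```python
def bcrypt_iterations(cost: int) -> int:
    """Number of key expansion iterations for a given bcrypt cost factor."""
    if not 4 <= cost <= 31:
        raise ValueError("the cost factor must be between 4 and 31")
    return 2 ** cost


for cost in (4, 6, 12, 31):
    print(cost, bcrypt_iterations(cost))
```

Increasing the cost by one doubles the work, which is why a small change in this parameter is enough to keep up with faster hardware.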

In the OpenSSH/OpenBSD implementation things are a bit different.

OpenBSD's version of bcrypt runs with a fixed cost of 6, which results in 64 iterations of the key expansion (source code), but being an implementation of PBKDF2 it can still be hardened by increasing the number of rounds (source code). Those rounds correspond to the value given to the parameter -a of the ssh-keygen command line.

How many rounds?

When it comes to KDFs, the advice is always to run as many iterations as possible while keeping the specific application usable, so you need to tune your SSH keys by testing different values on your system. To give you some rough estimates, Wikipedia mentions that for PBKDF2 the number of iterations used by Apple and LastPass is between 2k and 100k. It is worth reiterating though that you shouldn't aim to use other people's figures in this case. Instead, run tests on your own software and hardware.

On my laptop, an i7-8565U with 32GiB of RAM running Kubuntu 20.04 I get the following results, which are pretty linear:

ssh-keygen -a 100 -t ed25519    0.667s
ssh-keygen -a 500 -t ed25519    3.148s
ssh-keygen -a 1000 -t ed25519   6.331s
ssh-keygen -a 5000 -t ed25519   31.624s

A sensible value for me might be between 100 and 500, then, so that I don't have to wait too long every time I push and pull my branches from GitHub.

Can we convert private OpenSSH keys into PEM?

As OpenSSL doesn't understand the OpenSSH private key format, a common question among programmers and devops is whether it is possible to convert it into PEM format. As you might have guessed reading the previous sections, the answer is no. The PEM format for private keys uses PKCS #5, so it supports only the standard implementation of PBKDF2.

It's interesting to note that the OpenSSL team also specifically decided not to support this new format as it is not standard (see https://github.com/openssl/openssl/issues/5323).

A poorly documented format

PEM, PKCS #8, ASN.1, and all other formats that we use every day, including the OpenSSH public key format, are well documented and standardised in RFCs or similar documents. The OpenSSH private key format is documented in a tiny file that you can find in the source code, but it doesn't offer more than a quick overview. To have a good understanding of what is going on I had to read the source code, not only of OpenSSH, but also of OpenBSD.

I think poor documentation like this might be acceptable in personal projects or in new tools, but SSH is used by the whole world, and when the team decides to come up with a completely new format for one of its most important elements I would expect them to detail every single bit of it, or at least try to be more open about the reasons and the implementation. I also personally believe that standards can't but benefit intercommunication between systems and, in cryptography, improve security, since they are reviewed and discussed by a wider audience.

The claim is that the new SSH private key format offers a better protection of keys at rest. I'd be very interested to see a cryptanalysis made by some expert (which I'm not). Cryptography is a tricky field, and often things that are apparently smart end up being tragically wrong.

Resources

OpenSSL documentation: asn1parse, genpkey
The Base64 encoding
The Abstract Syntax Notation One ASN.1 interface description language
RFC 4251 - The Secure Shell (SSH) Protocol Architecture
RFC 4253 - The Secure Shell (SSH) Transport Layer Protocol
RFC 4716 - The Secure Shell (SSH) Public Key File Format
RFC 2898 - PKCS #5: Password-Based Cryptography Specification Version 2.0
RFC 5208 - Public-Key Cryptography Standards (PKCS) #8: Private-Key Information Syntax Specification Version 1.2
RFC 5958 - Asymmetric Key Packages
RFC 7468 - Textual Encodings of PKIX, PKCS, and CMS Structures

Photo by Micah Williams on Unsplash

From Docker CLI to Docker Compose

Leonardo Giordani — Mon, 28 Mar 2022 10:40:31 +0000

In this post I will show you how and why Docker Compose is useful, building a simple application written in Python that uses PostgreSQL. I think it is worth going through such an exercise to see how technologies that we might be already familiar with actually simplify workflows that would otherwise definitely be more complicated.

The name of the demo application I will develop is a very unimaginative whale, that shouldn't clash with any other name introduced by the tools I will use. Every time you see something with whale in it you know that I am referring to a value that you can change according to your setup.

Before we start, please create a directory to host all the files we will create. I will refer to this directory as the "project directory".

PostgreSQL

Since the application will connect to a PostgreSQL database the first thing we can explore is how to run that in a Docker container.

The official Postgres image can be found here, and I highly recommend taking the time to properly read the documentation, as it contains a myriad of details that you should be familiar with.

For the time being, let's focus on the environment variables that the image requires you to set.

Password

The first variable is POSTGRES_PASSWORD, which is the only mandatory configuration value (unless you disable authentication which is not recommended). Indeed, if you run the image without setting this value, you get this message

$ docker run postgres
Error: Database is uninitialized and superuser password is not specified.
       You must specify POSTGRES_PASSWORD to a non-empty value for the
       superuser. For example, "-e POSTGRES_PASSWORD=password" on "docker run".

       You may also use "POSTGRES_HOST_AUTH_METHOD=trust" to allow all
       connections without a password. This is *not* recommended.

       See PostgreSQL documentation about "trust":
       https://www.postgresql.org/docs/current/auth-trust.html

This value is very interesting because it's a secret. So, while I will treat it as a simple configuration value in the first stages of the setup, later we will need to discuss how to manage it properly.

Superuser

Being a production-grade database, Postgres allows you to specify users, groups, and permissions in a fine-grained fashion. I won't go into that as it's usually more a matter of database administration and application development, but we need to define at least the superuser. The default value for this image is postgres, but you can change it by setting POSTGRES_USER.

Database name

If you do not specify the value of POSTGRES_DB, this image will create a default database with the name of the superuser.

A note of warning here. If you omit both the database name and the user you will end up with the superuser postgres and database postgres. The official documentation states that

After initialization, a database cluster will contain a database named
postgres, which is meant as a default database for use by utilities,
users and third party applications. The database server itself does not
require the postgres database to exist, but many external utility programs
assume it exists.

This means that it is not ideal to use that as the database for our application. So, unless you are just trying out a quick piece of code, my recommendation is to always configure all three values: POSTGRES_PASSWORD, POSTGRES_USER, and POSTGRES_DB.
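The way the defaults cascade can be summarised with a short sketch. This is not the code of the image's entrypoint, just a Python model of the behaviour described above, and the function name is made up.

```python
def effective_settings(env):
    """Model how the image picks the superuser and the default database."""
    user = env.get("POSTGRES_USER", "postgres")
    database = env.get("POSTGRES_DB", user)  # falls back to the superuser name
    return {"user": user, "database": database}


# Nothing set: superuser postgres and database postgres.
print(effective_settings({}))

# Only the user set: the database takes the same name.
print(effective_settings({"POSTGRES_USER": "whale_user"}))

# Both set explicitly, as recommended.
print(effective_settings({"POSTGRES_USER": "whale_user", "POSTGRES_DB": "whale_db"}))
```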

We can run the image with

$ docker run -d \
  -e POSTGRES_PASSWORD=whale_password \
  -e POSTGRES_DB=whale_db \
  -e POSTGRES_USER=whale_user \
  postgres:13

As you can see, I run the image in detached mode. This image is not meant to be interactive, as Postgres is by its very nature a daemon. To connect in an interactive way we need to use the tool psql, which is provided by this image. Please note that I'm running postgres:13 only to keep the post consistent with what you will see if you read it in the future. You are clearly free to use any version of the engine.

The ID of the container is returned by docker run, but we can retrieve it at any time by running docker ps. Using IDs is however pretty complicated, and when looking at the command history it is not immediately clear what you have been doing at a certain point in time. For this reason, it's a good idea to name the containers.

Stop the previous container and run it again with

$ docker run -d \
  --name whale-postgres \
  -e POSTGRES_PASSWORD=whale_password \
  -e POSTGRES_DB=whale_db \
  -e POSTGRES_USER=whale_user \
  postgres:13

Stopping containers

You can stop a container using docker stop ID. This gives the container a grace period to react to the SIGTERM signal, for example to properly close files and terminate connections, and then terminates it with SIGKILL. You can also force it to stop unconditionally using docker kill ID, which sends SIGKILL immediately.

In either case, however, you might want to remove the container, which otherwise will be kept indefinitely by Docker. This can become a problem when containers are named, as you can't reuse a name that is currently assigned to a container.

To remove a container you have to run docker rm ID, but you can leverage the fact that both docker stop and docker kill return the ID of the container to pipe the termination and the removal

$ docker stop ID | xargs docker rm

Otherwise, you can use docker rm -f ID, which corresponds to docker kill followed by docker rm. If you name a container, however, you can use its name instead of the ID.

Now we can connect to the database using the executable psql provided in the image itself. To execute a command inside a container we use docker exec and this time we will specify -it to open an interactive session. psql uses by default the user name root, and the database with the same name as the user, so we need to specify both. The header informs me that the image is running PostgreSQL 13.5 on Debian.

$ docker exec -it whale-postgres psql -U whale_user whale_db
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.

whale_db=#

Here, I can list all the databases with \l. You can see all psql commands and the rest of the documentation here.

$ docker exec -it whale-postgres psql -U whale_user whale_db
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.

whale_db=# \l
                                    List of databases
   Name    |   Owner    | Encoding |  Collate   |   Ctype    |     Access privileges     
-----------+------------+----------+------------+------------+---------------------------
 postgres  | whale_user | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | whale_user | UTF8     | en_US.utf8 | en_US.utf8 | =c/whale_user            +
           |            |          |            |            | whale_user=CTc/whale_user
 template1 | whale_user | UTF8     | en_US.utf8 | en_US.utf8 | =c/whale_user            +
           |            |          |            |            | whale_user=CTc/whale_user
 whale_db  | whale_user | UTF8     | en_US.utf8 | en_US.utf8 | 
(4 rows)

whale_db=#

As you can see, the database called postgres has been created as part of the initialisation, as clarified previously. You can exit psql with Ctrl-D or \q.

Postgres trust

You might be surprised by the fact that psql didn't ask for the password that we set when we ran the container. This happens because the server trusts local connections, and when we run psql inside the container we are on localhost.

If you are curious about trust in Postgres you can see the configuration file with

$ docker exec -it whale-postgres \
  cat /var/lib/postgresql/data/pg_hba.conf

where you can spot the lines

# TYPE  DATABASE  USER  ADDRESS  METHOD

# "local" is for Unix domain socket connections only
local   all       all            trust

You can find more information about Postgres trust in the official documentation.

If we want the database to be accessible from outside we need to publish a port. The image exposes port 5432 (see the source code), which tells us where the server is listening. To publish the port towards the host system we can add -p 5432:5432. Please remember that exposing a port in Docker basically means adding some metadata that informs the user of the image, but doesn't affect the way it runs.

Stop the container (you can use its name now) and run it again with

$ docker run -d \
  --name whale-postgres \
  -e POSTGRES_PASSWORD=whale_password \
  -e POSTGRES_DB=whale_db \
  -e POSTGRES_USER=whale_user \
  -p 5432:5432 postgres:13

Running docker ps we can see that the container publishes the port now (0.0.0.0:5432->5432/tcp). We can double-check it with ss ("socket statistics")

$ ss -nulpt | grep 5432
tcp  LISTEN  0  4096  0.0.0.0:5432  0.0.0.0:*
tcp  LISTEN  0  4096     [::]:5432     [::]:*

Please note that usually ss won't tell you the name of the process using that port because the process is run by root. If you run ss with sudo you will see it

$ sudo ss -nulpt | grep 5432
tcp  LISTEN  0  4096  0.0.0.0:5432  0.0.0.0:*  users:(("docker-proxy",pid=1262717,fd=4))
tcp  LISTEN  0  4096     [::]:5432     [::]:*  users:(("docker-proxy",pid=1262724,fd=4))

Unfortunately, ss is not available on macOS. On that platform (and on Linux as well) you can use lsof with grep

$ sudo lsof -i -P -n | grep 5432
docker-pr 219643            root    4u  IPv4 2945982      0t0  TCP *:5432 (LISTEN)
docker-pr 219650            root    4u  IPv6 2952986      0t0  TCP *:5432 (LISTEN)

or directly using the option -i

$ sudo lsof -i :5432
COMMAND      PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
docker-pr 219643 root    4u  IPv4 2945982      0t0  TCP *:postgresql (LISTEN)
docker-pr 219650 root    4u  IPv6 2952986      0t0  TCP *:postgresql (LISTEN)

Please note that docker-pr in the output above is just docker-proxy truncated, matching what we saw with ss previously.
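The same check can be done from Python without relying on ss or lsof. The following is a minimal sketch, not part of the original setup, that simply tries to open a TCP connection; the function name is_port_open is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the Postgres container publishing the port, is_port_open("localhost", 5432) should return True. Keep in mind that this only tells you that something accepts connections on that port, not which process it is.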

If you want to publish the container's port 5432 to a different port on the host you can just use -p ANY_NUMBER:5432. Remember however that port numbers under 1024 are privileged or well-known, which means that they are assigned by default to specific services (listed here).

This means that in theory you can use -p 80:5432 for your database container, exposing it on port 80 of your host. In practice this will result in a lot of headaches and a bunch of developers chasing you with spikes and shovels.
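To make the rule about privileged ports concrete, here is a small sketch that splits a mapping in the simple HOST:CONTAINER form and flags host ports under 1024. The function names are hypothetical, and the sketch deliberately ignores the richer forms accepted by -p (such as an IP prefix or a protocol suffix).

```python
def parse_port_mapping(mapping):
    """Split a 'HOST:CONTAINER' string into a pair of integers."""
    host_part, container_part = mapping.split(":")
    host_port, container_port = int(host_part), int(container_part)
    for port in (host_port, container_port):
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
    return host_port, container_port


def is_privileged(port):
    """Ports under 1024 are the privileged (well-known) ones."""
    return port < 1024


host_port, container_port = parse_port_mapping("80:5432")
if is_privileged(host_port):
    print(f"Host port {host_port} is privileged, consider a higher one")
```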

Now that we published a port we can connect to the database running psql in an ephemeral container. "Ephemeral" means that a resource (in this case a Docker container) is run just for the time necessary to serve a specific purpose, as opposed to "permanent". This way we can simulate someone who tries to connect to the Docker container from a different computer on the network.

Since psql is provided by the image postgres we can in theory run that passing the hostname with -h localhost, but if you try it you will be disappointed.

$ docker run -it postgres:13 psql -h localhost -U whale_user whale_db
psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (::1), port 5432 failed: Cannot assign requested address
        Is the server running on that host and accepting TCP/IP connections?

This is correct, as that container runs in a bridge network where localhost is the container itself. To make it work we need to run the container as part of the host network (that is the same network our computer is running on). This can be done with --network=host

$ docker run -it \
  --network=host postgres:13 \
  psql -h localhost -U whale_user whale_db
Password for user whale_user: 
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.

whale_db=#

Please note that now psql asks for a password (that you know because you set it when we ran the container whale-postgres). This happens because the tool is not run on the same node as the database server any more, so PostgreSQL doesn't trust it.

Volumes

If we used a structured framework in Python, we could leverage an ORM like SQLAlchemy to map classes to database tables. The model definitions (or changes) can be captured into little scripts called migrations that are applied to the database, and those can also be used to insert some initial data. For this example I will go a simpler route, that is to initialise the database using SQL directly.

I do not recommend this approach for a real project but it should be good enough in this case. In particular, it will allow me to demonstrate how to use volumes in Docker.

Make sure the container whale-postgres is running (with or without publishing the port, it's not important at the moment). Connect to the container using psql and run the following two SQL commands (make sure you are connected to the database whale_db)

CREATE TABLE recipes (
  recipe_id INT NOT NULL,
  recipe_name VARCHAR(30) NOT NULL,
  PRIMARY KEY (recipe_id),
  UNIQUE (recipe_name)
);

INSERT INTO recipes 
    (recipe_id, recipe_name) 
VALUES 
    (1,'Tacos'),
    (2,'Tomato Soup'),
    (3,'Grilled Cheese');

This code creates a table called recipes and inserts 3 rows with an id and a name. The output of the above commands should be

CREATE TABLE
INSERT 0 3

You can double check that the database contains the table with \dt

whale_db=# \dt
           List of relations
 Schema |  Name   | Type  |   Owner    
--------+---------+-------+------------
 public | recipes | table | whale_user
(1 row)

and that the table contains three rows with a select.

whale_db=# select * from recipes;
 recipe_id |  recipe_name   
-----------+----------------
         1 | Tacos
         2 | Tomato Soup
         3 | Grilled Cheese
(3 rows)

Now, the problem with containers is that they do not store data permanently. While the container is running there are no issues: as a matter of fact, you can terminate psql, connect again, and run the select once more, and you will see the same data.

If we stop the container and run it again, though, we will quickly realise that the values stored in the database are gone.

$ docker stop whale-postgres | xargs docker rm 
whale-postgres

$ docker run -d \
  --name whale-postgres \
  -e POSTGRES_PASSWORD=whale_password \
  -e POSTGRES_DB=whale_db \
  -e POSTGRES_USER=whale_user \
  -p 5432:5432 postgres:13
4a647ebef78e32bb4733484a6e435780e17a69b643e872613ca50115d60d54ce

$ docker exec -it whale-postgres \
  psql -U whale_user whale_db -c "select * from recipes"
ERROR:  relation "recipes" does not exist
LINE 1: select * from recipes
                      ^

Containers have been created with isolation in mind, which is why by default nothing that happens inside the container is connected with the host, and nothing is preserved when the container is destroyed.

As happened with ports, however, we need to establish some communication between containers and the host system, and we also want to keep data after the container has been destroyed. The solution in Docker is to use volumes.

There are three types of volumes in Docker: host, anonymous, and named. Host volumes are a way to mount inside the container a path on the host's filesystem, and while they are useful to exchange data between the host and the container, they also often have permissions issues. Generally speaking, containers define users whose IDs are not mapped to the host's ones, which means that the files written by the container might end up belonging to non-existing users.

Anonymous and named volumes are simply virtual filesystems created and managed independently from containers. These can be connected with a running container so the latter can use the data contained in them and store data that will survive its termination. The only difference between named and anonymous volumes is the name that allows you to easily manage them. For this reason, I think it's not really useful to consider anonymous volumes, which is why I will focus on named ones.

You can manage volumes using docker volume, which provides several subcommands such as create and rm. You can then attach a named volume to a container when you run it using the option -v of docker run. This creates the volume if it doesn't exist yet, so this is the standard way many of us create a volume.

Stop and remove the running Postgres container and run it again with a named volume

$ docker stop whale-postgres | xargs docker rm 
$ docker run -d \
  --name whale-postgres \
  -e POSTGRES_PASSWORD=whale_password \
  -e POSTGRES_DB=whale_db \
  -e POSTGRES_USER=whale_user \
  -p 5432:5432 \
  -v whale_dbdata:/var/lib/postgresql/data \
  postgres:13

This will create the volume named whale_dbdata and connect it to the path /var/lib/postgresql/data in the container that we are running. That path happens to be the one where Postgres stores the actual database, as you can see from the official documentation. There is a specific reason why I used the prefix whale_ for the name of the volume, which will become clear later when we introduce Docker Compose.
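As the command line grows, it can help to see how its parts map to a data structure. The following sketch builds the argument list for the docker run invocation used in this section without executing it; the helper docker_run_command is hypothetical and covers only the options used so far.

```python
import shlex


def docker_run_command(image, name, env=None, ports=None, volumes=None):
    """Build the argument list for a detached docker run (nothing is executed)."""
    command = ["docker", "run", "-d", "--name", name]
    for key, value in (env or {}).items():
        command += ["-e", f"{key}={value}"]
    for host_port, container_port in (ports or {}).items():
        command += ["-p", f"{host_port}:{container_port}"]
    for volume, path in (volumes or {}).items():
        command += ["-v", f"{volume}:{path}"]
    command.append(image)
    return command


command = docker_run_command(
    "postgres:13",
    "whale-postgres",
    env={
        "POSTGRES_PASSWORD": "whale_password",
        "POSTGRES_DB": "whale_db",
        "POSTGRES_USER": "whale_user",
    },
    ports={5432: 5432},
    volumes={"whale_dbdata": "/var/lib/postgresql/data"},
)

# Print the command as a single shell-safe line
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run(command, check=True) to actually start the container. Keeping the values in dictionaries is close to what a Compose file does in a declarative way.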

docker ps doesn't give any information on volumes, so to see what is connected to your container you need to use docker inspect

$ docker inspect whale-postgres 
[...]
        "Mounts": [
            {
                "Type": "volume",
                "Name": "whale_dbdata",
                "Source": "/var/lib/docker/volumes/whale_dbdata/_data",
                "Destination": "/var/lib/postgresql/data",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": ""
            }
        ],
[...]

The value for "Source" is where the volume is stored in the host, that is on your computer, but generally speaking you can ignore that detail. You can see all volumes using docker volume ls (using grep if the list is long as it is in my case)

$ docker volume ls | grep whale
local     whale_dbdata
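Since docker inspect prints JSON, the section Mounts shown above can also be read programmatically. This is a minimal sketch that works on a trimmed sample mirroring that output; the function list_mounts is hypothetical.

```python
import json


def list_mounts(inspect_output):
    """Return (type, name, destination) for each mount in docker inspect JSON."""
    mounts = []
    # docker inspect returns a JSON array with one object per inspected item
    for container in json.loads(inspect_output):
        for mount in container.get("Mounts", []):
            mounts.append((mount["Type"], mount.get("Name"), mount["Destination"]))
    return mounts


# Trimmed sample based on the output shown above
sample = """
[
    {
        "Mounts": [
            {
                "Type": "volume",
                "Name": "whale_dbdata",
                "Source": "/var/lib/docker/volumes/whale_dbdata/_data",
                "Destination": "/var/lib/postgresql/data",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": ""
            }
        ]
    }
]
"""

print(list_mounts(sample))
```

In a real script the input would come from the standard output of docker inspect whale-postgres, for example captured with subprocess.run.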

Now that the container is running and is connected to a volume, we can try to initialise the database again. Connect with psql using the command line we developed before and run the SQL commands that create the table recipes and insert three rows.

The whole point of using a volume is to make information permanent, so now terminate and remove the Postgres container, and run it again using the same volume. You can check that the database still contains data using the query shown previously.

$ docker rm -f whale-postgres 
whale-postgres
$ docker run -d \
  --name whale-postgres \
  -e POSTGRES_PASSWORD=whale_password \
  -e POSTGRES_DB=whale_db \
  -e POSTGRES_USER=whale_user \
  -p 5432:5432 \
  -v whale_dbdata:/var/lib/postgresql/data \
  postgres:13
893378f044204e5c1a87473a038b615a08ad08e5da9225002a470caeac8674a8
$ docker exec -it whale-postgres \
  psql -U whale_user whale_db \
  -c "select * from recipes"
 recipe_id |  recipe_name   
-----------+----------------
         1 | Tacos
         2 | Tomato Soup
         3 | Grilled Cheese
(3 rows)

Python application

Great! Now that we have a database that can be restarted without losing data we can create a Python application that interacts with it. Again, please remember that the goal of this post is to show what container orchestration is and how Docker Compose can simplify it, so the application developed in this section is absolutely minimal.

I will first create an application and run it in the host, leveraging the port published by the container to connect to the database. Later, I will move the application into its own container.

To create the application, first create a Python virtual environment using your preferred method. I currently use pyenv (GitHub).

pyenv virtualenv whale_docker
pyenv activate whale_docker

Now we need to put our requirements in a file and install them. I prefer to keep things tidy from day zero, so create the directory whaleapp in the project directory and inside it the file requirements.txt.

mkdir whaleapp
touch whaleapp/requirements.txt

The only requirement we have for this simple application is psycopg2, so I add it to the file and then install it. Since we are installing requirements it is useful to update pip as well.

echo "psycopg2" >> whaleapp/requirements.txt
pip install -U pip
pip install -r whaleapp/requirements.txt

Now create the file whaleapp/whaleapp.py and put this code in it

import time

import psycopg2

connection_data = {
    "host": "localhost",
    "database": "whale_db",
    "user": "whale_user",
    "password": "whale_password",
}

while True:
    try:
        conn = None

        # Connect to the PostgreSQL server
        print("Connecting to the PostgreSQL database...")
        conn = psycopg2.connect(**connection_data)

        # Create a cursor
        cur = conn.cursor()

        # Execute the query
        cur.execute("select * from recipes")

        # Fetch all results
        results = cur.fetchall()
        print(results)

        # Close the cursor
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print("Database connection closed.")

    # Wait three seconds
    time.sleep(3)

As you can see the code is not complicated. The application is an endless while loop that every 3 seconds establishes a connection with the DB using the given configuration. After this, the query select * from recipes is run, all the results are printed on the standard output, and the connection is closed.

If the Postgres container is running and publishing port 5432, this application can be run directly on the host

$ python whaleapp.py 
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.

and will go on indefinitely until we press Ctrl-C to stop it.

For the same reasons of isolation and security that we discussed previously, we want to run the application in a Docker container. This can be done pretty easily, but we will run into the same issues that we had when we were trying to run psql in a separate container. At the moment, the application tries to connect to the database on localhost, which is fine while the application is running on the host directly, but won't work any more once it is moved into a Docker container.

To face one problem at a time, let's first containerise the application and run it using the host network. Once this works, we can see how to solve the communication problem between containers.

The easiest way to containerise a Python application is to create a new image starting from the image python:3. The following Dockerfile goes into the application directory (whaleapp/Dockerfile)

FROM python:3

WORKDIR /usr/src/app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD [ "python", "-u", "./whaleapp.py" ]

A Dockerfile contains the description of the layers that build an image. Here, we start from the official Python 3 image (DockerHub), set a working directory, copy the requirements file and install the requirements, then copy the rest of the application, and run it. The Python option -u avoids output buffering, see the documentation.

It is important to keep in mind the layered nature of Docker images, as this can lead to simple optimisation tricks. In this case, copying the requirements file and installing the requirements creates layers out of a file that doesn't change very often, while the layer created with COPY . . is probably changing very quickly while we develop the application. If we ran something like

[...]

COPY . .

RUN pip install --no-cache-dir -r requirements.txt

CMD [ "python", "-u", "./whaleapp.py" ]

we would have to install the requirements every time we change the application code, as this would rebuild the COPY layer and thus invalidate the layer containing the RUN command.
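The reasoning about layer invalidation can be illustrated with a toy model. This is not how Docker implements its build cache, just a sketch of the rule described above: a layer is rebuilt when one of its input files changed or when any previous layer was rebuilt. The function layers_to_rebuild and the way inputs are described are assumptions made for this example.

```python
def layers_to_rebuild(instructions, changed_files):
    """Return the instructions that have to be run again (toy model).

    Each instruction is a (text, inputs) pair, where inputs is the set of
    files that the instruction reads from the build context.
    """
    rebuild = []
    invalidated = False
    for text, inputs in instructions:
        # Once a layer is rebuilt, every layer after it is rebuilt as well
        if invalidated or inputs & changed_files:
            invalidated = True
            rebuild.append(text)
    return rebuild


requirements_first = [
    ("COPY requirements.txt .", {"requirements.txt"}),
    ("RUN pip install --no-cache-dir -r requirements.txt", set()),
    ("COPY . .", {"requirements.txt", "whaleapp.py"}),
]

code_first = [
    ("COPY . .", {"requirements.txt", "whaleapp.py"}),
    ("RUN pip install --no-cache-dir -r requirements.txt", set()),
]

# Only the application code changed
print(layers_to_rebuild(requirements_first, {"whaleapp.py"}))
print(layers_to_rebuild(code_first, {"whaleapp.py"}))
```

In the first ordering only the final COPY is repeated, while in the second one the installation of the requirements is repeated as well.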

Once the Dockerfile is in place we can build the image

$ cd whaleapp
$ docker build -t whaleapp .
Sending build context to Docker daemon  6.144kB
Step 1/6 : FROM python:3
 ---> 768307cdb962
Step 2/6 : WORKDIR /usr/src/app
 ---> Using cache
 ---> b00189756ddb
Step 3/6 : COPY requirements.txt .
 ---> a7aef12f562c
Step 4/6 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Running in 153a3ca6a1b2
Collecting psycopg2
  Downloading psycopg2-2.9.3.tar.gz (380 kB)
Building wheels for collected packages: psycopg2
  Building wheel for psycopg2 (setup.py): started
  Building wheel for psycopg2 (setup.py): finished with status 'done'
  Created wheel for psycopg2: filename=psycopg2-2.9.3-cp39-cp39-linux_x86_64.whl size=523502 sha256=1a3aac3cf72cc86b63a3e0f42b9b788c5237c3e5d23df649ca967b29bf89ecf5
  Stored in directory: /tmp/pip-ephem-wheel-cache-ow3d1yop/wheels/b3/a1/6e/5a0e26314b15eb96a36263b80529ce0d64382540ac7b9544a9
Successfully built psycopg2
Installing collected packages: psycopg2
Successfully installed psycopg2-2.9.3
WARNING: You are using pip version 20.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
Removing intermediate container 153a3ca6a1b2
 ---> b18aead1ef15
Step 5/6 : COPY . .
 ---> be7c3c11e608
Step 6/6 : CMD [ "python", "-u", "./whaleapp.py" ]
 ---> Running in 9e2f4f30b59e
Removing intermediate container 9e2f4f30b59e
 ---> b735eece4f86
Successfully built b735eece4f86
Successfully tagged whaleapp:latest

You can see the layers being built one by one (marked as Step x/6 here). Once the image has been built you should be able to see it in the list of images present in your system

$ docker image ls | grep whale
whaleapp  latest  969b15466905  9 minutes ago  894MB

You might want to observe 1 minute of silence meditating on the fact that we used almost 900 megabytes of space to run 40 lines of Python. As you can see benefits come with a cost, and you should not underestimate those. 900 megabytes might not seem a lot nowadays, but if you keep building images you will soon use up the space on your hard drive or end up paying a lot for the space on your remote repository.

By the way, this is the reason why Docker splits images into layers and reuses them. For now we can ignore this part of the game, but remember that keeping the system clean and removing past artefacts is important.

As I mentioned before we can run this image but we need to use the host network configuration.

$ docker run -it --rm --network=host --name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.

Please note that I used --rm to make Docker remove the container automatically when it is terminated. This way I can run it again with the same name without having to explicitly remove the past container with docker rm.

Run containers in the same network

Docker containers are isolated from the host and from other containers by default. This however doesn't mean that they can't communicate with each other if we run them in a specific configuration. In particular, an important part in Docker networking is played by bridge networks.

Whenever containers are run in the same custom bridge network, Docker provides them DNS resolution using the container names. This means that we can make the application communicate with the database without having to run the former in the host network.

A custom network can be created using docker network

$ docker network create whale

As always, Docker will return the ID of the object it just created, but we can ignore it for now, as we can refer to the network by name.

Stop and remove the Postgres container, and run it again using the network whale

$ docker rm -f whale-postgres 
whale-postgres
$ docker run -d \
  --name whale-postgres \
  -e POSTGRES_PASSWORD=whale_password \
  -e POSTGRES_DB=whale_db \
  -e POSTGRES_USER=whale_user \
  --network=whale \
  -v whale_dbdata:/var/lib/postgresql/data \
  postgres:13

Please note that there is no need to publish the port 5432 in this setup, as the host doesn't need to access the container. Should this be a requirement, add the option -p 5432:5432 again.

As happened with volumes, docker ps doesn't give information about the network that containers are using, so you have to use docker inspect again

$ docker inspect whale-postgres 
[...]
        "NetworkSettings": {
            "Networks": {
                "whale": {
[...]

As I mentioned before, Docker bridge networks provide DNS resolution using the container's name. We can double check this running a container and using ping.

$ docker run -it --rm --network=whale whaleapp ping whale-postgres
PING whale-postgres (172.19.0.2) 56(84) bytes of data.
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=2 ttl=64 time=0.100 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=3 ttl=64 time=0.115 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=4 ttl=64 time=0.101 ms
^C
--- whale-postgres ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 80ms
rtt min/avg/max/mdev = 0.064/0.095/0.115/0.018 ms

What I did here was to run the image whaleapp that we built previously, but overriding the default command and running ping whale-postgres instead. This is a good way to check if a host can resolve a name on the network (dig is another useful tool but is not installed by default in that image).

As you can see the Postgres container is reachable and we also know that it currently runs with the IP 172.19.0.2. This value might be different on your system, but it will match the information you get if you run docker network inspect whale.
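Name resolution can also be checked from Python, which is handy when ping is not available in an image. The following is a minimal sketch based on the standard module socket; the function resolve is hypothetical.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses for hostname, or an empty list on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry ends with a sockaddr tuple whose first element is the address
    return sorted({info[4][0] for info in infos})
```

Run inside a container attached to the network whale, resolve("whale-postgres") should return the same address reported by ping, while on the host the name is not expected to resolve.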

The point of all this talk about DNS is that we can now change the code of the Python application so that it connects to whale-postgres instead of localhost

connection_data = {
    "host": "whale-postgres",
    "database": "whale_db",
    "user": "whale_user",
    "password": "whale_password",
}

Once this is done, rebuild the image and run it in the whale network

$ docker build -t whaleapp .
[...]
$ docker run -it --rm --network=whale --name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.

You can also take the network directly from another container, which is a useful shortcut.

$ docker build -t whaleapp .
[...]
$ docker run -it --rm \
  --network=container:whale-postgres \
  --name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.

Docker network management

The command docker network can be used to change the network configuration of running containers.

You can disconnect a running container from a network with

$ docker network disconnect NETWORK_ID CONTAINER_ID

and connect it with

$ docker network connect NETWORK_ID CONTAINER_ID

You can see which containers are using a given network inspecting it

$ docker network inspect NETWORK_ID

Remember that disconnecting a container from a network makes it unreachable, so while it is good that we can do this on running containers, maintenance should always be carefully planned to avoid unexpected downtime.

Run time configuration

Hardcoding configuration values into the application is never a great idea, and while this is a very simple example it is worth pushing the setup a bit further to make it tidy.

In particular, we can replace the connection data host, database, and user with environment variables, which allow us to reuse the application configuring it at run time. For simplicity's sake I will store the password in an environment variable as well, and pass it in clear text when we run the container. See the box for more information about how to manage secret values.

Reading values from environment variables is easy in Python

import os
import time

import psycopg2

DB_HOST = os.environ.get("WHALEAPP__DB_HOST", None)
DB_NAME = os.environ.get("WHALEAPP__DB_NAME", None)
DB_USER = os.environ.get("WHALEAPP__DB_USER", None)
DB_PASSWORD = os.environ.get("WHALEAPP__DB_PASSWORD", None)

connection_data = {
    "host": DB_HOST,
    "database": DB_NAME,
    "user": DB_USER,
    "password": DB_PASSWORD,
}

Please note that I prefixed all environment variables with WHALEAPP__. This is not mandatory, and has no special meaning for the operating system. In my experience, complicated systems can have many environment variables, and using prefixes is a simple and effective way to keep track of which part of the system needs that particular value.
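A possible refinement, shown here only as a sketch, is to fail early with a clear message when one of the variables is missing instead of passing None to psycopg2. The function load_connection_data is hypothetical; the variable names are the ones used above.

```python
import os

PREFIX = "WHALEAPP__"
REQUIRED = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")


def load_connection_data(environ=os.environ):
    """Build the connection dictionary, raising if a variable is missing."""
    missing = [PREFIX + key for key in REQUIRED if PREFIX + key not in environ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {
        "host": environ[PREFIX + "DB_HOST"],
        "database": environ[PREFIX + "DB_NAME"],
        "user": environ[PREFIX + "DB_USER"],
        "password": environ[PREFIX + "DB_PASSWORD"],
    }
```

Accepting the environment as a parameter makes the function easy to test with a plain dictionary, while the default value keeps the call site as simple as connection_data = load_connection_data().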

We already know how to pass environment variables to Docker containers as we did it when we ran the Postgres container. Build the image again, and then run it passing the correct variables

$ docker build -t whaleapp .
[...]
$ docker run -it --rm --network=whale \
  -e WHALEAPP__DB_HOST=whale-postgres \
  -e WHALEAPP__DB_NAME=whale_db \
  -e WHALEAPP__DB_USER=whale_user \
  -e WHALEAPP__DB_PASSWORD=whale_password \
  --name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.

Managing secrets

A secret is a value that should never be shown in plain text, as it is used to grant access to a system. This can be a password or a private key such as the ones you have to run SSH, and as happens with everything related to security, managing them is complicated. Please keep in mind that security is hard and that the best attitude to have is: every time you think something in security is straightforward this means you got it wrong.

Generally speaking, you want secrets to be encrypted and stored in a safe place where access is granted to a narrow set of people. These secrets should be accessible to your application in a secure way, and it shouldn't be possible to access the secrets hosted in the memory of the application.

For example, many posts online show how you can use AWS Secrets Manager to store your secrets and access them from your application using jq to fetch them at run time. While this works, if the JSON secret contains a syntax error, jq dumps the whole value in the standard output of the application, which means that the logs contain the secret in plain text.

Vault is a tool created by HashiCorp that many use to store secrets needed by containers. It is interesting to read in the description of the image that with a specific configuration the container prevents memory from being swapped to disk, which would leak the unencrypted values. As you see, security is hard.

Orchestration tools always provide a way to manage secrets and to pass them to containers. For example, see Docker Swarm secrets, Kubernetes secrets, and secrets for AWS Elastic Container Service.
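As an illustration of how an application might consume such secrets, Docker Swarm makes them available to a container as files under /run/secrets. The following sketch reads a value from there and falls back to an environment variable with the same name. The function read_secret is hypothetical, and the fallback is an assumption made for convenience during development, not a security recommendation.

```python
import os
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets", environ=os.environ):
    """Return the secret stored in a file, or the environment variable as fallback."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Files often end with a newline that is not part of the secret
        return secret_file.read_text().strip()
    return environ.get(name)
```

For example, read_secret("WHALEAPP__DB_PASSWORD") would return the content of the file /run/secrets/WHALEAPP__DB_PASSWORD when it exists, and the value of the environment variable otherwise.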

Enter Docker Compose

The setup we created in the past sections is good, but is far from being optimal. We had to create a custom bridge network and then start the Postgres and the application containers connected to it. To stop the system we need to terminate containers manually and to remember to remove them to avoid blocking the container name. We also have to manually remove the network if we want to keep the system clean.

The next step would then be to create a bash script, then to evolve it to a Makefile or similar solution. Fortunately, Docker provides a better solution with Docker Compose.

Docker Compose can be described as a single-host orchestration tool. Orchestration tools are pieces of software that allow us to deal with the problems described previously, such as starting and terminating multiple containers, creating networks and volumes, managing secrets, and so on. Docker Compose works in a single-host mode, so it's a great solution for development environments, while for production multi-host environments it's better to move to more advanced tools such as AWS ECS or Kubernetes.

Docker Compose reads the configuration of a system from the file docker-compose.yml (the default value, it can be changed) that captures all we did manually in the previous sections in a compact and readable way.

To install Docker Compose follow the instructions you find here. Before we start using Docker Compose make sure you kill the Postgres container if you are still running it, and remove the network we created

$ docker rm -f whale-postgres 
whale-postgres
$ docker network remove whale
whale

Then create the file docker-compose.yml in the project directory (not the app directory) and put the following code in it

version: '3.8'

services:

This is not a valid Docker Compose file, yet, but you can see that there is a value that specifies the syntax version and one that lists services. You can find the Compose file reference here, together with a detailed description of the various versions.

The first service we want to run is Postgres, and a basic configuration for that is

version: '3.8'

services:
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: whale_db
      POSTGRES_PASSWORD: whale_password
      POSTGRES_USER: whale_user
    volumes:
      - dbdata:/var/lib/postgresql/data

volumes:
  dbdata:

As you can see, this file contains the environment variables that we passed to the Postgres container and the volume configuration. The final volumes declares which volumes have to be present (so it creates them if they are not), while volumes inside the service db creates the connection just like the option -v did previously.

Now, from the project directory, you can run Docker Compose with

$ docker-compose -p whale up -d
Creating network "whale_default" with the default driver
Creating whale_db_1 ... done

The option -p sets the name of the project, which would otherwise default to the name of the directory you are currently in (which might or might not be meaningful), while the command up -d starts all the containers in detached mode.

As you can see from the output, Docker Compose creates a (bridge) network called whale_default. Normally, you would see a message like Creating volume "whale_dbdata" with default driver as well, but in this case the volume is already present as we created it previously. Both the network and the volume are prefixed with PROJECTNAME_, and this is the reason why when we first created the volume I named it whale_dbdata. Keep in mind however that all these default behaviours can be customised in the Compose file.

If you run docker ps you will see that the container is named whale_db_1. This comes from the project name (whale_), the service name in the Compose file (db_) and the container number, which is 1 because at the moment we are running only one container for that service.
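The naming rules described above can be summarised in a few lines of Python. This is only a sketch of the convention visible in the output shown here (underscore separators, as printed by the docker-compose command used in this post); the function names are mine and are not part of Docker Compose.

```python
def compose_resource_name(project: str, name: str) -> str:
    """Networks and volumes are prefixed with PROJECTNAME_."""
    return f"{project}_{name}"


def compose_container_name(project: str, service: str, number: int = 1) -> str:
    """Containers are named after the project, the service, and a container number."""
    return f"{project}_{service}_{number}"


print(compose_resource_name("whale", "default"))  # whale_default
print(compose_resource_name("whale", "dbdata"))   # whale_dbdata
print(compose_container_name("whale", "db"))      # whale_db_1
```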

To stop the services you have to run

$ docker-compose -p whale down
Stopping whale_db_1 ... done
Removing whale_db_1 ... done
Removing network whale_default

As you can see from the output, Docker Compose stops and removes the container, then removes the network. This is very convenient, as it already removes a lot of the work we had to do manually earlier.

We can now add the application container to the Compose file

version: '3.8'

services:
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: whale_db
      POSTGRES_PASSWORD: whale_password
      POSTGRES_USER: whale_user
    volumes:
      - dbdata:/var/lib/postgresql/data
  app:
    build:
      context: whaleapp
      dockerfile: Dockerfile
    environment:
      WHALEAPP__DB_HOST: db
      WHALEAPP__DB_NAME: whale_db
      WHALEAPP__DB_USER: whale_user
      WHALEAPP__DB_PASSWORD: whale_password

volumes:
  dbdata:

This definition is slightly different, as the application container has to be built using the Dockerfile we created. Docker Compose allows us to store the build configuration here so that we don't need to pass all the options to docker build manually, but please note that configuring the build here doesn't mean that Docker Compose will build the image for you every time. You still need to run docker-compose -p whale build every time you need to rebuild it.

Please note that the variable WHALEAPP__DB_HOST is set to the service name, and not to the container name. Now, when we run Docker Compose we get

$ docker-compose -p whale up -d
Creating network "whale_default" with the default driver
Creating whale_db_1  ... done
Creating whale_app_1 ... done

and the output tells us that this time the container whale_app_1 has been created as well. We can see the logs of a container with docker logs, but using docker-compose allows us to call services by name instead of by ID

$ docker-compose -p whale logs -f app
Attaching to whale_app_1
app_1  | Connecting to the PostgreSQL database...
app_1  | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1  | Database connection closed.
app_1  | Connecting to the PostgreSQL database...
app_1  | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1  | Database connection closed.
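The WHALEAPP__* values set in the Compose file reach the application as ordinary environment variables. The following is a minimal sketch of how an application might collect them; the function db_settings and the lowercase keys are hypothetical and are not taken from the application code. The point is only that the host the application sees is the service name db.

```python
import os
from typing import Dict, Mapping

PREFIX = "WHALEAPP__"


def db_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the WHALEAPP__DB_* variables, keyed by lowercase name without the prefix."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX + "DB_")
    }


# The same values that the Compose file passes to the service app
example_environment = {
    "WHALEAPP__DB_HOST": "db",
    "WHALEAPP__DB_NAME": "whale_db",
    "WHALEAPP__DB_USER": "whale_user",
    "WHALEAPP__DB_PASSWORD": "whale_password",
}

print(db_settings(example_environment)["db_host"])  # db
```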

Health checks and dependencies

You might have noticed that at the very beginning of the application logs there are some connection errors, and that after a while the application manages to connect to the database

$ docker-compose -p whale logs -f app
Attaching to whale_app_1
app_1  | Connecting to the PostgreSQL database...
app_1  | could not translate host name "db" to address: Name or service not known
app_1  | 
app_1  | Connecting to the PostgreSQL database...
app_1  | could not translate host name "db" to address: Name or service not known
app_1  | 
app_1  | Connecting to the PostgreSQL database...
app_1  | Connecting to the PostgreSQL database...
app_1  | could not connect to server: Connection refused
app_1  |        Is the server running on host "db" (172.31.0.3) and accepting
app_1  |        TCP/IP connections on port 5432?
app_1  | 
app_1  | Connecting to the PostgreSQL database...
app_1  | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1  | Database connection closed.
app_1  | Connecting to the PostgreSQL database...
app_1  | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1  | Database connection closed.

These errors come from the fact that the application container is up and running before the database is ready to serve connections. In a production setup this usually doesn't happen because the database is up and running long before the application gets deployed for the first time, and then runs (hopefully) without interruption. In a development environment, instead, such a situation is normal.

Please note that this might not happen in your setup, as this is tightly connected with the speed of Docker Compose and the containers. Time-sensitive bugs are one of the worst types to deal with, and this is the reason why managing distributed systems is hard. It is important that you realise that even though this might work now on your system, the problem is there and we need to find a solution.

The standard solution when part of a system depends on another is to create a health check that periodically tests the first service, and to start the second service only when the check is successful. We can do this in the Compose file using healthcheck and depends_on

version: '3.8'

services:
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: whale_db
      POSTGRES_PASSWORD: whale_password
      POSTGRES_USER: whale_user
    volumes:
      - dbdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
      interval: 10s
      timeout: 5s
      retries: 5
  app:
    build:
      context: whaleapp
      dockerfile: Dockerfile
    environment:
      WHALEAPP__DB_HOST: db
      WHALEAPP__DB_NAME: whale_db
      WHALEAPP__DB_USER: whale_user
      WHALEAPP__DB_PASSWORD: whale_password
    depends_on:
      db:
        condition: service_healthy

volumes:
  dbdata:

The health check for the Postgres container leverages the command line tool pg_isready, which succeeds only when the database is ready to accept connections, and the check runs every 10 seconds, up to 5 times. Now, when you run up -d you should notice a clear delay before the application starts, but the logs won't contain any connection errors.
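To make the idea behind interval and retries more concrete, here is a simplified Python sketch of a polling loop. It is not how Docker implements health checks (the timeout, for example, is not modelled), and the function wait_until_healthy is a name I made up for the illustration; it only shows the principle of testing one component repeatedly before starting the one that depends on it.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    interval: float = 10.0,
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run check up to `retries` times, waiting `interval` seconds between attempts."""
    for attempt in range(retries):
        if check():
            return True
        # No need to wait after the last failed attempt
        if attempt < retries - 1:
            sleep(interval)
    return False


# A fake check that fails twice and then succeeds, standing in for pg_isready.
# The sleep function is replaced so that the example doesn't actually wait.
outcomes = iter([False, False, True])
print(wait_until_healthy(lambda: next(outcomes), sleep=lambda seconds: None))  # True
```

The dependent component would be started only when the function returns True, which is what condition: service_healthy expresses in the Compose file.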

Final words

Well, this was a long one, but I hope you enjoyed the trip and you ended up having a better picture of what problems Docker Compose solves, along with a feeling of how complicated it might be to design an architecture. Everything we did was for a "simple" development environment with a couple of containers, so you can figure what is involved when we get to live environments.

Photo by Verstappen Photography on Unsplash

Clean Architectures in Python

Leonardo Giordani — Thu, 16 Sep 2021 09:51:36 +0000

The clean architecture is the opposite of spaghetti code, where everything is interlaced and there are no single elements that can be easily detached from the rest and replaced without the whole system collapsing. The main point of the clean architecture is to make clear "what is where and why", and this should be your first concern while you design and implement a software system, whatever architecture or development methodology you want to follow.

In 2018 I wrote the free book "Clean Architectures in Python" and published it on Leanpub because I wanted to help fellow developers to understand some concepts behind software architecture design, and in particular behind the Clean Architecture popularised by Robert Martin in recent years. In 2020 I published the second edition with revised code and an additional chapter. The book has been a great success and has been downloaded by more than 16,000 readers so far!

After two introductory parts, chapter 1 goes through a 10,000 feet overview of a system designed with a clean architecture, while chapter 2 briefly discusses the components and the ideas behind this software architecture. Chapter 3 runs through a concrete example of clean architecture and chapter 4 expands the example adding a web application on top of it. Chapter 5 discusses error management and improvements to the Python code developed in the previous chapters. Chapters 6 and 7 show how to plug different database systems to the web service created previously, and chapter 8 wraps up the example showing how to run the application with a production-ready configuration.

You can read the book online at https://www.thedigitalcatbooks.com/ or download the PDF at https://leanpub.com/clean-architectures-in-python/. The PDF can be downloaded for free, but any contribution is welcome and will help me to create more quality content. Please also share this with friends or other people you think might be interested in the topic.

I hope you enjoy the book!

Stop using tools as if they were solutions

Leonardo Giordani — Wed, 02 Jun 2021 21:39:52 +0000

"I will write a class".

I can't tell you how many times I have heard this sentence from candidates during coding interviews.

What's wrong with this sentence? Nothing, out of context, but let me add this little detail: this is usually the first sentence I hear when the candidate tries to tackle a problem.

I know that coding interviews can be very stressful and I also think that leading such interviews requires a lot of effort to avoid transforming them into nitpicking sessions in which the candidate feels every single keystroke is scrutinised and analysed. As if the destiny of the whole company depended on how fast you can code a function that reverses a string!

But even taking into account interview anxiety, I think such an approach reveals something wrong deeper in the way we approach problems as programmers. This is the result of a culture that mistakes tools for solutions, and if I can detect it in senior programmers it means it already propagated into our teams and our companies.

The problem-solving challenge

When you face a problem (any problem) you need to devise a strategy to solve it. You need to have an idea of what to do before doing it, otherwise you are reacting and not acting.

When you practice any type of combat sport you train your body to react to specific inputs (attacks) with automatic reactions (defences, counterattacks), but you usually do it because in a real fight you don't have the time to make a conscious decision. Such "perfect" reactions, though, are the result of a constant and very focused effort to transform consciously selected actions into involuntary ones. Without training, a pure reaction is usually an average response at best.

In problem-solving we face the same challenge. Either we devise a strategy, or our approach will be clumsy and ultimately not efficient.

Imagine you were tasked to build a bridge between two sides of a river. Would your first concern be the specific type of hammers that the workers should use? After all, you can have ball-peen hammers, sledgehammers, brick hammers, and many other types. Choosing the wrong one might severely affect the performance of your workers.

That's hardly the first thing you should ask yourself. I'm pretty sure you agree that knowing the distance that the bridge should cover is much more urgent. Also, the type and the amount of traffic that it has to carry (walkers, cars, trucks, trains) is an important factor, and you should be concerned about the budget that you are allowed.

Why are these questions more important than the one about hammers? Because the answers to these questions can heavily influence the whole project. They are pillars of your architecture and not details (see "What is a software architecture?" in Clean Architectures in Python https://www.pycabook.com). My colleague Ken Pemberton always reminds me that most of the time we don't ask ourselves an even more important question: "What problem are you trying to solve?". In the example above, a bridge might not be the best solution in the first place.

I think the process, at least when it comes to software projects, can be divided into three connected phases: decomposition, communication, implementation.

Decomposition

Macroscopically, a processing system is made of an initial state, some transformations or intermediate states, and a final state.

Usually, it's simple to identify the initial and final state, while it's harder to describe what happens between the two. So, we need to proceed iteratively, describing the system using black boxes, and then opening each one of them, zooming in to describe what happens inside.

At any zoom level, from the 10,000 feet overview down to the description of a single function, you need to identify 4 things: the input, the output, the actors and the data flow.

The input is what enters the black box. It has usually been decided at a higher level of zoom or while discussing a component that provides it as output. So, it is given, and if it turns out to be inadequate we should take a step back in the design and question how we can provide proper input. The same is valid for the output.

The actors must be black boxes that accept data and transform it, and the data flow is how information is exchanged between the actors. This is clearly where it can take a long time to find a good solution, and we might need to go back and forth several times.

Let's look at an example. A search engine is a complicated piece of software, and implementing it is not a matter of 1 hour of work. But we can decompose it pretty easily, starting from the fact that the input of the system is a query, and that the output is an ordered set of results. So, my overview of this component is the following: the user inputs a query, the query is processed and the system returns a list of results, ordered by quality.

I didn't describe what "quality" is, nor discussed the specific implementation of the system that stores all possible results. Those details are buried down somewhere at a certain level of zoom and are utterly useless at this level.
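At this level of zoom, the overview can even be written down as code in which the two details I left out are black boxes passed in from outside. This is just a sketch to illustrate the decomposition: the names search, find_candidates, and quality are placeholders I invented, and the toy implementations at the bottom exist only to make the example run.

```python
from typing import Callable, Iterable, List


def search(
    query: str,
    find_candidates: Callable[[str], Iterable[str]],
    quality: Callable[[str, str], float],
) -> List[str]:
    """Overview level: a query comes in, a list of results ordered by quality goes out."""
    candidates = find_candidates(query)
    return sorted(candidates, key=lambda result: quality(query, result), reverse=True)


# Toy black boxes: what they do inside belongs to a deeper level of zoom
DOCUMENTS = ["tacos and more tacos", "tomato soup", "grilled cheese and tacos"]


def find_in_memory(query: str) -> List[str]:
    return [document for document in DOCUMENTS if query in document]


def count_occurrences(query: str, document: str) -> float:
    return float(document.count(query))


print(search("tacos", find_in_memory, count_occurrences))
# ['tacos and more tacos', 'grilled cheese and tacos']
```

Replacing either black box later (a real index, a better definition of quality) doesn't change the overview, which is the whole point of the decomposition.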

Communication

Any level of zoom in the decomposition can be described, and the amount of specific technical knowledge needed to understand the explanation should be directly proportional to the zoom level. You might have heard the quote "You do not really understand something unless you can explain it to your grandmother." I believe this might be very offensive to grandmothers, but paraphrasing it, I would say that "There should be a zoom level at which the project is understandable by anyone who doesn't have a specific knowledge of the field".

Indeed, the problem of technical communication is that tech-savvy gurus are usually not able to decompose what they are working on into black boxes that are sufficiently abstract to be understandable by any human being. Please note this can happen to anyone, not only to programmers. I had to listen enough times to people working in banking, insurances, or project management (just to name a few different fields) to know that they can be unable to describe their job or specific aspects of it without using 4 obscure words every 5 words, the fifth one probably being a conjunction.

Being a blogger and an author I want to add a consideration about communication. Explaining things is the best way to see if everything is clear in your mind, which is another way to read the previous quote (without involving grandmothers). The very same post that you are reading started as an intuition, a small list of ideas, and so far has been rewritten 6 times. In the process, I understood the topics I am discussing much better than I did when I first felt the need to write them down.

Implementation

Professor Sidney Morris, in a very interesting video about how to write proofs in mathematics, describes the process with these words:

Step 1: write down what we are given
Step 2: write down the definition of each technical term in what we are given
Step 3: write down what we are required to prove
Step 4: write down the definition of each technical term in what we are required to prove

So these 4 steps are quite easy, quite straightforward.

The next step is not as easy

Step 5: THINK!

While we don't need to aim for the same level of formality required of mathematicians who prove theorems, we can surely keep the spirit of the process: write down and define what you have, write down and define what you want to achieve. Then, think.

We tend to take for granted that we can think, after all we do it all day long. But focusing our attention on a specific topic, giving it time, exploring it, considering questions about it, evaluating possible answers, all these things are increasingly unpopular. This is not the place for a critique of our society full of noise, where ideas, products, and works of art are watched for mere seconds before getting a like and passing into oblivion. But it is worth noting that thinking is not easy.

The implementation of a black box might require a lot of thinking, and we have to accept this. It might require a lot of rewrites, prove unsuccessful only after a certain amount of time, or even require a separate project to be properly managed. There are no shortcuts here.

The coding interview problem

What do we do during a coding interview? What are we trying to understand with this excruciating exercise that puts people in a pillory for one hour?

What we should do, in my opinion, is to help the candidate to show how they solve problems. We should facilitate a discussion along the lines of the three points that I mentioned: decomposition, communication, implementation. As you can see implementation is not avoided, it's a coding interview because there should be a part of it in which we write code, but it should be done only after we established a decomposition of the system.

I also believe that the assignment should be purposefully too complex to implement in a single one-hour session, and this should be explicitly communicated. This forces the candidate to design instead of rushing headlong into implementing the first requirement of the exercise without reading the rest. At any point, if the candidate is unable to implement a specific step, we can also move on to other steps and fake the input. This way we get many benefits:

The candidate won't feel stressed by the need of showing how good they are at coding. The design part is a friendly chat, where suggestions can be made and specific technologies/solutions might be discarded if not known to the candidate.
They won't perceive the interview as a failure because they couldn't implement a single step or because they didn't complete the assignment in time.
We can explore the way the candidate communicates, the way they decompose complex processes, how well they understand problems and, eventually, how they write code.
We can adjust the level of difficulty of the interview or explore specific topics in detail just asking the candidate to focus on a specific detail.

As an interviewer, I value the decomposition phase much more than the part in which you show me how well you remember all the functions of the Python standard library in a stressful situation. The truth is that I look them up very often and I don't look down on people because they don't remember the name of a method. I have one hour to decide if you are a good addition to the company, if you can be a good teammate for my next project, and if (possibly with some training) you can be given the responsibility for part of the system. In that hour I need to capture the main traits of your approach.

Don't get me wrong, I am a terrible nitpicker and probably on the brink of being OCD about some things, such as naming or tidiness of the code. But I try to take my own advice. What is the most important thing about you that I can understand? I think it would be extremely disappointing to discover that I hired someone who knows the standard library by heart but can't pick the right technology to complete a project before the deadline.

I understand that when you are interviewed you feel like you are in a position of weakness and that you are sitting there at the mercy of an evil interviewer whose purpose is only to uncover what you don't know. I'm sorry if you had to face such interviewers. I had to, and I understand the frustration. My advice is: always remember that working for a company is a matter of giving your time and your energy in exchange for personal growth. You might be interviewing for your dream job, but if the interviewer is not interested in you and your growth it's probably not that useful for you to work with them.

So, as a candidate, you have a responsibility to show the interviewer how you can solve problems. If you show how good you are at coding, you will impress only interviewers that are interested in your coding skills, and this is, in my opinion, a very limited part of what you can do as a programmer. You need to show that you can design, and this is independent of the level you are at.

You need to show that you understand problems, that you can compare solutions, that you can take your risks picking one specific strategy and that if needed you can stop at a certain point and say "This is the wrong approach".

Patterns

Design patterns are defined by Erich Gamma and his co-authors in their seminal book (Design Patterns: Elements of Reusable Object-Oriented Software by Gamma, Vlissides, Johnson, and Helm) with these words: "[...] patterns solve specific design problems and make object-oriented designs more flexible, elegant, and ultimately reusable. [...] A designer who is familiar with such patterns can apply them immediately to design problems without having to rediscover them."

I want to focus on the words "solve specific design problems" because what I notice is that many people apply patterns without having understood the problem they are trying to solve. Even worse, they look at the world through the lens of the patterns they know, twisting the nature of problems to fit the solution they know.

Back to the original sentence. "I will write a class" is considered the go-to solution in OOP languages. What we believe is that, in an OOP language, whatever the problem, the solution is to write a class. So, our first move on the chessboard of the interview is to write a class. This is a dangerous misuse of a pattern such as data encapsulation, and an expert interviewer will checkmate us in one move. I saw candidates facing problems that could be solved in 10 minutes with two functions and a dictionary spending more than 50 minutes swamped in a multitude of classes, trying to figure out which object contained the data they needed at a certain point of the process.

Clearly, classes might be the best solution for some problems, but this should come at the end of your analysis. You write a class because you have data and functions that can be put together, and this is valid for any other technology. Always ask yourself: what is the reason why I use this? What is the problem that I'm trying to solve?

A dangerous culture

We all make the same mistake here: we push (or at least accept) a culture in which we teach and learn tools as go-to solutions without teaching to identify and face problems.

Programming languages, architectural patterns, algorithms. Those are all tools to implement solutions, they are not the solutions. You should learn them, down to the most minute details if you can, but never put them on the table before you understood the problem.

Alexis Carrel said, "A few observations and much reasoning lead to error; many observations and a little reasoning to truth." (Réflexions sur la vie, Paris, 1952) The advice that I take from the French Nobel Prize winner is: what is in front of you has to be observed deeply to find out its real nature. What things are is much more important than what we think they are and how we think we should treat them ("reasoning"). And what things are, if observed properly, will also reveal ways to interact with them, to manipulate them, to solve them.

If you want a clear example of the opposite, observe a programmer (maybe you yourself) looking for help on an error the web framework or the compiler threw at them. Copy and paste the error message into Google, pick the first result (Stack Overflow), scroll down until you find some code, apply. I dare you to call this "engineering". Many times we don't even read the Stack Overflow question, we directly read the answer, not to mention the fact that many times we don't even read the error message!

I recommend reading a very interesting article by Joseph Gefroh, Why Your Technical Interview Is Broken, and How to Fix It, where he discusses the various types of skills that you can explore during an interview, and which ones you should be interested in. In particular, I couldn't agree more with his point about algorithmic interviews, as I believe they are deeply flawed.

I also recommend having a look at the Guardian Coding Exercises and reading the description of the repository. I think they are a good example of tests that allow the candidate and the interviewer to work together, to actually meet and to discuss a solution. There is no "right" way to solve them, and many of them cannot be solved in 45 minutes, which is usually the time given to a candidate after an initial introductory chat.

Conclusion

I hope these short considerations helped you to see my point. We should all shift our gaze from the tools we have to the nature of problems and to their solutions. We are missing an important step here, which is ultimately what defines a good engineer and which is the most important thing that you can learn in your career. Observe problems, stop and think, devise a strategy, zoom out and zoom in. Learn to use tools, don't be used by them.

We need to push for this approach in our interviews, but also try to promote this culture in our teams and companies.

Photo by Philip Swinburn on Unsplash

Public key cryptography: RSA keys

Leonardo Giordani — Sun, 08 Nov 2020 21:44:55 +0000

Photo by Chris Barbalis on Unsplash

This article was originally published on The Digital Cat

I bet you have created an RSA key pair at least once, usually because you needed to connect to GitHub and you wanted to avoid typing your password every time. You diligently followed the documentation on how to create SSH keys and after a couple of minutes your setup was complete.

But do you know what you actually did?

Do you know what the file ~/.ssh/id_rsa really contains? Why did ssh create two files with such a different format? Did you notice that one file begins with ssh-rsa, while the other begins with -----BEGIN RSA PRIVATE KEY-----? Have you noticed that sometimes the header of the second file misses the RSA part and just says BEGIN PRIVATE KEY?

I believe that a minimum level of knowledge regarding the various formats of RSA keys is mandatory for every developer nowadays, not to mention the importance of understanding them deeply if you want to pursue a career in the infrastructure management world.

RSA algorithm and key pairs

Since the invention of public-key cryptography, various systems have been devised to create the key pair. One of the first ones is RSA, the creation of three brilliant cryptographers, that dates back to 1977. The story of RSA is quite interesting, as it was first invented by an English mathematician, Clifford Cocks, who was however forced to keep it secret by the British intelligence office he was working for.

Keeping in mind that RSA is not a synonym for public-key cryptography but only one of the possible implementations, I wanted to write a post on it because it is still, more than 40 years after its publication, one of the most widespread algorithms. In particular it is the standard algorithm used to generate SSH key pairs, and since nowadays every developer has their public key on GitHub, BitBucket, or similar systems, we may arguably say that RSA is pretty ubiquitous.

I will not cover the internals of the RSA algorithm in this article, however. If you are interested in the gory details of the mathematical framework you may find plenty of resources both on the Internet and in textbooks. The theory behind it is not trivial, but it is definitely worth the time if you want to be serious about the mathematical part of cryptography.

In this article I will instead explore two ways to create RSA key pairs and the formats used to store them. Applied cryptography is, like many other topics in computer science, a moving target, and the tools change often. Sometimes it is pretty easy to find out how to do something (StackOverflow helps), but less easy to get a clear picture of what is going on.

All the examples shown in this post use a 2048-bit RSA key created for this purpose, so all the numbers you see come from a real example. The key was obviously trashed after I wrote the article.

The PEM format

Let's start the discussion about key pairs with the format used to store them. Nowadays the most widely accepted storage format is called PEM (Privacy-Enhanced Mail). As the name suggests, this format was initially created for e-mail encryption but later became a general format to store cryptographic data like keys and certificates. It is described in RFC 7468 ("Textual Encodings of PKIX, PKCS, and CMS Structures").

An example private key in PEM format is the following

Basically, you can tell you are dealing with a PEM format from the typical header and footer that identify the content. While the hyphens and the two words BEGIN and END are always present, the PRIVATE KEY part describes the content and can change if the PEM file contains something different from a key, for example an X.509 certificate for SSL.

The PEM format specifies that the body of the content (the part between the header and the footer) is encoded using Base64.
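The relationship between header, footer, and Base64 body can be sketched with a few lines of Python using only the standard library. This is an illustration and not a complete implementation of RFC 7468: the functions pem_encode and pem_decode are names I chose, and the payload is a placeholder string, not a real key.

```python
import base64
import re
from typing import Tuple

PEM_BLOCK = re.compile(
    r"-----BEGIN (?P<label>[A-Z0-9 ]+)-----\s*"
    r"(?P<body>[A-Za-z0-9+/=\s]+?)"
    r"-----END (?P=label)-----"
)


def pem_encode(label: str, data: bytes) -> str:
    """Wrap data in a PEM block: header, Base64 body in 64-character lines, footer."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + 64] for i in range(0, len(encoded), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


def pem_decode(text: str) -> Tuple[str, bytes]:
    """Return the label and the Base64-decoded body of the first PEM block in text."""
    match = PEM_BLOCK.search(text)
    if match is None:
        raise ValueError("no PEM block found")
    body = "".join(match.group("body").split())  # drop line breaks
    return match.group("label"), base64.b64decode(body)


payload = b"placeholder bytes, not a real key " * 3
pem = pem_encode("PRIVATE KEY", payload)
label, decoded = pem_decode(pem)
print(label)               # PRIVATE KEY
print(decoded == payload)  # True
```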

If the private key has been encrypted with a password the header and the footer are different

When the PEM format is used to store cryptographic keys the body of the content is in a format called PKCS #8. Initially a standard created by a private company (RSA Laboratories), it became a de facto standard and has therefore been described in various RFCs, most notably RFC 5208 ("Public-Key Cryptography Standards (PKCS) #8: Private-Key Information Syntax Specification Version 1.2").

The PKCS #8 format describes the content using the ASN.1 (Abstract Syntax Notation One) description language and the related DER (Distinguished Encoding Rules) to serialize the resulting structure. This means that Base64-decoding the content will return some binary content that can be processed only by an ASN.1 parser.

Let me visually recap the structure

-----BEGIN label-----
+--------------------------- Base64 ---------------------------+
|                                                              |
| PKCS #8 content:                                             |
| ASN.1 language serialized with DER                           |
|                                                              |
+--------------------------------------------------------------+
-----END label-----

Please note that, due to the underlying ASN.1 structure, the PEM body of an RSA key typically starts with the characters MII.
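The MII prefix can be verified by hand. The sketch below assumes that the content is wrapped in an ASN.1 SEQUENCE (tag byte 0x30) and that its length is between 256 and 16383 bytes, which holds for the 2048-bit key used here, as its outer SEQUENCE contains 1214 bytes. The two helper functions are mine and cover only what is needed to build the first bytes of the DER encoding.

```python
import base64


def der_length(length: int) -> bytes:
    """Encode a length with DER rules: short form below 128, long form otherwise."""
    if length < 0x80:
        return bytes([length])
    size = (length.bit_length() + 7) // 8
    return bytes([0x80 | size]) + length.to_bytes(size, "big")


def der_sequence_header(content_length: int) -> bytes:
    """Tag byte 0x30 (SEQUENCE) followed by the DER-encoded length of the content."""
    return b"\x30" + der_length(content_length)


header = der_sequence_header(1214)
print(header.hex())                       # 308204be
print(base64.b64encode(header).decode())  # MIIEvg==
```

The bytes 30 82 followed by a length byte smaller than 0x40 always map to the Base64 characters MII, so any structure in that size range produces the same prefix, while smaller structures (that use a shorter length encoding) do not.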

OpenSSL and ASN.1

OpenSSL can directly decode a key in PEM format and show the underlying ASN.1 structure with the module asn1parse

$ openssl asn1parse -inform pem -in private.pem
    0:d=0  hl=4 l=1214 cons: SEQUENCE          
    4:d=1  hl=2 l=   1 prim: INTEGER           :00
    7:d=1  hl=2 l=  13 cons: SEQUENCE          
    9:d=2  hl=2 l=   9 prim: OBJECT            :rsaEncryption
   20:d=2  hl=2 l=   0 prim: NULL              
   22:d=1  hl=4 l=1192 prim: OCTET STRING      [HEX DUMP]:308204A40201000282010100B2F5FD3F9F0917112
   CE42F8BF87ED676E15258BE443F36DEAFB0B69BDE2496B495EAAD1B01CAD84271B014E96F79386C636D348516DA74A68
   A8C70FBA882870C47B4218D8F49186DDF72727B9D80C21911C3E337C6E407FFB47C2F2767B0D164D8A1E9AF95F6481BF
   8D9EDFB2E3904B2529268C460256FAFD0A677D29898F10B1D15128A695839FC08EDD584E8335615B1D1D7277BE65C532
   DCA92DDC7050374868B117EA9154914EF9292B8443F13696E4FAD50DED6BD90E5A6F7ED33BE2ECE31C6DD7A4253EE6CD
   C56787DDD1D5CD776614022DB87D03BB22F23285B5A3167AF8DACABBEA40004471337D3781E8C5CCA0EA5E27799B510E
   4EF938C61CAA60D02030100010282010100B24255000A6A03901827333539511E4F4C21BA43CBB72BF0A51060D4E1719
   0AC50A871C57503986696D7CDFCB80D0726EFE2D76DBA55DFDC0425E064CC753810035C6A0F97AA37AB39E7C6215BC1E
   595131D0C3782E5A11213B59F42A1067F8CF43C538992D6BEFD1DE3F6293CE18ECC1173C4E7D6DD7362AD7323E7A218B
   5FFB0F245EB796327CC87493EDD134234ED5F3B14A4C4D92374597F64A6D3CB2C10F0CD2D57E99F58C8D28F2049D1433
   CC4BD677017AD1BDD1C83CFB8FB7E8C8FDCF0B4FB77DE7B8285749CEDFBFD6878F7F7930073F0F42ADDCBA8385D7ED05
   CDFCAA2A2BA757601723A96201FECCC2E65C65E14F65F1D34D6ECDFE3F85401800102818100E1D16389BF6EFF7AE44F6
   57106ED81C81A48B5FB356F83DD4A229E8654BDC036716BBD9D46DFD1498132545054958ACA5CFDA709D97CC8C6A9E92
   03D05F7B9D45E685A19A5F58267FCB17FCF502B32CFEDB94CAEA58EE5F63EBA5F33D09946C8652132344410D3D658748
   BCAE256F24896C2A9AD9340D3C8392652DA8ED7346D02818100CAE155C9B3A4546B5FC3CF4CC80D539D531C406BAC5ED
   82818E977B496F9F614CEFB1179E3BFBFAB22BCA7F88EBB8C9B1327AE70113242DFF0866370B6C76782DBD50DBE1FEE9
   B3316B9AAC7BABB7CFA0A9EF26C3C976CF62DA8F41EFE065458DC7C1CBCA78FB1CB4FF7AA50D116CE1640956A4E89EAD
   F5293FAA13A2349F42102818100BC3B93324E6D92EE7883AA366624F28ABF461ED3B0BE2CF7F805158939F815D20C075
   83E52C6DCA8DDD5FB2C1EE5AC9474A1476CD16ACFDDB1E24EEA2F204939BA1C58068B2D342FC4169D484D36451BC7B82
   F306176D53FC71809A5A25B320277320DAC3D949D504DD9907164EC3EF7BD1BB4DEA82160A7C4E3AA2ADEE88A9D02818
   02915E921A7D7A7A0F70BD8775C2C16BACD91F319DB1679FFE4CBA30A5768D784EF45B90C4E2B0ECDC18323211B06B03
   AD76E39CD482E3D8CCC50EAE270A1813CE6F80688723F07FF18A3110AD1AE16692CAD73BAA7AAA2CE5800D72F4F92489
   296542C1DA87159382B41A4A42933CD18848BBDB39A0A8E9F5288770E27075B010281803AB4E3B841AB234515BF0A8D2
   E40FB6E95389702D834474E9AD849124DC6C1D342738D4E7510265DF6B744EBAA4A88A7995346BEEF047DB024CE8B2A4
   E3923B0566389948AB0BBB031879770DA14F4418AEB75AE98349122A2D9535117B05BEF938A1211A3BE6E882957BC2A5
   F1DE5CA50C26F42EE0A383A2A2B6340D52E1A36

What you see in the snippet is the private key as an ASN.1 structure. Remember that DER is only the encoding that turns the ASN.1 structure into binary data, so we don't see it unless we decode the Base64 content into a file and open it with a binary editor.
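The columns printed by asn1parse (offset, hl for the header length, l for the content length) can be reproduced by reading the DER headers by hand. The following is only a sketch: the function name read_header is made up for this example, and the bytes in prefix are an assumption reconstructed from the output above (they are the start of the structure, not the whole key).

```python
def read_header(data, offset):
    """Return (tag, header_length, content_length) of the DER element at offset."""
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:
        # Short form: the length is stored directly in this byte.
        return tag, 2, first
    # Long form: the low 7 bits say how many bytes hold the length.
    count = first & 0x7F
    length = int.from_bytes(data[offset + 2:offset + 2 + count], "big")
    return tag, 2 + count, length


# Assumed first 26 bytes of the unencrypted PKCS #8 key shown above.
prefix = bytes.fromhex(
    "308204be"                # SEQUENCE, 1214 bytes
    "020100"                  # INTEGER 0
    "300d"                    # SEQUENCE, 13 bytes
    "06092a864886f70d010101"  # OBJECT rsaEncryption
    "0500"                    # NULL
    "048204a8"                # OCTET STRING, 1192 bytes (content not included)
)

# Read the outer SEQUENCE, then walk its direct children.
tag, hl, length = read_header(prefix, 0)
rows = [(0, tag, hl, length)]
offset = hl
while offset < len(prefix):
    tag, hl, length = read_header(prefix, offset)
    rows.append((offset, tag, hl, length))
    if tag == 0x04:
        break  # the OCTET STRING content is not part of the prefix
    offset += hl + length  # skip the whole element, children included

for row in rows:
    print("offset=%d tag=%02x hl=%d l=%d" % row)
```

The last row is the OCTET STRING at offset 22, the same offset, header length and content length reported by asn1parse.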

Note that the ASN.1 structure contains the type of the object (rsaEncryption, in this case). You can further decode the OCTET STRING field, which is the actual key, specifying the offset

$ openssl asn1parse -inform pem -in private.pem -strparse 22
    0:d=0  hl=4 l=1188 cons: SEQUENCE          
    4:d=1  hl=2 l=   1 prim: INTEGER           :00
    7:d=1  hl=4 l= 257 prim: INTEGER           :B2F5FD3F9F0917112CE42F8BF87ED676E15258BE443F36DEAFB
    0B69BDE2496B495EAAD1B01CAD84271B014E96F79386C636D348516DA74A68A8C70FBA882870C47B4218D8F49186DDF
    72727B9D80C21911C3E337C6E407FFB47C2F2767B0D164D8A1E9AF95F6481BF8D9EDFB2E3904B2529268C460256FAFD
    0A677D29898F10B1D15128A695839FC08EDD584E8335615B1D1D7277BE65C532DCA92DDC7050374868B117EA9154914
    EF9292B8443F13696E4FAD50DED6BD90E5A6F7ED33BE2ECE31C6DD7A4253EE6CDC56787DDD1D5CD776614022DB87D03
    BB22F23285B5A3167AF8DACABBEA40004471337D3781E8C5CCA0EA5E27799B510E4EF938C61CAA60D
  268:d=1  hl=2 l=   3 prim: INTEGER           :010001
  273:d=1  hl=4 l= 257 prim: INTEGER           :B24255000A6A03901827333539511E4F4C21BA43CBB72BF0A51
    060D4E17190AC50A871C57503986696D7CDFCB80D0726EFE2D76DBA55DFDC0425E064CC753810035C6A0F97AA37AB39
    E7C6215BC1E595131D0C3782E5A11213B59F42A1067F8CF43C538992D6BEFD1DE3F6293CE18ECC1173C4E7D6DD7362A
    D7323E7A218B5FFB0F245EB796327CC87493EDD134234ED5F3B14A4C4D92374597F64A6D3CB2C10F0CD2D57E99F58C8
    D28F2049D1433CC4BD677017AD1BDD1C83CFB8FB7E8C8FDCF0B4FB77DE7B8285749CEDFBFD6878F7F7930073F0F42AD
    DCBA8385D7ED05CDFCAA2A2BA757601723A96201FECCC2E65C65E14F65F1D34D6ECDFE3F854018001
  534:d=1  hl=3 l= 129 prim: INTEGER           :E1D16389BF6EFF7AE44F657106ED81C81A48B5FB356F83DD4A2
    29E8654BDC036716BBD9D46DFD1498132545054958ACA5CFDA709D97CC8C6A9E9203D05F7B9D45E685A19A5F58267FC
    B17FCF502B32CFEDB94CAEA58EE5F63EBA5F33D09946C8652132344410D3D658748BCAE256F24896C2A9AD9340D3C83
    92652DA8ED7346D
  666:d=1  hl=3 l= 129 prim: INTEGER           :CAE155C9B3A4546B5FC3CF4CC80D539D531C406BAC5ED82818E
    977B496F9F614CEFB1179E3BFBFAB22BCA7F88EBB8C9B1327AE70113242DFF0866370B6C76782DBD50DBE1FEE9B3316
    B9AAC7BABB7CFA0A9EF26C3C976CF62DA8F41EFE065458DC7C1CBCA78FB1CB4FF7AA50D116CE1640956A4E89EADF529
    3FAA13A2349F421
  798:d=1  hl=3 l= 129 prim: INTEGER           :BC3B93324E6D92EE7883AA366624F28ABF461ED3B0BE2CF7F80
    5158939F815D20C07583E52C6DCA8DDD5FB2C1EE5AC9474A1476CD16ACFDDB1E24EEA2F204939BA1C58068B2D342FC4
    169D484D36451BC7B82F306176D53FC71809A5A25B320277320DAC3D949D504DD9907164EC3EF7BD1BB4DEA82160A7C
    4E3AA2ADEE88A9D
  930:d=1  hl=3 l= 128 prim: INTEGER           :2915E921A7D7A7A0F70BD8775C2C16BACD91F319DB1679FFE4C
    BA30A5768D784EF45B90C4E2B0ECDC18323211B06B03AD76E39CD482E3D8CCC50EAE270A1813CE6F80688723F07FF18
    A3110AD1AE16692CAD73BAA7AAA2CE5800D72F4F92489296542C1DA87159382B41A4A42933CD18848BBDB39A0A8E9F5
    288770E27075B01
1061:d=1  hl=3 l= 128 prim: INTEGER           :3AB4E3B841AB234515BF0A8D2E40FB6E95389702D834474E9AD8
    49124DC6C1D342738D4E7510265DF6B744EBAA4A88A7995346BEEF047DB024CE8B2A4E3923B0566389948AB0BBB0318
    79770DA14F4418AEB75AE98349122A2D9535117B05BEF938A1211A3BE6E882957BC2A5F1DE5CA50C26F42EE0A383A2A
    2B6340D52E1A36

Since this is an RSA key, the fields represent specific components of the algorithm. After the version number we find, in order, the modulus n = pq, the public exponent e, the private exponent d, the two prime numbers p and q, and the values d_p, d_q, and q_inv (for the Chinese remainder theorem speed-up).
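The relationship between these fields is easier to see with numbers small enough to check by hand. This is a toy sketch, not the way real keys are generated: the primes 61 and 53 and the exponent 17 are arbitrary values chosen for the example, and the three-argument pow with a negative exponent requires Python 3.8 or later.

```python
from math import gcd

# Toy primes, far too small for any real use.
p, q = 61, 53
n = p * q                    # modulus
e = 17                       # public exponent
phi = (p - 1) * (q - 1)
assert gcd(e, phi) == 1      # e must be invertible modulo phi
d = pow(e, -1, phi)          # private exponent

# Values stored in the key to speed up private key operations.
d_p = d % (p - 1)
d_q = d % (q - 1)
q_inv = pow(q, -1, p)        # inverse of q modulo p


def decrypt_crt(c):
    """Decrypt c using the Chinese remainder theorem parameters."""
    m1 = pow(c, d_p, p)
    m2 = pow(c, d_q, q)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


message = 65
ciphertext = pow(message, e, n)
print(decrypt_crt(ciphertext) == pow(ciphertext, d, n) == message)  # True
```

The two decryptions give the same result, but the second one works with a full-size exponent and modulus, while the CRT version uses two exponentiations with numbers half as long.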

If the key has been encrypted there are fields with information about the cipher, and the OCTET STRING fields cannot be further parsed because of the encryption.

$ openssl asn1parse -inform pem -in private-enc.pem
    0:d=0  hl=4 l=1311 cons: SEQUENCE          
    4:d=1  hl=2 l=  73 cons: SEQUENCE          
    6:d=2  hl=2 l=   9 prim: OBJECT            :PBES2
   17:d=2  hl=2 l=  60 cons: SEQUENCE          
   19:d=3  hl=2 l=  27 cons: SEQUENCE          
   21:d=4  hl=2 l=   9 prim: OBJECT            :PBKDF2
   32:d=4  hl=2 l=  14 cons: SEQUENCE          
   34:d=5  hl=2 l=   8 prim: OCTET STRING      [HEX DUMP]:7FBE6B5C86A4B922
   44:d=5  hl=2 l=   2 prim: INTEGER           :0800
   48:d=3  hl=2 l=  29 cons: SEQUENCE          
   50:d=4  hl=2 l=   9 prim: OBJECT            :aes-256-cbc
   61:d=4  hl=2 l=  16 prim: OCTET STRING      [HEX DUMP]:7FC1CC749F456498F01E43108E4340DE
   79:d=1  hl=4 l=1232 prim: OCTET STRING      [HEX DUMP]:A5581EDC2797FC4E1AD0B66A00B765900AF1164D8
   F67458C1A4E72F54A65F2B8C0C5AD7E42584B95161FD98FBECA07D8E1049687C365ED157C45F1B57B175D2EF778A1FE7
   D12E50C0DF4248F0E1469DA40F9948581F16546F9582D9DCA83AC07C9466A6E3E6CE98CC241C44DAB32F5891B96DE302
   4B6E6A0F4980C6286D6EB8AA1680AD132810EEFB127DE42968142F4F9A4A2CE55A560C054C54DFFBB720A81F3F50A2A6
   3D748CE06309F55340BD4C74980C48F4C9D41650568A62BBE8E0337653BD4A2F7D47C3A24514B5D3100ED40C164831C6
   5A96DC90AD20F4AEF02E00203B0F0B2D550987AEE8F4C7E0E7C0CFF426B465D3CF568D02EE86AF043345954B0AAA649F
   A9F80E026E2A189EC60772A058615DCFEC9EC4D2D12CDEB7844EAA00202E435A0B9B0A28AC4F2DA213214F773A2319D5
   5A560D5C99246F9895F5EF04D97FF1CE26EFC2FF82249F6E94253CB92EE0A74AE3942285C2DFFC77883709E7FF2569FD
   9C8F58C112CD4A125E40E7BC8599242D71DE7D48416B6A36FBE0B90BA9A05AFB982CAE9AD337C2318582AA328ABC341F
   BB1C036DE334DE327DEC97BA757CBBAED26F25DD74BD8BE9215B479CD49D8357AFA5289A0265ADE025F9FC0CDB1CDBF0
   4C812F20B7CEB58BF12C1FD1756AABD7F557B87E1D245E8062D1DF4078D77AD98BFDF0C0F3A06A7FA11BFAE0EBF8F3EE
   1F8AB0D6D7C905D4D238E2738613EA753E044589CEBDF3714CACEC298653FA45AF5977BDCF23B5DD60B479C7958B8AC1
   8CAA4AA4A79C283805246675BBB8D2D0E5B714320E7E6FE8B2EF73DB9839095229B9653726AB9689B19AB47113F70204
   83B2D1A82FE2EB9ABAB429DDF5ACDEBCAB62BABD48D2DBA1D398B03F9919F1DAC8CDA19D39BBAF2B5FE96C43E78F565C
   465019DF88E71BCE35C6F7F8BE87EB384FA1193345E47CA9382BCEFFC2E6B37681E8D95EB48BC7044F7DCA743217D4C0
   81200502E98EC2CFAA9D17277D5385E65CC8104DA999E31532A8B9B3B4D3E219613AE09BC9F10553CC4E5F135ACD3FB4
   A3BBAB21839CEFBBC0D4BB16AE4FBD7407E6E3709B059BD86AFFE032805CE5FB0B8005009B5964B79E478DA7FE88C20D
   D2FEDA10A0EB3433ADC90AF5DD8772B840A5CD7C5E32D96153E41F12BA501EF1F48C4E20CB0120CFBB6F546C2B6E22E0
   834CB9DFBFA4834FEB4B7374788F781A1634ABF9D1FD014E6DB3749E6A086155521ADB9F271D6BF6F60455903B1D913D
   A639EE9F5CA5135FD2A1873FF35EAB8C151C5B90826E4303233D4BB053EBD929107874CDCCADFFF492A7CB595EADF03E
   4C0FE15326752898F1B9AA3EAC9907D9F276E6AB37AFA34FF8F3DBAB7B009754CF1A13029CD6857686105830F0CF6E99
   476CB07ECAAEA8B5CCC2720479423F8504E783D6712E424C636DAB41203D9EC76F47C4B56F453C42E5626048C24CC585
   F0710514EEF6D4C9644E0721CEAE9F885FBD672742A555095A895C7F0D4E814BEF4D223B13285E95BEDF7357D3545784
   32C1EBB63A6EF1D83E21A08DADA073BF9419C7A3185BB492A13569F262683E7CD86EC66CF671C919789038598EFEC22B
   C8EA1E265A4E0864F9E7253BE32457AC1B186722F3D0FF4AD450D04BA97D5B7DC1AA617DBD25EE8EC912072ABCBF5394
   D08AA276732666D4C349196940BFE869DA909EC03A8E25B23339EE50453CB5F81400B1380CA46AF0FC012CA55F322C1C
   5806E5D76D4CD8308B8FDFE

OpenSSL and RSA keys

Another way to look into a private key with OpenSSL is to use the module rsa. While the module asn1parse is a generic ASN.1 parser, the module rsa knows the structure of an RSA key and can properly output the field names

$ openssl rsa -in private.pem -noout -text
Private-Key: (2048 bit)
modulus:
    00:b2:f5:fd:3f:9f:09:17:11:2c:e4:2f:8b:f8:7e:
    d6:76:e1:52:58:be:44:3f:36:de:af:b0:b6:9b:de:
    24:96:b4:95:ea:ad:1b:01:ca:d8:42:71:b0:14:e9:
    6f:79:38:6c:63:6d:34:85:16:da:74:a6:8a:8c:70:
    fb:a8:82:87:0c:47:b4:21:8d:8f:49:18:6d:df:72:
    72:7b:9d:80:c2:19:11:c3:e3:37:c6:e4:07:ff:b4:
    7c:2f:27:67:b0:d1:64:d8:a1:e9:af:95:f6:48:1b:
    f8:d9:ed:fb:2e:39:04:b2:52:92:68:c4:60:25:6f:
    af:d0:a6:77:d2:98:98:f1:0b:1d:15:12:8a:69:58:
    39:fc:08:ed:d5:84:e8:33:56:15:b1:d1:d7:27:7b:
    e6:5c:53:2d:ca:92:dd:c7:05:03:74:86:8b:11:7e:
    a9:15:49:14:ef:92:92:b8:44:3f:13:69:6e:4f:ad:
    50:de:d6:bd:90:e5:a6:f7:ed:33:be:2e:ce:31:c6:
    dd:7a:42:53:ee:6c:dc:56:78:7d:dd:1d:5c:d7:76:
    61:40:22:db:87:d0:3b:b2:2f:23:28:5b:5a:31:67:
    af:8d:ac:ab:be:a4:00:04:47:13:37:d3:78:1e:8c:
    5c:ca:0e:a5:e2:77:99:b5:10:e4:ef:93:8c:61:ca:
    a6:0d
publicExponent: 65537 (0x10001)
privateExponent:
    00:b2:42:55:00:0a:6a:03:90:18:27:33:35:39:51:
    1e:4f:4c:21:ba:43:cb:b7:2b:f0:a5:10:60:d4:e1:
    71:90:ac:50:a8:71:c5:75:03:98:66:96:d7:cd:fc:
    b8:0d:07:26:ef:e2:d7:6d:ba:55:df:dc:04:25:e0:
    64:cc:75:38:10:03:5c:6a:0f:97:aa:37:ab:39:e7:
    c6:21:5b:c1:e5:95:13:1d:0c:37:82:e5:a1:12:13:
    b5:9f:42:a1:06:7f:8c:f4:3c:53:89:92:d6:be:fd:
    1d:e3:f6:29:3c:e1:8e:cc:11:73:c4:e7:d6:dd:73:
    62:ad:73:23:e7:a2:18:b5:ff:b0:f2:45:eb:79:63:
    27:cc:87:49:3e:dd:13:42:34:ed:5f:3b:14:a4:c4:
    d9:23:74:59:7f:64:a6:d3:cb:2c:10:f0:cd:2d:57:
    e9:9f:58:c8:d2:8f:20:49:d1:43:3c:c4:bd:67:70:
    17:ad:1b:dd:1c:83:cf:b8:fb:7e:8c:8f:dc:f0:b4:
    fb:77:de:7b:82:85:74:9c:ed:fb:fd:68:78:f7:f7:
    93:00:73:f0:f4:2a:dd:cb:a8:38:5d:7e:d0:5c:df:
    ca:a2:a2:ba:75:76:01:72:3a:96:20:1f:ec:cc:2e:
    65:c6:5e:14:f6:5f:1d:34:d6:ec:df:e3:f8:54:01:
    80:01
prime1:
    00:e1:d1:63:89:bf:6e:ff:7a:e4:4f:65:71:06:ed:
    81:c8:1a:48:b5:fb:35:6f:83:dd:4a:22:9e:86:54:
    bd:c0:36:71:6b:bd:9d:46:df:d1:49:81:32:54:50:
    54:95:8a:ca:5c:fd:a7:09:d9:7c:c8:c6:a9:e9:20:
    3d:05:f7:b9:d4:5e:68:5a:19:a5:f5:82:67:fc:b1:
    7f:cf:50:2b:32:cf:ed:b9:4c:ae:a5:8e:e5:f6:3e:
    ba:5f:33:d0:99:46:c8:65:21:32:34:44:10:d3:d6:
    58:74:8b:ca:e2:56:f2:48:96:c2:a9:ad:93:40:d3:
    c8:39:26:52:da:8e:d7:34:6d
prime2:
    00:ca:e1:55:c9:b3:a4:54:6b:5f:c3:cf:4c:c8:0d:
    53:9d:53:1c:40:6b:ac:5e:d8:28:18:e9:77:b4:96:
    f9:f6:14:ce:fb:11:79:e3:bf:bf:ab:22:bc:a7:f8:
    8e:bb:8c:9b:13:27:ae:70:11:32:42:df:f0:86:63:
    70:b6:c7:67:82:db:d5:0d:be:1f:ee:9b:33:16:b9:
    aa:c7:ba:bb:7c:fa:0a:9e:f2:6c:3c:97:6c:f6:2d:
    a8:f4:1e:fe:06:54:58:dc:7c:1c:bc:a7:8f:b1:cb:
    4f:f7:aa:50:d1:16:ce:16:40:95:6a:4e:89:ea:df:
    52:93:fa:a1:3a:23:49:f4:21
exponent1:
    00:bc:3b:93:32:4e:6d:92:ee:78:83:aa:36:66:24:
    f2:8a:bf:46:1e:d3:b0:be:2c:f7:f8:05:15:89:39:
    f8:15:d2:0c:07:58:3e:52:c6:dc:a8:dd:d5:fb:2c:
    1e:e5:ac:94:74:a1:47:6c:d1:6a:cf:dd:b1:e2:4e:
    ea:2f:20:49:39:ba:1c:58:06:8b:2d:34:2f:c4:16:
    9d:48:4d:36:45:1b:c7:b8:2f:30:61:76:d5:3f:c7:
    18:09:a5:a2:5b:32:02:77:32:0d:ac:3d:94:9d:50:
    4d:d9:90:71:64:ec:3e:f7:bd:1b:b4:de:a8:21:60:
    a7:c4:e3:aa:2a:de:e8:8a:9d
exponent2:
    29:15:e9:21:a7:d7:a7:a0:f7:0b:d8:77:5c:2c:16:
    ba:cd:91:f3:19:db:16:79:ff:e4:cb:a3:0a:57:68:
    d7:84:ef:45:b9:0c:4e:2b:0e:cd:c1:83:23:21:1b:
    06:b0:3a:d7:6e:39:cd:48:2e:3d:8c:cc:50:ea:e2:
    70:a1:81:3c:e6:f8:06:88:72:3f:07:ff:18:a3:11:
    0a:d1:ae:16:69:2c:ad:73:ba:a7:aa:a2:ce:58:00:
    d7:2f:4f:92:48:92:96:54:2c:1d:a8:71:59:38:2b:
    41:a4:a4:29:33:cd:18:84:8b:bd:b3:9a:0a:8e:9f:
    52:88:77:0e:27:07:5b:01
coefficient:
    3a:b4:e3:b8:41:ab:23:45:15:bf:0a:8d:2e:40:fb:
    6e:95:38:97:02:d8:34:47:4e:9a:d8:49:12:4d:c6:
    c1:d3:42:73:8d:4e:75:10:26:5d:f6:b7:44:eb:aa:
    4a:88:a7:99:53:46:be:ef:04:7d:b0:24:ce:8b:2a:
    4e:39:23:b0:56:63:89:94:8a:b0:bb:b0:31:87:97:
    70:da:14:f4:41:8a:eb:75:ae:98:34:91:22:a2:d9:
    53:51:17:b0:5b:ef:93:8a:12:11:a3:be:6e:88:29:
    57:bc:2a:5f:1d:e5:ca:50:c2:6f:42:ee:0a:38:3a:
    2a:2b:63:40:d5:2e:1a:36

The fields are the same ones we found in the ASN.1 structure, but in this representation we have a better view of the specific values of the RSA key. You can compare the two and see that the values of the fields are the same.

If you want to learn something about RSA try to investigate the historical reasons behind the choice of 65537 as a common public exponent (as you can see here in the section publicExponent).

PKCS #8 vs PKCS #1

The first of the PKCS standards (PKCS #1) was specifically tailored to contain an RSA key. Its ASN.1 definition can be found in RFC 8017 ("PKCS #1: RSA Cryptography Specifications Version 2.2")

RSAPublicKey ::= SEQUENCE {
    modulus           INTEGER,  -- n
    publicExponent    INTEGER   -- e
}

RSAPrivateKey ::= SEQUENCE {
    version           Version,
    modulus           INTEGER,  -- n
    publicExponent    INTEGER,  -- e
    privateExponent   INTEGER,  -- d
    prime1            INTEGER,  -- p
    prime2            INTEGER,  -- q
    exponent1         INTEGER,  -- d mod (p-1)
    exponent2         INTEGER,  -- d mod (q-1)
    coefficient       INTEGER,  -- (inverse of q) mod p
    otherPrimeInfos   OtherPrimeInfos OPTIONAL
}

Subsequently, as the need to describe new types of algorithms increased, the PKCS #8 standard was developed. This can contain different types of keys, and defines a specific field for the algorithm identifier. Its ASN.1 definition can be found in RFC 5958 ("Asymmetric Key Packages")

OneAsymmetricKey ::= SEQUENCE {
     version                   Version,
     privateKeyAlgorithm       PrivateKeyAlgorithmIdentifier,
     privateKey                PrivateKey,
     attributes            [0] Attributes OPTIONAL,
     ...,
     [[2: publicKey        [1] PublicKey OPTIONAL ]],
     ...
   }

PrivateKey ::= OCTET STRING
                     -- Content varies based on type of key. The
                     -- algorithm identifier dictates the format of
                     -- the key.

The definition of the field PrivateKey for the RSA algorithm is the same used in PKCS #1.

If the PEM format uses PKCS #8 its header and footer are

-----BEGIN PRIVATE KEY-----
[...]
-----END PRIVATE KEY-----

If it uses PKCS #1, however, there has to be an external identification of the algorithm, so the header and footer are

-----BEGIN RSA PRIVATE KEY-----
[...]
-----END RSA PRIVATE KEY-----

The structure of PKCS #8 is the reason why we had to parse the field at offset 22 to access the RSA parameters when using the module asn1parse of OpenSSL. If you are parsing a PKCS #1 key in PEM format you don't need this second step.
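Since the label in the header tells the two formats apart, a few lines of Python are enough to guess which structure a PEM private key contains. This is a sketch limited to the three labels mentioned in this post; the names PRIVATE_KEY_LABELS and describe_pem are made up for the example.

```python
import re

# Labels discussed in this post and the structure they announce.
PRIVATE_KEY_LABELS = {
    "PRIVATE KEY": "PKCS #8",
    "ENCRYPTED PRIVATE KEY": "PKCS #8 (encrypted)",
    "RSA PRIVATE KEY": "PKCS #1",
}


def describe_pem(pem_text):
    """Return the format announced by the PEM header, or 'unknown'."""
    match = re.search(r"-----BEGIN ([A-Z0-9 ]+)-----", pem_text)
    if match is None:
        return "unknown"
    return PRIVATE_KEY_LABELS.get(match.group(1), "unknown")


print(describe_pem("-----BEGIN RSA PRIVATE KEY-----\n[...]\n-----END RSA PRIVATE KEY-----"))
# PKCS #1
```

The function only looks at the header, so it tells you what the file claims to contain, not whether the body is actually valid.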

Private and public key

In the RSA algorithm the public key is built using the modulus and the public exponent, which means that we can always derive the public key from the private key. OpenSSL can easily do this with the module rsa, producing the public key in PEM format

$ openssl rsa -in private.pem -pubout
writing RSA key
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAsvX9P58JFxEs5C+L+H7W
duFSWL5EPzber7C2m94klrSV6q0bAcrYQnGwFOlveThsY200hRbadKaKjHD7qIKH
DEe0IY2PSRht33Jye52AwhkRw+M3xuQH/7R8LydnsNFk2KHpr5X2SBv42e37LjkE
slKSaMRgJW+v0KZ30piY8QsdFRKKaVg5/Ajt1YToM1YVsdHXJ3vmXFMtypLdxwUD
dIaLEX6pFUkU75KSuEQ/E2luT61Q3ta9kOWm9+0zvi7OMcbdekJT7mzcVnh93R1c
13ZhQCLbh9A7si8jKFtaMWevjayrvqQABEcTN9N4Hoxcyg6l4neZtRDk75OMYcqm
DQIDAQAB
-----END PUBLIC KEY-----

You can dump the information in the public key specifying the flag -pubin

$ openssl rsa -in public.pem -noout -text -pubin
Public-Key: (2048 bit)
Modulus:
    00:b2:f5:fd:3f:9f:09:17:11:2c:e4:2f:8b:f8:7e:
    d6:76:e1:52:58:be:44:3f:36:de:af:b0:b6:9b:de:
    24:96:b4:95:ea:ad:1b:01:ca:d8:42:71:b0:14:e9:
    6f:79:38:6c:63:6d:34:85:16:da:74:a6:8a:8c:70:
    fb:a8:82:87:0c:47:b4:21:8d:8f:49:18:6d:df:72:
    72:7b:9d:80:c2:19:11:c3:e3:37:c6:e4:07:ff:b4:
    7c:2f:27:67:b0:d1:64:d8:a1:e9:af:95:f6:48:1b:
    f8:d9:ed:fb:2e:39:04:b2:52:92:68:c4:60:25:6f:
    af:d0:a6:77:d2:98:98:f1:0b:1d:15:12:8a:69:58:
    39:fc:08:ed:d5:84:e8:33:56:15:b1:d1:d7:27:7b:
    e6:5c:53:2d:ca:92:dd:c7:05:03:74:86:8b:11:7e:
    a9:15:49:14:ef:92:92:b8:44:3f:13:69:6e:4f:ad:
    50:de:d6:bd:90:e5:a6:f7:ed:33:be:2e:ce:31:c6:
    dd:7a:42:53:ee:6c:dc:56:78:7d:dd:1d:5c:d7:76:
    61:40:22:db:87:d0:3b:b2:2f:23:28:5b:5a:31:67:
    af:8d:ac:ab:be:a4:00:04:47:13:37:d3:78:1e:8c:
    5c:ca:0e:a5:e2:77:99:b5:10:e4:ef:93:8c:61:ca:
    a6:0d
Exponent: 65537 (0x10001)
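The same public key can be partially unpacked without OpenSSL, following the PEM structure described at the beginning of the post: remove the header and the footer, then decode the Base64 body to get the DER bytes. The following Python sketch uses only the standard library; the helper name pem_to_der is made up for this example.

```python
import base64

# The public key printed by openssl rsa -pubout above.
PUBLIC_PEM = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAsvX9P58JFxEs5C+L+H7W
duFSWL5EPzber7C2m94klrSV6q0bAcrYQnGwFOlveThsY200hRbadKaKjHD7qIKH
DEe0IY2PSRht33Jye52AwhkRw+M3xuQH/7R8LydnsNFk2KHpr5X2SBv42e37LjkE
slKSaMRgJW+v0KZ30piY8QsdFRKKaVg5/Ajt1YToM1YVsdHXJ3vmXFMtypLdxwUD
dIaLEX6pFUkU75KSuEQ/E2luT61Q3ta9kOWm9+0zvi7OMcbdekJT7mzcVnh93R1c
13ZhQCLbh9A7si8jKFtaMWevjayrvqQABEcTN9N4Hoxcyg6l4neZtRDk75OMYcqm
DQIDAQAB
-----END PUBLIC KEY-----"""


def pem_to_der(pem_text):
    """Strip the PEM header and footer and decode the Base64 body."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    body = "".join(line for line in lines if not line.startswith("-----"))
    return base64.b64decode(body)


der = pem_to_der(PUBLIC_PEM)
print(len(der), der[:4].hex(), der[-3:].hex())  # 294 30820122 010001
```

The first four bytes are the header of a DER SEQUENCE with a two-byte length, and the last three bytes are the public exponent 65537 (0x010001) shown in the OpenSSL output.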

Generating key pairs with OpenSSL

If you want to generate an RSA private key you can do it with OpenSSL

$ openssl genpkey -algorithm RSA -out private.pem -pkeyopt rsa_keygen_bits:2048
......................................................................+++
..........+++

Since OpenSSL is a collection of modules we specify genpkey to generate a private key. The option -algorithm specifies which algorithm we want to use to generate the key (RSA in this case), -out specifies the name of the output file, and -pkeyopt allows us to set the value of specific key options, in this case the length of the RSA key in bits.

If you want an encrypted key you can generate one specifying the cipher (for example -aes-256-cbc)

$ openssl genpkey -algorithm RSA -out private-enc.pem -aes-256-cbc -pkeyopt rsa_keygen_bits:2048
...........................+++
..........+++
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:

You can see the list of supported ciphers with openssl list-cipher-algorithms (openssl list -cipher-algorithms in OpenSSL 1.1.0 and later). In both cases you can then extract the public key with the method shown previously. OpenSSL private keys are created using PKCS #8, so unencrypted keys will be in the form

-----BEGIN PRIVATE KEY-----
[...]
-----END PRIVATE KEY-----

and encrypted ones in the form

-----BEGIN ENCRYPTED PRIVATE KEY-----
[...]
-----END ENCRYPTED PRIVATE KEY-----

Generating key pairs with OpenSSH

Another tool that you can use to generate key pairs is ssh-keygen, which is a tool included in the SSH suite that is specifically used to create and manage SSH keys. As SSH keys are standard asymmetrical keys we can use the tool to create keys for other purposes.

To create a key pair just run

ssh-keygen -m PEM -t rsa -b 2048 -f key

The option -m specifies the key format. By default OpenSSH writes new private keys in its own format, while RFC 4716 ("The Secure Shell (SSH) Public Key File Format") is the default format used when exporting public keys.

The option -t specifies the key generation algorithm (RSA in this case), while the option -b specifies the length of the key in bits.

The option -f sets the name of the output file. If not present, ssh-keygen will ask for the name of the file, offering to save it to the default file ~/.ssh/id_rsa. The tool always asks for a password to encrypt the key, but you are allowed to enter an empty one to skip the encryption.

This tool creates two files. One is the private key file, named as requested, and the second is the public key file, named like the private key one but with the extension .pub.

The value PEM specified for the option -m writes the private key using the PKCS #1 format, so the key will be in the form

-----BEGIN RSA PRIVATE KEY-----
[...]
-----END RSA PRIVATE KEY-----

Using -m PKCS8 instead uses PKCS #8 and the key will be in the form

-----BEGIN PRIVATE KEY-----
[...]
-----END PRIVATE KEY-----

The OpenSSH public key format

The public key saved by ssh-keygen is written in the so-called SSH format, which is not a standard in the cryptography world. Its structure is <algorithm> <key> <comment>, where the <key> part is encoded with Base64.

For example

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCy9f0/nwkXESzkL4v4ftZ24VJYvkQ/Nt6vsLab3iSWtJXqrRsBythCcbAU6W9
5OGxjbTSFFtp0poqMcPuogocMR7QhjY9JGG3fcnJ7nYDCGRHD4zfG5Af/tHwvJ2ew0WTYoemvlfZIG/jZ7fsuOQSyUpJoxGAlb6
/QpnfSmJjxCx0VEoppWDn8CO3VhOgzVhWx0dcne+ZcUy3Kkt3HBQN0hosRfqkVSRTvkpK4RD8TaW5PrVDe1r2Q5ab37TO+Ls4xx
t16QlPubNxWeH3dHVzXdmFAItuH0DuyLyMoW1oxZ6+NrKu+pAAERxM303gejFzKDqXid5m1EOTvk4xhyqYN user@host

To manually decode the central part of the key you can run the following code

cat key.pub | cut -d " " -f2 | base64 -d | hexdump -ve '/1 "%02x "' -e '2/8 "\n"'

which in the previous case outputs something like

00 00 00 07 73 73 68 2d 72 73 61 00 00 00 03 01
00 01 00 00 01 01 00 b2 f5 fd 3f 9f 09 17 11 2c
e4 2f 8b f8 7e d6 76 e1 52 58 be 44 3f 36 de af
b0 b6 9b de 24 96 b4 95 ea ad 1b 01 ca d8 42 71
b0 14 e9 6f 79 38 6c 63 6d 34 85 16 da 74 a6 8a
8c 70 fb a8 82 87 0c 47 b4 21 8d 8f 49 18 6d df
72 72 7b 9d 80 c2 19 11 c3 e3 37 c6 e4 07 ff b4
7c 2f 27 67 b0 d1 64 d8 a1 e9 af 95 f6 48 1b f8
d9 ed fb 2e 39 04 b2 52 92 68 c4 60 25 6f af d0
a6 77 d2 98 98 f1 0b 1d 15 12 8a 69 58 39 fc 08
ed d5 84 e8 33 56 15 b1 d1 d7 27 7b e6 5c 53 2d
ca 92 dd c7 05 03 74 86 8b 11 7e a9 15 49 14 ef
92 92 b8 44 3f 13 69 6e 4f ad 50 de d6 bd 90 e5
a6 f7 ed 33 be 2e ce 31 c6 dd 7a 42 53 ee 6c dc
56 78 7d dd 1d 5c d7 76 61 40 22 db 87 d0 3b b2
2f 23 28 5b 5a 31 67 af 8d ac ab be a4 00 04 47
13 37 d3 78 1e 8c 5c ca 0e a5 e2 77 99 b5 10 e4
ef 93 8c 61 ca a6 0d

The structure of this binary file is pretty simple, and is described in two different RFCs. RFC 4253 ("SSH Transport Layer Protocol") states in section 6.6 that

The "ssh-rsa" key format has the following specific encoding:

      string    "ssh-rsa"
      mpint     e
      mpint     n

while the definition of the types string and mpint can be found in RFC 4251 ("SSH Protocol Architecture"), section 5

string

    [...] They are stored as a uint32 containing its length
    (number of bytes that follow) and zero (= empty string) or more
    bytes that are the value of the string.  Terminating null
    characters are not used. [...]

mpint

    Represents multiple precision integers in two's complement format,
    stored as a string, 8 bits per byte, MSB first. [...]

This means that the above sequence of bytes is interpreted as 4 bytes of length (32 bits of the type uint32) followed by that number of bytes of content.

(4 bytes)   00 00 00 07          = 7
(7 bytes)   73 73 68 2d 72 73 61 = "ssh-rsa" (US-ASCII)
(4 bytes)   00 00 00 03          = 3
(3 bytes)   01 00 01             = 65537 (a common value for the RSA exponent)
(4 bytes)   00 00 01 01          = 257
(257 bytes) 00 b2 .. ca a6 0d    = The key modulus

Please note that since we created a key of 2048 bits we should have a modulus of 256 bytes. Instead, this key uses 257 bytes, prefixing the number with a 00 byte to avoid it being interpreted as negative (two's complement format).
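The same decoding can be done in a few lines of Python using only the standard library. This is a sketch that handles just the ssh-rsa layout described above; the function names read_field and parse_ssh_rsa are made up for this example, and the key is the one shown earlier, split into chunks for readability.

```python
import base64
import struct

# Public key line from the example above.
PUBLIC_KEY_LINE = (
    "ssh-rsa "
    "AAAAB3NzaC1yc2EAAAADAQABAAABAQCy9f0/nwkXESzkL4v4ftZ24VJYvkQ/Nt6vsLab3iSWtJXqrRsBythCcbAU6W9"
    "5OGxjbTSFFtp0poqMcPuogocMR7QhjY9JGG3fcnJ7nYDCGRHD4zfG5Af/tHwvJ2ew0WTYoemvlfZIG/jZ7fsuOQSyUpJoxGAlb6"
    "/QpnfSmJjxCx0VEoppWDn8CO3VhOgzVhWx0dcne+ZcUy3Kkt3HBQN0hosRfqkVSRTvkpK4RD8TaW5PrVDe1r2Q5ab37TO+Ls4xx"
    "t16QlPubNxWeH3dHVzXdmFAItuH0DuyLyMoW1oxZ6+NrKu+pAAERxM303gejFzKDqXid5m1EOTvk4xhyqYN"
    " user@host"
)


def read_field(blob, offset):
    """Read one length-prefixed field (an RFC 4251 string) starting at offset."""
    (length,) = struct.unpack_from(">I", blob, offset)  # uint32, big-endian
    start = offset + 4
    return blob[start:start + length], start + length


def parse_ssh_rsa(line):
    """Return (algorithm, e, n) from an ssh-rsa public key line."""
    blob = base64.b64decode(line.split()[1])
    name, offset = read_field(blob, 0)
    e_bytes, offset = read_field(blob, offset)
    n_bytes, offset = read_field(blob, offset)
    # mpint values are two's complement numbers, most significant byte first.
    e = int.from_bytes(e_bytes, "big", signed=True)
    n = int.from_bytes(n_bytes, "big", signed=True)
    return name.decode("ascii"), e, n


algorithm, e, n = parse_ssh_rsa(PUBLIC_KEY_LINE)
print(algorithm, e, n.bit_length())  # ssh-rsa 65537 2048
```

Reading the modulus as a signed number is what makes the leading 00 byte necessary: without it the first byte b2 would have its top bit set and the value would come out negative.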

The structure shown above is the reason why all the RSA public SSH keys start with the same 12 characters AAAAB3NzaC1y. This string, decoded from Base64, gives the initial 9 bytes 00 00 00 07 73 73 68 2d 72 (Base64 characters are not a one-to-one mapping of the source bytes). If the exponent is the standard 65537 the key starts with AAAAB3NzaC1yc2EAAAADAQAB, which decoded gives the first 18 bytes 00 00 00 07 73 73 68 2d 72 73 61 00 00 00 03 01 00 01.
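You can check the two prefixes with the Python standard library. This short sketch only decodes the strings mentioned above (bytes.hex with a separator requires Python 3.8 or later).

```python
import base64

# 12 Base64 characters decode to 9 bytes, 24 characters to 18 bytes.
short_prefix = base64.b64decode("AAAAB3NzaC1y")
long_prefix = base64.b64decode("AAAAB3NzaC1yc2EAAAADAQAB")

print(short_prefix.hex(" "))  # 00 00 00 07 73 73 68 2d 72
print(long_prefix.hex(" "))   # 00 00 00 07 73 73 68 2d 72 73 61 00 00 00 03 01 00 01
```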

Converting between PEM and OpenSSH format

We often need to convert files created with one tool to a different format, so this is a list of the most common conversions you might need. I prefer to consider the key format instead of the source tool, but I give a short description of the reason why you might want to perform the conversion.

PEM/PKCS#1 to PEM/PKCS#8

This is useful to convert OpenSSH private keys to a newer format.

openssl pkcs8 -topk8 -inform PEM -outform PEM -in pkcs1.pem -out pkcs8.pem

OpenSSH public to PEM/PKCS#8

To convert public OpenSSH keys into a PEM format using PKCS #8 (prints to stdout)

ssh-keygen -e -f public.pub -m PKCS8

This is easy to remember because -e stands for export. Note that you can also use -m PEM to convert the key into a PEM format that uses PKCS #1.

PEM/PKCS#8 to OpenSSH public

If you need to use in SSH a key pair created with another system

ssh-keygen -i -f public.pem -m PKCS8

This is easy to remember because -i stands for import. As happened when exporting the key, you can import a PEM/PKCS #1 key using -m PEM.

Reading RSA keys in Python

In Python you can use the package pycrypto to access a PEM file containing an RSA key with the function RSA.importKey. Now you can hopefully understand the documentation that says

externKey (string) - The RSA key to import, encoded as a string.

An RSA public key can be in any of the following formats:
    * X.509 subjectPublicKeyInfo DER SEQUENCE (binary or PEM encoding)
    * PKCS#1 RSAPublicKey DER SEQUENCE (binary or PEM encoding)
    * OpenSSH (textual public key only)

An RSA private key can be in any of the following formats:
    * PKCS#1 RSAPrivateKey DER SEQUENCE (binary or PEM encoding)
    * PKCS#8 PrivateKeyInfo DER SEQUENCE (binary or PEM encoding)
    * OpenSSH (textual public key only)

For details about the PEM encoding, see RFC1421/RFC1423.

In case of PEM encoding, the private key can be encrypted with DES or 3TDES
according to a certain pass phrase. Only OpenSSL-compatible pass phrases are
supported.

In practice what you can do with a file private.pem is

from Crypto.PublicKey import RSA

with open('private.pem', 'r') as f:
    key = RSA.importKey(f.read())

and the variable key will contain an instance of _RSAobj (not a very pythonic name, to be honest). This instance contains the RSA parameters as attributes as stated in the documentation

modulus = key.n
public_exponent = key.e
private_exponent = key.d
first_prime_number = key.p
second_prime_number = key.q
q_inv_crt = key.u

Final words

I keep finding on StackOverflow (and on other boards) messages from users who are confused by RSA keys, by the output of the various tools, and by the subtle but important differences between the formats, so I hope this post helped you to get a better understanding of the matter.

Resources

The Wikipedia article on RSA
OpenSSL documentation: asn1parse, rsa, genpkey
The Base64 encoding
The Abstract Syntax Notation One ASN.1 interface description language
RFC 4251 - The Secure Shell (SSH) Protocol Architecture
RFC 4253 - The Secure Shell (SSH) Transport Layer Protocol
RFC 4716 - The Secure Shell (SSH) Public Key File Format
RFC 5208 - Public-Key Cryptography Standards (PKCS) #8: Private-Key Information Syntax Specification Version 1.2
RFC 5958 - Asymmetric Key Packages
RFC 7468 - Textual Encodings of PKIX, PKCS, and CMS Structures
RFC 8017 - PKCS #1: RSA Cryptography Specifications Version 2.2
PyCrypto - The Python Cryptography Toolkit

This article was originally published on The Digital Cat

TDD in Python with pytest

Leonardo Giordani — Sun, 08 Nov 2020 11:01:39 +0000

Photo by Moritz Mentges on Unsplash

This series of posts has been published on The Digital Cat

Part 1 (this post) and part 2 contain a detailed example of unit testing.
Part 3 discusses how you should write unit tests and what you should test.
Part 4 introduces mock objects and shows how to use them practically.
Part 5 extends mocks with patches and shows how to deal with the most common test cases.

Introduction

Test-Driven Development (TDD) is fortunately one of the names that I can spot most frequently when people talk about methodologies. Unfortunately, many programmers still do not follow it, fearing that it will impose a further burden on the already difficult life of the developer.

In this chapter I will try to outline the basic concept of TDD and to show you how your job as a programmer can greatly benefit from it. I will develop a very simple project to show how to practically write software following this methodology.

TDD is a methodology, something that can help you to create better code. But it is not going to solve all your problems. As with all methodologies you have to pay attention not to commit blindly to it. Try to understand the reasons why certain practices are suggested by the methodology and you will also understand when and why you can or have to be flexible.

Also keep in mind that testing is a broader concept that doesn't end with TDD, which focuses a lot on unit testing, a specific type of test that helps you to develop the API of your library/package. There are other types of tests, like integration or functional ones, that are not specifically part of the TDD methodology, strictly speaking, even though the TDD approach can be extended to any testing activity.

A real-life example

Let's start with a simple example taken from a programmer's everyday life.

The programmer is in the office with other colleagues, trying to nail down an issue in some part of the software. Suddenly the boss storms into the office, and addresses the programmer:

Boss: I just met with the rest of the board. Our clients are not happy, we didn't fix enough bugs in the last two months.

Programmer: I see. How many bugs did we fix?

Boss: Well, not enough!

Programmer: OK, so how many bugs do we have to fix every month?

Boss: More!

I guess you feel very sorry for the poor programmer. Apart from the aggressive attitude of the boss, what is the real issue in this conversation? At the end of it there is no hint for the programmer and their colleagues about what to do next. They don't have any clue about what they have to change. They can definitely try to work harder, but the boss didn't refer to actual figures, so it will definitely be hard for the developers to understand if they improved "enough".

The classical sorites paradox may help to understand the issue. One of the standard formulations, taken from the Wikipedia page, is

1,000,000 grains of sand is a heap of sand (Premise 1)

A heap of sand minus one grain is still a heap. (Premise 2)

So 999,999 grains is a heap of sand.

A heap of sand minus one grain is still a heap. (Premise 2)

So 999,998 grains is a heap of sand.

...

So one grain is a heap of sand.

Where is the issue? The concept expressed by the word "heap" is nebulous: it is not defined clearly enough to allow the process to find a stable point, or a solution.

When you write software you face that same challenge. You cannot conceive a function and just expect it "to work", because this is not clearly defined. How do you test if the function that you wrote "works"? What do you mean by "works"? TDD forces you to clearly state your goal before you write the code. Actually, the TDD mantra is "Test first, code later", which can be translated to "Goal first, solution later". We will shortly see a practical example of this.

For the time being, consider that this is a valid practice also outside the realm of software creation. Whoever runs a business knows that you need to be able to extract some numbers (KPIs) from the activity of your company, because it is by comparing those numbers with some predefined thresholds that you can easily tell if the business is healthy or not. KPIs are a form of test, and you have to define them in advance, according to the expectations or needs that you have.

Pay attention. Nothing prevents you from changing the thresholds as a reaction to external events. You may consider that, given the incredible heat wave that hit your country, the amount of coats that your company sold could not reach the goal. So, because of a specific event, you can justify a change in the test (KPI). If you didn't have the test you would have just generically recorded that you earned less money.

Going back to software and TDD, following this methodology you are forced to state clear goals like

sum(4, 5) == 9

Let me read this test for you: there will be a sum function available in the system that accepts two integers. If the two integers are 4 and 5 the function will return 9.

As you can see there are many things that are tested by this statement.

The function exists and can be imported
The function accepts two integers
Passing 4 and 5 as inputs, the output of the function will be 9.

Note that at this stage there is no code that implements the function sum, so the test will fail for sure.

As we will see with a practical example in the next chapter, what I explained in this section will become a set of rules of the methodology.

A simple TDD project

The project we are going to develop is available at https://github.com/lgiordani/simple_calculator.

This project is purposefully extremely simple. You don't need to be an experienced Python programmer to follow this chapter, but you need to know the basics of the language. The goal of this series of posts is not to make you write the best Python code, but to let you learn the TDD workflow, so don't be too worried if your code is not perfect.

Methodologies are like sports or arts: you cannot learn them just by reading their description in a book. You have to practice them. Thus, you should avoid as much as possible following this chapter by reading the code passively. Instead, you should write the code yourself and try new solutions to the problems that I discuss. This is very important, as it actually makes you use TDD. This way, at the end of the chapter you will have a personal experience of what TDD is like.

The repository is tagged, and at the end of each section you will find a link to the relative tag that contains the working solution.

Setup the project

Clone the project repository and move to the branch develop. The branch master contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. If you prefer, you can fork it on GitHub and make your own copy of the repository.

git clone https://github.com/lgiordani/simple_calculator
cd simple_calculator
git checkout --track origin/develop

Create a virtual environment following your preferred process and install the requirements

pip install -r requirements/dev.txt

You should at this point be able to run

pytest -svv

and get an output like

=============================== test session starts ===============================
platform linux -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX --
cabook/venv3/bin/python3
cachedir: .cache
rootdir: cabook/code/calc, inifile: pytest.ini
plugins: cov-XXXX
collected 0 items 

============================== no tests ran in 0.02s ==============================

Requirements

The goal of the project is to write a class SimpleCalculator that performs calculations: addition, subtraction, multiplication, and division. Addition and multiplication shall accept multiple arguments. Division shall return a float value, and division by zero shall return the string "inf". Multiplication by zero must raise a ValueError exception. The class will also provide a function to compute the average of an iterable like a list. This function gets two optional upper and lower thresholds and should remove from the computation the values that fall outside these boundaries.

As you can see the requirements are pretty simple, and a couple of them are definitely not "good" requirements, like the behaviour of division and multiplication. I added those requirements for the sake of example, to show how to deal with exceptions when developing in TDD.

Step 1 - Adding two numbers

The first test we are going to write is one that checks if the class SimpleCalculator can perform an addition. Add the following code to the file tests/test_main.py

from simple_calculator.main import SimpleCalculator

def test_add_two_numbers():
    calculator = SimpleCalculator()

    result = calculator.add(4, 5)

    assert result == 9

As you can see the first thing we do is to import the class SimpleCalculator that we are supposed to write. This class doesn't exist yet, don't worry, you didn't skip any passage.

The test is a standard function (this is how pytest works), and the function name shall begin with test_ so that pytest can automatically discover all the tests. I tend to give my tests a descriptive name, so it is easier later to come back and understand what the test is about with a quick glance. You are free to follow the style you prefer but in general remember that naming components in a proper way is one of the most difficult things in programming. So better to get a handle on it as soon as possible.

The body of the test function is pretty simple. The class SimpleCalculator is instantiated, and the method add of the instance is called with two numbers, 4 and 5. The result is stored in the variable result, which is later the subject of the test itself. The statement assert result == 9 first computes result == 9 which is a boolean, with a value that is either True or False. The keyword assert, then, silently passes if the argument is True, but raises an exception if it is False.

And this is how you write tests in pytest: if your code doesn't raise any exception the test passes, otherwise it fails. The keyword assert is used to force an exception in case of wrong result. Remember that pytest doesn't consider the return value of the function, so it can detect a failure only if it raises an exception.
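The behaviour of the keyword assert described above can be observed directly in plain Python, without pytest. This is only an illustrative sketch, and the helper check is a hypothetical name introduced for the example.

```python
def check(value):
    """Report how the statement `assert value == 9` ends for a given value."""
    try:
        assert value == 9
    except AssertionError:
        return "AssertionError raised"
    return "passed silently"


print(check(9))     # the comparison is True, nothing happens
print(check(None))  # the comparison is False, an exception is raised
```

A test runner such as pytest essentially catches that exception for you and turns it into a failure report.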

Save the file and go back to the terminal. Execute pytest -svv and you should receive the following error message

===================================== ERRORS ======================================
_______________________ ERROR collecting tests/test_main.py _______________________

[...]

tests/test_main.py:4: in <module>
    from simple_calculator.main import SimpleCalculator
E   ImportError: cannot import name 'SimpleCalculator' from 'simple_calculator.main'
!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!
============================= 1 error in 0.20 seconds =============================

No surprise here, actually, as we just tried to use something that doesn't exist. This is good, the test is showing us that something we suppose exists actually doesn't.

TDD rule number 1: Test first, code later

This, by the way, is not yet an error in a test. The error happens very soon, during the tests collection phase (as shown by the message in the bottom line Interrupted: 1 errors during collection). Given this, the methodology is still valid, as we wrote a test and it fails because of an error or a missing feature in the code.

Let's fix this issue. Open the file simple_calculator/main.py and add this code

class SimpleCalculator:
    pass

But, I hear you scream, this class doesn't implement any of the requirements that are in the project. Yes, this is the hardest lesson you have to learn when you start using TDD. The development is ruled by the tests, not by the requirements. The requirements are used to write the tests, the tests are used to write the code. You shouldn't worry about something that is more than one level above the current one.

TDD rule number 2: Add the reasonably minimum amount of code you need to pass the tests

Run the test again, and this time you should receive a different error, that is

=============================== test session starts ===============================
platform linux -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX --
cachedir: .pytest_cache
rootdir: simple_calculator, inifile: pytest.ini
plugins: cov-XXXX
collected 1 item

tests/test_main.py::test_add_two_numbers FAILED

==================================== FAILURES =====================================
______________________________ test_add_two_numbers _______________________________


    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       AttributeError: 'SimpleCalculator' object has no attribute 'add'

tests/test_main.py:9: AttributeError
============================ 1 failed in 0.04 seconds =============================

This is the first proper pytest failure report that we receive, so it's time to learn how to read the output. The first lines show you general information about the system where the tests are run

=============================== test session starts ===============================
platform linux -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX --
cachedir: .pytest_cache
rootdir: simple_calculator, inifile: pytest.ini
plugins: cov-XXXX

You can see here the operating system and a short list of the versions of the main packages involved in running pytest: Python, pytest itself, py (https://py.readthedocs.io/en/latest/) and pluggy (https://pluggy.readthedocs.io/en/latest/). You can also see here where pytest is reading its configuration from (pytest.ini), and the pytest plugins that are installed. As this header is standard I will omit it from the output I will show in the rest of the chapter.

The second part of the output shows the list of files containing tests and the result of each test

collected 1 item

tests/test_main.py::test_add_two_numbers FAILED

Please note that this list is formatted with a syntax that can be given directly to pytest to run a single test. In this case we already have only one test, but later you might run a single failing test giving the name shown here on the command line. For example

pytest -svv tests/test_main.py::test_add_two_numbers

The third part of the output shows details on the failing tests, if any

______________________________ test_add_two_numbers _______________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       AttributeError: 'SimpleCalculator' object has no attribute 'add'

tests/test_main.py:9: AttributeError

For each failing test, pytest shows a header with the name of the test and the part of the code that raised the exception. At the end of each box, pytest shows the line of the test file where the error happened.

Back to the project. The new error is no surprise, as the test uses the method add that wasn't defined in the class. I bet you already guessed what I'm going to do, didn't you? This is the code that you should add to the class

class SimpleCalculator:
    def add(self):
        pass

And again, as you notice, we made the smallest possible addition to the code to pass the test. Running pytest again you should receive a different error message

_______________________________ test_add_two_numbers _______________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: add() takes 1 positional argument but 3 were given

tests/test_main.py:9: TypeError

The function we defined doesn't accept any argument other than self (def add(self)), but the call in the test passes three of them (calculator.add(4, 5)): remember that in Python self is passed implicitly. Our move at this point is to change the function to accept the parameters that it is supposed to receive, namely two numbers. The code now becomes

class SimpleCalculator:
    def add(self, a, b):
        pass
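As a short aside on the implicit self: calling a method on an instance is equivalent to calling the function on the class and passing the instance explicitly, which is why Python counted three arguments in the error above. The class Echo below is hypothetical and exists only to make the received arguments visible.

```python
class Echo:
    # Hypothetical class, used only to show what the method receives
    def show(self, a, b):
        return (type(self).__name__, a, b)


echo = Echo()

# Calling the method on the instance passes the instance as `self` implicitly
implicit = echo.show(4, 5)

# The same call written explicitly on the class: three arguments in total
explicit = Echo.show(echo, 4, 5)

print(implicit)
print(implicit == explicit)
```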

Run the test again, and you will receive another error

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5)

>       assert result == 9
E       assert None == 9
E         -None
E         +9

tests/test_main.py:11: AssertionError

The function returns None, as it doesn't contain any code, while the test expects it to return 9. What do you think is the minimum code you can add to pass this test?

Well, the answer is

class SimpleCalculator:
    def add(self, a, b):
        return 9

and this may surprise you (it should!). You might have been tempted to add some code that performs an addition between a and b, but this would violate the TDD principles, because you would have been driven by the requirements and not by the tests.

When you run pytest again, you will be rewarded by a success message

tests/test_main.py::test_add_two_numbers PASSED

I know this sounds weird, but think about it for a moment: if your code works (that is, it passes the tests), you don't need anything more, as your tests should specify everything the code should do. Maybe in the future you will discover that this solution is not good enough, and at that point you will have to change it (this will happen with the next test, in this case). But for now everything works, and you shouldn't implement more than this.

Git tag: step-1-adding-two-numbers

Step 2 - Adding three numbers

The requirements state that "Addition and multiplication shall accept multiple arguments". This means that we should be able to execute not only add(4, 5) like we did, but also add(4, 5, 11), add(4, 5, 11, 2), and so on. We can start testing this behaviour with the following test, that you should put in tests/test_main.py, after the previous test that we wrote.

def test_add_three_numbers():
    calculator = SimpleCalculator()

    result = calculator.add(4, 5, 6)

    assert result == 15

This test fails when we run the test suite

_____________________________ test_add_three_numbers _______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5, 6)
E       TypeError: add() takes 3 positional arguments but 4 were given

tests/test_main.py:18: TypeError

for the obvious reason that the function we wrote in the previous section accepts only 2 arguments other than self. What is the minimum code that you can write to fix this test?

Well, the simplest solution is to add another argument, so my first attempt is

class SimpleCalculator:
    def add(self, a, b, c):
        return 9

which solves the previous error, but creates a new one. If that wasn't enough, it also makes the first test fail!

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: add() missing 1 required positional argument: 'c'

tests/test_main.py:10: TypeError
_____________________________ test_add_three_numbers _______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5, 6)

>       assert result == 15
E       assert 9 == 15
E         -9
E         +15

tests/test_main.py:20: AssertionError

The first test now fails because the new add method requires three arguments and we are passing only two. The second test fails because the method add returns 9 and not 15 as expected by the test.

When multiple tests fail it's easy to feel discomforted and lost. Where are you supposed to start fixing this? Well, one possible solution is to undo the previous change and to try a different solution, but in general you should try to get to a situation in which only one test fails.

TDD rule number 3: You shouldn't have more than one failing test at a time

This is very important as it allows you to focus on one single test and thus one single problem. And remember, commenting out tests to make them inactive is a perfectly valid way to have only one failing test. Pytest, however, has a smarter solution: you can use the option -k that allows you to specify a matching name. That option has a lot of expressive power, but for now we can just give it the name of the test that we want to run

pytest -svv -k test_add_two_numbers

which will run only the first test and return the same result returned before, since we didn't change the test itself

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: add() missing 1 required positional argument: 'c'

tests/test_main.py:10: TypeError

To fix this error we can obviously revert the addition of the third argument, but this would mean going back to the previous solution. Obviously, though tests focus on a very small part of the code, we have to keep in mind what we are doing in terms of the big picture. A better solution is to add to the third argument a default value. The additive identity is 0, so the new code of the method add is

class SimpleCalculator:
    def add(self, a, b, c=0):
        return 9

And this makes the first test pass. At this point we can run the full suite and see what happens.

_____________________________ test_add_three_numbers ______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5, 6)

>       assert result == 15
E       assert 9 == 15
E         -9
E         +15

tests/test_main.py:20: AssertionError

The second test still fails, because the returned value that we hard coded doesn't match the expected one. At this point the tests show that our previous solution (return 9) is not sufficient anymore, and we have to try to implement something more complex.

I want to stress this. You should implement the minimal change in the code that makes the tests pass. If that solution is not enough, there will be a test that shows it. Now, as you can see, the addition of a new requirement changes the tests, adding a new one, and the old solution is not sufficient any more.

How can we solve this? We know that writing return 15 will make the first test fail (you may try, if you want), so here we have to be a bit smarter and try a better solution, that in this case is actually to implement a real sum

class SimpleCalculator:
    def add(self, a, b, c=0):
        return a + b + c

This solution makes both tests pass, so the entire suite runs without errors.

Git tag: step-2-adding-three-numbers

I can see your face, you are probably frowning at the fact that it took us 10 minutes to write a method that performs the addition of two or three numbers. On the one hand, keep in mind that I'm going at a very slow pace, this being an introduction, and for these first tests it is better to take the time to properly understand every single step. Later, when you are used to TDD, some of these steps will be implicit. On the other hand, TDD is slower than untested development, but the time that you invest writing tests now is usually negligible compared to the amount of time you would spend trying to identify and fix bugs later.

Step 3 - Adding multiple numbers

The requirements are not yet satisfied, however, as they mention "multiple" numbers and not just three. How can we test that we can add a generic amount of numbers? We might add a test_add_four_numbers, a test_add_five_numbers, and so on, but this will cover specific cases and will never cover all of them. Sad to say, it is impossible to test that generic condition or, at least in this case, it is so complex that it is not worth trying.

What you shall do in TDD is to test boundary cases. In general you should always try to find the so-called "corner cases" of your algorithm and write tests that show that the code covers them. For example, if you are testing some code that accepts as inputs a number from 1 to 100, you need a test that runs it with a generic number like 42 (which is far from being generic, but don't panic!), but you definitely want to have a specific test that runs the algorithm with the number 1 and one that runs with the number 100. You also want to have tests that show the algorithm doesn't work with 0 and with 101, but we will talk later about testing error conditions.
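A minimal sketch of what such boundary tests could look like for the 1 to 100 example. The function accept and the helper rejects are hypothetical names introduced only for this illustration, and the error condition is checked with a plain try/except rather than with any pytest feature.

```python
def accept(number):
    # Hypothetical function: valid inputs are the integers from 1 to 100
    if not 1 <= number <= 100:
        raise ValueError(f"{number} is outside the range 1-100")
    return number


def rejects(number):
    """Return True if accept() refuses the given number."""
    try:
        accept(number)
    except ValueError:
        return True
    return False


def test_generic_value():
    assert accept(42) == 42


def test_lower_boundary():
    assert accept(1) == 1


def test_upper_boundary():
    assert accept(100) == 100


def test_just_outside_the_boundaries():
    assert rejects(0)
    assert rejects(101)


# Run the tests by hand; pytest would discover them through the test_ prefix
for test in (test_generic_value, test_lower_boundary,
             test_upper_boundary, test_just_outside_the_boundaries):
    test()
```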

In our example there is no real limitation to the number of arguments that you pass to your function. Before Python 3.7 there was a limit of 255 arguments, which was removed in that version of the language, but these are limitations enforced by an external system, and they are not real boundaries of your algorithm.

The definition of "external system" obviously depends on what you are testing. If you are implementing a programming language you want to have tests that show how many arguments you can pass to a function, or that check the amount of memory used by certain language features. In this case we accept the Python language as the environment in which we work, so we don't want to test its features.

The solution, in this case, might be to test a reasonably high number of input arguments, to check that everything works. In particular, we should keep in mind that our goal is to devise a solution that is as generic as possible. For example, we easily realise that we cannot come up with a function like

    def add(self, a, b, c=0, d=0, e=0, f=0, g=0, h=0, i=0):

as it is not generic, it is just covering a greater amount of inputs (9, in this case, but not 10 or more).

That said, a good test might be the following

def test_add_many_numbers():
    numbers = range(100)

    calculator = SimpleCalculator()

    result = calculator.add(*numbers)

    assert result == 4950

which creates an array (strictly speaking a range, which is an iterable) of all the numbers from 0 to 99. The sum of all those numbers is 4950, which is what the algorithm shall return. The test suite fails because we are giving the function too many arguments

______________________________ test_add_many_numbers _______________________________

    def test_add_many_numbers():
        numbers = range(100)

        calculator = SimpleCalculator()

>       result = calculator.add(*numbers)
E       TypeError: add() takes from 3 to 4 positional arguments but 101 were given

tests/test_main.py:28: TypeError
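The expected value used in this test can be checked independently of the class under test. A quick sketch, comparing the built-in sum with the closed formula n(n-1)/2 for the sum of the integers from 0 to n-1:

```python
n = 100

# Sum of 0, 1, ..., n-1 computed by the built-in function
by_builtin = sum(range(n))

# The same sum through the closed formula n * (n - 1) / 2
by_formula = n * (n - 1) // 2

print(by_builtin, by_formula)
```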

The minimum amount of code that we can add, this time, will not be so trivial, as we have to pass three tests. This is actually the greatest advantage of TDD: the tests that we wrote are still there and will check that the previous conditions are still satisfied. And since tests are committed with the code they will always be there.

The Python way to support a generic number of arguments (technically called variadic functions) is through the use of the syntax *args, which stores in args a tuple that contains all the arguments.

class SimpleCalculator:
    def add(self, *args):
        return sum(args)

At that point we can use the built-in function sum to sum all the arguments. This solution makes the whole test suite pass without errors, so it is correct.
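To see what *args actually collects, here is a small sketch outside the class. The function show_args is a hypothetical helper that just returns what it received.

```python
def show_args(*args):
    # Hypothetical helper: returns what the function actually received
    return args


print(show_args(4, 5))          # two positional arguments
print(show_args(*range(5)))     # a range unpacked into five arguments
print(type(show_args()))        # no arguments at all: still a tuple
print(sum(show_args(4, 5, 6)))  # the built-in sum accepts the tuple directly
```

Whatever the number of arguments, the function body always deals with a single tuple, which is why one implementation covers all three tests.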

Git tag: step-3-adding-multiple-numbers

Pay attention here, please. In TDD a solution is not correct when it is beautiful, when it is smart, or when it uses the latest feature of the language. All these things are good, but TDD wants your code to pass the tests. So, your code might be ugly, convoluted, and slow, but if it passes the test it is correct. This in turn means that TDD doesn't cover all the needs of your software project. Delivering fast routines, for example, might be part of the advantage you have on your competitors, but it is not really testable with the TDD methodology (typically, performance testing is done in a completely different way).

Part of the TDD methodology, then, deals with "refactoring", which means changing the code in a way that doesn't change the outputs, which in turn means that all your tests keep passing. Once you have a proper test suite in place, you can focus on the beauty of the code, or you can introduce smart solutions according to what the language allows you to do. We will discuss refactoring further later in this post.

TDD rule number 4: Write code that passes the test. Then refactor it.

Step 4 - Subtraction

From the requirements we know that we have to implement a function to subtract numbers, but they don't mention multiple arguments (as it would be complex to define what subtracting 3 or more numbers actually means). The test that implements this requirement is

def test_subtract_two_numbers():
    calculator = SimpleCalculator()

    result = calculator.sub(10, 3)

    assert result == 7

which doesn't pass with the following error

____________________________ test_subtract_two_numbers ____________________________

    def test_subtract_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.sub(10, 3)
E       AttributeError: 'SimpleCalculator' object has no attribute 'sub'

tests/test_main.py:36: AttributeError

Now that you understand the TDD process, and that you know you should avoid over-engineering, you can also skip some of the steps that we went through in the previous sections. A good solution for this test is

    def sub(self, a, b):
        return a - b

which makes the test suite pass.
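One way to read the earlier remark about subtracting three or more numbers is that subtraction is not associative, so the result depends on how the operands are grouped. A tiny sketch, with numbers chosen only for illustration:

```python
# Two possible readings of "subtract 10, 3 and 2"
left_to_right = (10 - 3) - 2
right_to_left = 10 - (3 - 2)

print(left_to_right, right_to_left)
```

The two readings give different results, so a variadic sub would need an extra requirement stating which one is meant.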

Git tag: step-4-subtraction

Step 5 - Multiplication

It's time to move to multiplication, which has many similarities to addition. The requirements state that we have to provide a function to multiply numbers and that this function shall allow us to multiply multiple arguments. In TDD you should try to tackle problems one by one, possibly dividing a bigger requirement into multiple smaller ones.

In this case the first test can be the multiplication of two numbers, as it was for addition.

def test_mul_two_numbers():
    calculator = SimpleCalculator()

    result = calculator.mul(6, 4)

    assert result == 24

And the test suite fails as expected with the following error

______________________________ test_mul_two_numbers _______________________________

    def test_mul_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.mul(6, 4)
E       AttributeError: 'SimpleCalculator' object has no attribute 'mul'

tests/test_main.py:44: AttributeError

We face now a classical TDD dilemma. Shall we implement the solution to this test as a function that multiplies two numbers, knowing that the next test will invalidate it, or shall we already consider that the target is that of implementing a variadic function and thus use *args directly?

In this case the choice is not really important, as we are dealing with very simple functions. In other cases, however, it might be worth recognising that we are facing the same issue we solved in a similar case and try to implement a smarter solution from the very beginning. In general, however, you should not implement anything that you don't plan to test in one of the next few tests that you will write.

If we decide to follow strict TDD, that is, to implement the simplest solution first, the bare minimum code that passes the test would be

    def mul(self, a, b):
        return a * b

Git tag: step-5-multiply-two-numbers

To show you how to deal with redundant tests I will in this case choose the second path, and implement a smarter solution for the present test. Keep in mind however that it is perfectly correct to implement the solution shown above and then move on and try to solve the problem of multiple arguments later.

The problem of multiplying a tuple of numbers can be solved in Python using the function reduce. This function implements a typical algorithm that "reduces" an array to a single number, applying a given function. The algorithm steps are the following

1. Apply the function to the first two elements
2. Remove the first two elements from the array
3. Apply the function to the result of the previous step and to the first element of the array
4. Remove the first element
5. If there are still elements in the array go back to step 3

So, suppose the function is

def mul2(a, b):
    return a * b

and the array is

a = [2, 6, 4, 8, 3]

The steps followed by the algorithm will be

Apply the function to 2 and 6 (first two elements). The result is 2 * 6, that is 12
Remove the first two elements, the array is now a = [4, 8, 3]
Apply the function to 12 (result of the previous step) and 4 (first element of the array). The new result is 12 * 4, that is 48
Remove the first element, the array is now a = [8, 3]
Apply the function to 48 (result of the previous step) and 8 (first element of the array). The new result is 48 * 8, that is 384
Remove the first element, the array is now a = [3]
Apply the function to 384 (result of the previous step) and 3 (first element of the array). The new result is 384 * 3, that is 1152
Remove the first element, the array is now empty and the procedure ends
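The steps above can be sketched in a few lines of Python. The function manual_reduce is an illustrative re-implementation written for this example, not the actual code of functools.reduce, and it assumes the array contains at least two elements.

```python
from functools import reduce


def mul2(a, b):
    return a * b


def manual_reduce(function, array):
    # Illustrative re-implementation of the steps described in the text
    remaining = list(array)

    # Apply the function to the first two elements and remove them
    result = function(remaining[0], remaining[1])
    remaining = remaining[2:]

    # Keep combining the result with the first remaining element
    while remaining:
        result = function(result, remaining[0])
        remaining = remaining[1:]

    return result


a = [2, 6, 4, 8, 3]

print(manual_reduce(mul2, a))
print(reduce(mul2, a))
```

Both calls give the same value as the manual walkthrough, which is a quick way to convince yourself that the description matches what reduce does for this input.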

Going back to our class SimpleCalculator, we might import reduce from the module functools and use it on the array args. We need to provide a function that we can define in the function mul itself.

from functools import reduce


class SimpleCalculator:
    [...]

    def mul(self, *args):
        def mul2(a, b):
            return a * b

        return reduce(mul2, args)

Git tag: step-5-multiply-two-numbers-smart

More information about the algorithm reduce can be found on the MapReduce Wikipedia page https://en.wikipedia.org/wiki/MapReduce. The Python function documentation can be found at https://docs.python.org/3.6/library/functools.html#functools.reduce.

The above code makes the test suite pass, so we can move on and address the next problem. As happened with addition we cannot properly test that the function accepts a potentially infinite number of arguments, so we can test a reasonably high number of inputs.

def test_mul_many_numbers():
    numbers = range(1, 10)

    calculator = SimpleCalculator()

    result = calculator.mul(*numbers)

    assert result == 362880

Git tag: step-5-multiply-many-numbers

We might use 100 arguments as we did with addition, but the multiplication of all numbers from 1 to 100 gives a result with 158 digits and I don't really need to clutter the tests file with such a monstrosity. As I said, testing multiple arguments is testing a boundary, and the idea is that if the algorithm works for 2 numbers and for 9 it will work for 10 thousand arguments as well.
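Both figures can be checked with the standard library, since the product of the integers from 1 to n is the factorial of n. This is just a verification sketch and is not part of the project code.

```python
import math

# Product of the numbers from 1 to 9, the value expected by the test
print(math.factorial(9))

# Number of digits of the product of the numbers from 1 to 100
digits = len(str(math.factorial(100)))
print(digits)
```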

If we run the test suite now all tests pass, and this should worry you.

Yes, you shouldn't be happy. When you follow TDD each new test that you add should fail. If it doesn't fail you should ask yourself if it is worth adding that test or not. This is because chances are that you are adding a useless test and we don't want to add useless code, because code has to be maintained, so the less the better.

In this case, however, we know why the test already passes. We implemented a smarter algorithm as a solution for the first test knowing that we would end up trying to solve a more generic problem. And the value of this new test is that it shows that multiple arguments can be used, while the first test doesn't.

So, after these considerations, we can be happy that the second test already passes.

TDD rule number 5: A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.

Step 6 - Refactoring

Previously, I introduced the concept of refactoring, which means changing the code without altering the results. How can you be sure you are not altering the behaviour of your code? Well, this is what the tests are for. If the new code keeps passing the test suite you can be sure that you didn't remove any feature.

In theory, refactoring shouldn't add any new behaviour to the code, as it should be a behaviour-preserving transformation. There is no real practical way to check this, and we will not bother with it now. You should be concerned with this if you are discussing security, as your code shouldn't add any entry point you don't want to be there. In this case you will need tests that check the absence of features instead of their presence.

This means that if you have no tests you shouldn't refactor. But, after all, if you have no tests you shouldn't have any code, either, so refactoring shouldn't be a problem you have. If you have some code without tests (I know you have it, I do), you should seriously consider writing tests for it, at least before changing it. More on this in a later section.

For the time being, let's see if we can work on the code of the class SimpleCalculator without altering the results. I do not really like the definition of the function mul2 inside the function mul. It is obviously perfectly fine and valid, but for the sake of example I will pretend we have to get rid of it.

Python provides support for anonymous functions with the keyword lambda, so I might replace the code of mul with

from functools import reduce


class SimpleCalculator:
    [...]

    def mul(self, *args):
        return reduce(lambda x, y: x*y, args)

Git tag: step-6-refactoring

where I define an anonymous function that accepts two inputs x, y and returns their product x*y. Running the test suite I can see that all the tests pass, so my refactoring is correct.
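As a side note, the same refactoring can be taken one step further. The following is only a sketch of an alternative, not the version tagged in the repository: it assumes that replacing the lambda with operator.mul from the standard library is acceptable, and shows only the method mul. The test suite is what tells you whether this version is still correct.

```python
from functools import reduce
import operator


class SimpleCalculator:
    # Only mul is shown here; the other methods are omitted in this sketch

    def mul(self, *args):
        # operator.mul(x, y) is equivalent to x * y
        return reduce(operator.mul, args)


calculator = SimpleCalculator()
print(calculator.mul(3, 4))
print(calculator.mul(*range(1, 10)))
```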

TDD rule number 6: Never refactor without tests.

Read the rest of the series on The Digital Cat.

Part 1 (this post) and part 2 contain a detailed example of unit testing.
Part 3 discusses how you should write unit tests and what you should test.
Part 4 introduces mock objects and shows how to use them practically.
Part 5 extends mocks with patches and shows how to deal with the most common test cases.

Dissecting a Web stack

Leonardo Giordani — Sun, 01 Nov 2020 12:54:13 +0000

It was gross. They wanted me to dissect a frog.
(Beetlejuice, 1988)

Introduction

Having recently worked with young web developers who were exposed for the first time to proper production infrastructure, I received many questions about the various components that one can find in the architecture of a "Web service". These questions clearly expressed the confusion (and sometimes the frustration) of developers who understand how to create endpoints in a high-level language such as JavaScript (with Node.js) or Python, but were never introduced to the complexity of what happens between the user's browser and their framework of choice. Most of the time they don't know why the framework itself is there in the first place.

The challenge is clear if we just list (in random order) some of the words we use when we discuss (Python) Web development: HTTP, cookies, web server, WebSocket, FTP, multi-threaded, reverse proxy, Django, nginx, static files, POST, certificates, framework, Flask, SSL, GET, WSGI, session management, TLS, load balancing, Apache.

In this post, I want to review all the words mentioned above (and a couple more) trying to build a production-ready web service from the ground up. I hope this might help young developers to get the whole picture and to make sense of these "obscure" names that senior developers like me tend to drop in everyday conversations (sometimes arguably out of turn).

As the focus of the post is the global architecture and the reasons behind the presence of specific components, the example service I will use will be a basic HTML web page. The reference language will be Python but the overall discussion applies to any language or framework.

My approach will be that of first stating the rationale and then implementing a possible solution. After this, I will point out missing pieces or unresolved issues and move on with the next layer. At the end of the process, the reader should have a clear picture of why each component has been added to the system.

The perfect architecture

A very important underlying concept of system architectures is that there is no perfect solution devised by some wiser genius, that we just need to apply. Unfortunately, often people mistake design patterns for such a "magic solution". The "Design Patterns" original book, however, states that

Your design should be specific to the problem at hand but also general enough to address future problems and requirements. You also want to avoid redesign, or at least minimize it.

And later

Design patterns make it easier to reuse successful designs and architectures. [...] Design patterns help you choose design alternatives that make a system reusable and avoid alternatives that compromise reusability.

The authors of the book are discussing Object-oriented Programming, but these sentences can be applied to any architecture. As you can see, we have a "problem at hand" and "design alternatives", which means that the most important thing to understand is the requirements, both the present and future ones. Only with clear requirements in mind can one effectively design a solution, possibly tapping into the great number of patterns that other designers already devised.

One last remark. A web stack is a complex beast, made of several components and software packages developed by different programmers with different goals in mind. It is perfectly understandable, then, that such components overlap to some degree. While the division line between theoretical layers is usually very clear, in practice the separation is often blurry. Expect this a lot, and you will never be lost in a web stack anymore.

Some definitions

Let's briefly review some of the most important concepts involved in a Web stack, the protocols.

TCP/IP

TCP/IP is a network protocol, that is, a set of established rules two computers have to follow to get connected over a physical network to exchange messages. TCP/IP is composed of two different protocols covering two different layers of the OSI stack, namely the Transport (TCP) and the Network (IP) ones. TCP/IP can be implemented on top of any physical interface (Data Link and Physical OSI layers), such as Ethernet and Wireless. Actors in a TCP/IP network are identified by a socket, which is a tuple made of an IP address and a port number.
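To make the idea of a socket as an (IP address, port) tuple more concrete, here is a minimal sketch in Python that connects two sockets on the local machine and prints the tuple that identifies each end. It assumes that the loopback address 127.0.0.1 is available, and it uses port 0, which asks the operating system to pick any free port.

```python
import socket

# Listen on the loopback address; port 0 asks the OS for any free port
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server_address = server.getsockname()

# Connect a client to the listening socket and accept the connection
client = socket.create_connection(server_address)
conn, peer_address = server.accept()
client_address = client.getsockname()

# Each end of the connection is identified by an (IP address, port) tuple
print("server:", server_address)
print("client:", client_address)

client.close()
conn.close()
server.close()
```

The address that the server sees for its peer is the same tuple the client reports for itself, which is what allows the two computers to tell their conversations apart.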

As far as we are concerned when developing a Web service, however, we need to be aware that TCP/IP is a reliable protocol, which in telecommunications means that the protocol itself takes care of retransmissions when packets get lost. In other words, while the speed of the communication is not guaranteed, we can be sure that once a message is sent it will reach its destination without errors.

HTTP

TCP/IP can guarantee that the raw bytes one computer sends will reach their destination, but this leaves completely untouched the problem of how to send meaningful information. In particular, in 1989 the problem Tim Berners-Lee wanted to solve was how to uniquely name hypertext resources in a network and how to access them.

HTTP is the protocol that was devised to solve such a problem and has since greatly evolved. With the help of other protocols such as WebSocket, HTTP invaded areas of communication for which it was originally considered unsuitable such as real-time communication or gaming.

At its core, HTTP is a protocol that states the format of a text request and the possible text responses. The initial version 0.9 published in 1991 defined the concept of URL and allowed only the GET operation that requested a specific resource. HTTP 1.0 and 1.1 added crucial features such as headers, more methods, and important performance optimisations. At the time of writing the adoption of HTTP/2 is around 45% of the websites in the world, and HTTP/3 is still a draft.

The most important feature of HTTP we need to keep in mind as developers is that it is a stateless protocol. This means that the protocol doesn't require the server to keep track of the state of the communication between requests, basically leaving session management to the developer of the service itself.

Session management is crucial nowadays because you usually want to have an authentication layer in front of a service, where a user provides credentials and accesses some private data. It is, however, useful in other contexts such as visual preferences or choices made by the user and re-used in later accesses to the same website. Typical solutions to the session management problem of HTTP involve the use of cookies or session tokens.
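As an illustration of the cookie mechanism, the following sketch uses the module http.cookies of the Python standard library to build the header a server sends and to parse the header a browser sends back. The cookie names session_id and theme and their values are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header that hands a session
# identifier to the browser (name and value are made up)
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["httponly"] = True
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Later requests carry the identifier back in the Cookie header,
# which the server parses to recognise the user
incoming = SimpleCookie()
incoming.load("session_id=abc123; theme=dark")
session_id = incoming["session_id"].value
print(session_id)
```

The protocol itself remains stateless: it is the value stored in the cookie that allows the service to connect one request to the previous ones.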

HTTPS

Security has become a very important word in recent years, and with good reason. The amount of sensitive data we exchange on the Internet or store on digital devices is increasing exponentially, but unfortunately so is the number of malicious attackers and the level of damage they can cause with their actions.

HTTP is inherently insecure, being a plain text communication between a client and a server that usually happens on a completely untrusted network such as the Internet. While security wasn't an issue when the protocol was initially conceived, it is nowadays a problem of paramount importance, as we exchange private information, often vital for people's security or for businesses. We need to be sure we are sending information to the correct server and that the data we send cannot be intercepted.

HTTPS solves the problems of both tampering and eavesdropping by encrypting HTTP with the Transport Layer Security (TLS) protocol, which also enforces the use of digital certificates issued by a trusted authority. At the time of writing, approximately 80% of websites loaded by Firefox use HTTPS by default. When a server receives an HTTPS connection and transforms it into an HTTP one it is usually said that it terminates TLS (or SSL, the old name of TLS).
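The role of certificates can be seen in the defaults of the Python module ssl. This is just a sketch of the client side: it creates the default TLS context and shows that, before any data is exchanged, the server is required to present a certificate signed by a trusted authority and matching the requested host name. No connection is opened here.

```python
import ssl

# The default client context loads the trusted authorities of the system
context = ssl.create_default_context()

# The server must present a certificate signed by a trusted authority...
certificate_required = context.verify_mode == ssl.CERT_REQUIRED
print(certificate_required)

# ...and the certificate must match the host name we asked for
print(context.check_hostname)
```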

WebSocket

One great disadvantage of HTTP is that communication is always initiated by the client and that the server can send data only when this is explicitly requested. Polling can be implemented to provide an initial solution, but it cannot guarantee the performance of proper full-duplex communication, where a channel is kept open between server and client and both can send data without being requested. Such a channel is provided by the WebSocket protocol.

WebSocket is a killer technology for applications like online gaming, real-time feeds like financial tickers or sports news, or multimedia communication like conferencing or remote education.

It is important to understand that WebSocket is not HTTP, and can exist without it. It is also true that this new protocol was designed to be used on top of an existing HTTP connection, so a WebSocket communication is often found in parts of a Web page, which was originally retrieved using HTTP in the first place.
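The link between the two protocols is visible in the opening handshake. The client sends a normal HTTP GET request with the headers Upgrade: websocket and Sec-WebSocket-Key, and the server answers with status 101 (Switching Protocols) and a header Sec-WebSocket-Accept computed from that key. The following sketch shows only this computation, as described in RFC 6455; the function name websocket_accept is mine, and the key is the sample one used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by the WebSocket specification (RFC 6455)
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(key):
    # SHA-1 of the client key concatenated with the GUID, base64-encoded
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from RFC 6455
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this exchange the connection is no longer used for HTTP messages, and both sides can send WebSocket frames at any time.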

Implementing a service over HTTP

Let's finally start discussing bits and bytes. The starting point for our journey is a service over HTTP, which means there is an HTTP request-response exchange. As an example, let us consider a GET request, the simplest of the HTTP methods.

GET / HTTP/1.1
Host: localhost
User-Agent: curl/7.65.3
Accept: */*

As you can see, the client is sending a pure text message to the server, with the format specified by the HTTP protocol. The first line contains the method name (GET), the URL (/) and the protocol we are using, including its version (HTTP/1.1). The remaining lines are called headers and contain metadata that can help the server to manage the request. The complete value of the Host header is in this case localhost:80, but as the standard port for HTTP services is 80, we don't need to specify it.
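To show how simple this format is, here is a minimal sketch of a function that splits such a message into its components. The name parse_request is hypothetical, and the code assumes a well-formed request with the CRLF line endings used on the wire; a real parser has to deal with many more cases.

```python
def parse_request(text):
    # Split the head of the message from the (possibly empty) body
    head, _, _body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # First line: method, URL and protocol version
    method, url, version = lines[0].split(" ")

    # Remaining lines: "Name: value" headers
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()

    return method, url, version, headers


request = (
    "GET / HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "User-Agent: curl/7.65.3\r\n"
    "Accept: */*\r\n"
    "\r\n"
)
method, url, version, headers = parse_request(request)
print(method, url, version)
print(headers["Host"])
```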

If the server localhost is serving HTTP (i.e. running some software that understands HTTP) on port 80 the response we might get is something similar to

HTTP/1.0 200 OK
Date: Mon, 10 Feb 2020 08:41:33 GMT
Content-type: text/html
Content-Length: 26889
Last-Modified: Mon, 10 Feb 2020 08:41:27 GMT

<!DOCTYPE HTML>
<html>
...
</html>

As happened for the request, the response is a text message, formatted according to the standard. The first line mentions the protocol and the status of the request (200 in this case, that means success), while the following lines contain metadata in various headers. Finally, after an empty line, the message contains the resource the client asked for, the source code of the base URL of the website in this case. Since this HTML page probably contains references to other resources like CSS, JS, images, and so on, the browser will send several other requests to gather all the data it needs to properly show the page to the user.

So, the first problem we have is that of implementing a server that understands this protocol and sends a proper response when it receives an HTTP request. We should try to load the requested resource and return either a success (HTTP 200) if we can find it, or a failure (HTTP 404) if we can't.

1 Sockets and parsers

1.1 Rationale

TCP/IP is a network protocol that works with sockets. A socket is a tuple of an IP address (unique in the network) and a port (unique for a specific IP address) that the computer uses to communicate with others. A socket is a file-like object in an operating system, which can thus be opened and closed, and that we can read from or write to. Socket programming is a pretty low-level approach to the network, but you need to be aware that every piece of software in your computer that provides network access ultimately has to deal with sockets (most probably through some library, though).

Since we are building things from the ground up, let's implement a small Python program that opens a socket connection, receives an HTTP request, and sends an HTTP response. As port 80 is a "low port" (a number smaller than 1024), we usually don't have permissions to open sockets there, so I will use port 8080. This is not a problem for now, as HTTP can be served on any port.

1.2 Implementation

Create the file server.py and type this code. Yes, type it, don't just copy and paste, you will not learn anything otherwise.

import socket

# Create a socket instance
# AF_INET: use IP protocol version 4
# SOCK_STREAM: full-duplex byte stream
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Allow reuse of addresses
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# Bind the socket to any address, port 8080, and listen
s.bind(('', 8080))
s.listen()

# Serve forever
while True:
    # Accept the connection
    conn, addr = s.accept()

    # Receive data from this socket using a buffer of 1024 bytes
    data = conn.recv(1024)

    # Print out the data
    print(data.decode('utf-8'))

    # Close the connection
    conn.close()

This little program accepts a connection on port 8080 and prints the received data on the terminal. You can test it by executing it and then running curl localhost:8080 in another terminal. You should see something like

$ python3 server.py 
GET / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.65.3
Accept: */*

The server keeps running the code in the while loop, so if you want to terminate it you have to do it with Ctrl+C. So far so good, but this is not an HTTP server yet, as it sends no response; you should actually receive an error message from curl that says curl: (52) Empty reply from server.

Sending back a standard response is very simple: we just need to call conn.sendall passing the raw bytes. A minimal HTTP response contains the protocol and the status, an empty line, and the actual content, for example

HTTP/1.1 200 OK

Hi there!

Our server becomes then

import socket

# Create a socket instance
# AF_INET: use IP protocol version 4
# SOCK_STREAM: full-duplex byte stream
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Allow reuse of addresses
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# Bind the socket to any address, port 8080, and listen
s.bind(('', 8080))
s.listen()

# Serve forever
while True:
    # Accept the connection
    conn, addr = s.accept()

    # Receive data from this socket using a buffer of 1024 bytes
    data = conn.recv(1024)

    # Print out the data
    print(data.decode('utf-8'))

    conn.sendall(bytes("HTTP/1.1 200 OK\n\nHi there!\n", 'utf-8'))

    # Close the connection
    conn.close()
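The response sent by this server is deliberately minimal. The HTTP/1.1 message format separates lines with CRLF (\r\n), and clients usually expect headers such as Content-Length; curl and browsers tolerate the bare \n used above. The following sketch shows a slightly more complete response; the helper name build_response and the choice of headers are my assumptions, not part of the example server.

```python
def build_response(status, reason, body):
    # The body is sent as bytes, so its length is measured after encoding
    payload = body.encode("utf-8")
    head = (
        "HTTP/1.1 {} {}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        "Content-Length: {}\r\n"
        "\r\n"
    ).format(status, reason, len(payload))
    return head.encode("utf-8") + payload


response = build_response(200, "OK", "Hi there!\n")
print(response)
```

The resulting bytes can be passed directly to conn.sendall.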

At this point, we are not really responding to the user's request, however. Try different curl command lines like curl localhost:8080/index.html or curl localhost:8080/main.css and you will always receive the same response. We should try to find the resource the user is asking for and send that back in the response content.

This version of the HTTP server properly extracts the resource and tries to load it from the current directory, returning either a success or a failure

import socket
import re

# Create a socket instance
# AF_INET: use IP protocol version 4
# SOCK_STREAM: full-duplex byte stream
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Allow reuse of addresses
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# Bind the socket to any address, port 8080, and listen
s.bind(('', 8080))
s.listen()

HEAD_200 = "HTTP/1.1 200 OK\n\n"
HEAD_404 = "HTTP/1.1 404 Not Found\n\n"

# Serve forever
while True:
    # Accept the connection
    conn, addr = s.accept()

    # Receive data from this socket using a buffer of 1024 bytes
    data = conn.recv(1024)

    request = data.decode('utf-8')

    # Print out the data
    print(request)

    resource = re.match(r'GET /(.*) HTTP', request).group(1)
    try:
        with open(resource, 'r') as f:
            content = HEAD_200 + f.read()
        print('Resource {} correctly served'.format(resource))
    except FileNotFoundError:
        content = HEAD_404 + "Resource /{} cannot be found\n".format(resource)
        print('Resource {} cannot be loaded'.format(resource))

    print('--------------------')

    conn.sendall(bytes(content, 'utf-8'))

    # Close the connection
    conn.close()

As you can see this implementation is extremely simple. If you create a simple local file named index.html with this content

<html>
<head>
    <title>This is my page</title>
    <link rel="stylesheet" href="main.css">
</head>
<body>
    <p>Some random content</p>
</body>
</html>

and run curl localhost:8080/index.html you will see the content of the file. At this point, you can even use your browser to open http://localhost:8080/index.html and you will see the title of the page and the content. A Web browser is a piece of software capable of sending HTTP requests and of interpreting the content of the responses if this is HTML (and many other file types like images or videos), so it can render the content of the message. The browser is also responsible for retrieving the missing resources needed for the rendering, so when you provide links to style sheets or JS scripts with the <link> or the <script> tags in the HTML code of a page, you are instructing the browser to send an HTTP GET request for those files as well.

The output of server.py when I access http://localhost:8080/index.html is

GET /index.html HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache


Resource index.html correctly served
--------------------
GET /main.css HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: text/css,*/*;q=0.1
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost:8080/index.html
Pragma: no-cache
Cache-Control: no-cache


Resource main.css cannot be loaded
--------------------
GET /favicon.ico HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: image/webp,*/*
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache


Resource favicon.ico cannot be loaded
--------------------

As you can see the browser sends rich HTTP requests, with a lot of headers, automatically requesting the CSS file mentioned in the HTML code and automatically trying to retrieve a favicon image.
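Before moving on, it is worth noting how fragile the extraction of the resource is in server.py. When the request is not a GET, re.match returns None and the call to group raises an AttributeError, and a path containing .. can point outside the current directory. The following is a sketch of a more defensive version; the function name extract_resource and the parameter root are hypothetical and not part of the example code.

```python
import os
import re


def extract_resource(request, root="."):
    # Anything that does not start with a well-formed GET request line
    # is rejected instead of crashing the server
    match = re.match(r"GET /(\S*) HTTP/", request)
    if match is None:
        return None

    # Resolve the path and make sure it stays inside the root directory
    root = os.path.abspath(root)
    path = os.path.abspath(os.path.join(root, match.group(1)))
    if os.path.commonpath([root, path]) != root:
        return None

    return path


print(extract_resource("GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n"))
print(extract_resource("POST /form HTTP/1.1\r\nHost: localhost\r\n\r\n"))
```

The server might then return a 404 (or a more specific error) whenever the function returns None.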

1.3 Resources

These resources provide more detailed information on the topics discussed in this section

Python 3 Socket Programming HOWTO
HTTP/1.1 Request format
HTTP/1.1 Response format
The source code of this example is available here

1.4 Issues

It gives a certain dose of satisfaction to build something from scratch and discover that it works smoothly with full-fledged software like the browser you use every day. I also think it is very interesting to discover that technologies like HTTP, that basically run the world nowadays, are at their core very simple.

That said, there are many features of HTTP that we didn't cover with our simple socket programming. For starters, HTTP/1.0 introduced other methods after GET, such as POST, which is of paramount importance for today's websites, where users keep sending information to servers through forms. To implement all 9 HTTP methods we need to properly parse the incoming request and add relevant functions to our code.
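Just to give an idea of what this implies, here is a sketch of how the server might dispatch requests to one function per method. The names handle_get, handle_post, HANDLERS, and dispatch are hypothetical, only two methods are covered, and the handlers return canned responses.

```python
def handle_get(url):
    return "HTTP/1.1 200 OK\r\n\r\nYou asked for {}\n".format(url)


def handle_post(url):
    return "HTTP/1.1 200 OK\r\n\r\nData received for {}\n".format(url)


# One handler per supported method (only two of them in this sketch)
HANDLERS = {"GET": handle_get, "POST": handle_post}


def dispatch(request):
    # The request line has the form "METHOD URL VERSION"
    request_line = request.split("\r\n", 1)[0]
    method, url, _version = request_line.split(" ")

    handler = HANDLERS.get(method)
    if handler is None:
        return "HTTP/1.1 405 Method Not Allowed\r\n\r\n"
    return handler(url)


print(dispatch("GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n"))
print(dispatch("DELETE /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n"))
```

Even in this toy form, you can see that most of the code deals with the protocol and not with what the service is supposed to do.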

At this point, however, you might notice that we are dealing a lot with low-level details of the protocol, which is usually not the core of our business. When we build a service over HTTP we believe that we have the knowledge to properly implement some code that can simplify a certain process, be it searching for other websites, shopping for books or sharing pictures with friends. We don't want to spend our time understanding the subtleties of the TCP/IP sockets and writing parsers for request-response protocols. It is nice to see how these technologies work, but on a daily basis, we need to focus on something at a higher level.

The situation of our small HTTP server is possibly worsened by the fact that HTTP is a stateless protocol. The protocol doesn't provide any way to connect two successive requests, thus keeping track of the state of the communication, which is the cornerstone of the modern Internet. Every time we authenticate on a website and we want to visit other pages we need the server to remember who we are, and this implies keeping track of the state of the connection.
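A sketch of what "remembering who we are" means on the server side might look like the following. It assumes that a random token is given to the client at login (typically through a cookie) and sent back with each request; the in-memory dictionary and the functions login and current_user are made up for the example, and a real service needs expiration, persistence, and proper authentication.

```python
import secrets

# In-memory session store: token -> data about the user (hypothetical layout)
sessions = {}


def login(username):
    # Create a random token that the client will send back with each request
    token = secrets.token_hex(16)
    sessions[token] = {"username": username}
    return token


def current_user(token):
    # Look up the state connected to this token, if any
    session = sessions.get(token)
    return session["username"] if session else None


token = login("leonardo")
print(current_user(token))
print(current_user("unknown-token"))
```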

Long story short: to work as a proper HTTP server, our code should at this point implement all HTTP methods and cookie management. We also need to support other protocols like WebSocket. These are anything but trivial tasks, so we definitely need to add some component to the whole system that lets us focus on the business logic and not on the low-level details of application protocols.

Web frameworks, concurrency, HTTPS, web servers, AWS, performances... read the full post on The Digital Cat

Photo by Samrat Khadka on Unsplash

Multiple inheritance and mixin classes in Python

Leonardo Giordani — Sat, 25 Apr 2020 10:28:33 +0000

This article was originally published on The Digital Cat.

I recently revisited three old posts on Django class-based views that I wrote for my blog, updating them to Django 3.0, and noticed once again that the code base uses mixin classes to increase code reuse. I also realised that mixins are not very popular in Python, so I decided to explore them, brushing up my knowledge of the OOP theory in the meantime.

To fully appreciate the content of the post, be sure you grasp two pillars of the OOP approach: delegation, in particular how it is implemented through inheritance, and polymorphism. This post about delegation and this post about polymorphism contain all you need to understand how Python implements those concepts.

Multiple inheritance: blessing and curse

General concepts

To discuss mixins we need to start from one of the most controversial subjects in the whole OOP world: multiple inheritance. This is a natural extension of the concept of simple inheritance, where a class automatically delegates method and attribute resolution to another class (the parent class).

Let me state it again, as it is important for the rest of the discussion: inheritance is just an automatic delegation mechanism.

Delegation was introduced in OOP as a way to reduce code duplication. When an object needs a specific feature it just delegates it to another class (either explicitly or implicitly), so the code is written just once.
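The difference between explicit and implicit delegation can be sketched with a few lines of Python. The classes Logger, ExplicitService, and ImplicitService are invented for this example: the first service delegates through composition, while the second relies on inheritance.

```python
class Logger:
    def log(self, message):
        return "LOG: " + message


class ExplicitService:
    # Composition: the object holds a Logger and delegates explicitly
    def __init__(self):
        self._logger = Logger()

    def log(self, message):
        return self._logger.log(message)


class ImplicitService(Logger):
    # Inheritance: log is not defined here, so Python looks it up
    # in the parent class automatically
    pass


print(ExplicitService().log("explicit"))
print(ImplicitService().log("implicit"))
```

In both cases the code that implements log is written just once, in the class Logger.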

Let's consider the example of a code management website, clearly completely fictional and not inspired by any existing product. Let's assume we created the following hierarchy

      assignable reviewable item
 (assign_to_user, ask_review_to_user)
                 ^
                 |
                 |
                 |
            pull request

which allows us to put in pull request only the specific code required by that element. This is a great achievement, as it is what libraries do for code, but on live objects. Method calls and delegation are nothing more than messages between objects, so the delegation hierarchy is just a simple networked system.

Unfortunately, the use of inheritance over composition often leads to systems that, paradoxically, increase code duplication. The main problem lies in the fact that inheritance can directly delegate to only one other class (the parent class), as opposed to composition, where the object can delegate to any number of other ones. This limitation of inheritance means that we might have a class that inherits from another one because it needs some of its features, but in doing so receives features it doesn't want, or shouldn't have.

Let's continue the example of the code management portal, and consider an issue, which is an item that we want to store in the system, but cannot be reviewed by a user. If we create a hierarchy like this

      assignable reviewable item
   (assign_to_user, ask_review_to_user)
                   ^
                   |
                   |
                   |
                   |
          +--------+--------+
          |                 |
          |                 |
          |                 |
        issue          pull request
   (not reviewable)

we end up putting the features related to the review process in an object that shouldn't have them. The standard solution to this problem is to increase the depth of the inheritance hierarchy and to derive from the new, simpler ancestor.

          assignable item
         (assign_to_user)
                 ^
                 |
                 |
                 |
                 |
          +------+--------------+
          |                     |
          |                     |
          |                     |
          |         reviewable assignable item
          |            (ask_review_to_user)
          |                     ^
          |                     |
          |                     |
          |                     |
        issue              pull request
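In Python, the deeper hierarchy shown above might be sketched like this. The class names follow the diagram, while the bodies of the methods are placeholders that just store the user.

```python
class AssignableItem:
    def assign_to_user(self, user):
        self.assignee = user


class ReviewableAssignableItem(AssignableItem):
    def ask_review_to_user(self, user):
        self.reviewer = user


class Issue(AssignableItem):
    pass


class PullRequest(ReviewableAssignableItem):
    pass


# An issue can be assigned but has no review-related method
print(hasattr(Issue(), "assign_to_user"))
print(hasattr(Issue(), "ask_review_to_user"))
```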

However, this approach stops being viable as soon as an object needs to inherit from a given class but not from the parent of that class. For example, an element that has to be reviewable but not assignable, like a best practice that we want to add to the site. If we want to keep using inheritance, the only solution at this point is to duplicate the code that implements the reviewable nature of the item (or the code that implements the assignable feature) and create two different class hierarchies.

          assignable item              +-------->  reviewable item
         (assign_to_user)              |         (ask_review_to_user)
                 ^                     |                  ^
                 |                     |                  |
                 |                     |                  |
                 |             CODE DUPLICATION           |
                 |                     |                  |
          +------+--------------+      |                  |
          |                     |      |                  |
          |                     |      |                  |
          |                     |      V                  |
          |         reviewable assignable item            |
          |            (ask_review_to_user)               |
          |                     ^                         |
          |                     |                         |
          |                     |                         |
          |                     |                         |
        issue              pull request             best practice

Please note that this doesn't even take into account that the new reviewable item might need attributes from assignable item, which prompts for another level of depth in the hierarchy, where we isolate those features in a more generic class. So, unfortunately, chances are that this is only the first of many compromises we will have to accept to keep the system in a stable state if we can't change our approach.

Multiple inheritance was then introduced in OOP, as it was clear that an object might want to delegate certain actions to a given class, and other actions to a different one, mimicking what life forms do when they inherit traits from multiple ancestors (parents, grandparents, etc.).

The above situation can then be solved by having pull request inherit from both the class that provides the assign feature and from the one that implements the reviewable nature.

          assignable item                          reviewable item
         (assign_to_user)                        (ask_review_to_user)
                 ^                                      ^  ^
                 |                                      |  |
                 |                                      |  |
                 |                                      |  |
                 |                                      |  |
          +------+-------------+ +----------------------+  |
          |                    | |                         |
          |                    | |                         |
          |                    | |                         |
          |                    | |                         |
          |                    | |                         |
          |                    | |                         |
          |                    | |                         |
          |                    | |                         |
          |                    | |                         |
        issue              pull request              best practice
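A possible Python version of this last diagram is the following sketch, where pull request lists two parent classes. As before, the names come from the diagram and the method bodies are placeholders.

```python
class AssignableItem:
    def assign_to_user(self, user):
        self.assignee = user


class ReviewableItem:
    def ask_review_to_user(self, user):
        self.reviewer = user


class Issue(AssignableItem):
    pass


class PullRequest(AssignableItem, ReviewableItem):
    # Both features come from the parents, nothing is duplicated
    pass


class BestPractice(ReviewableItem):
    pass


pull_request = PullRequest()
pull_request.assign_to_user("alice")
pull_request.ask_review_to_user("bob")
print(pull_request.assignee, pull_request.reviewer)
```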

Generally speaking, then, multiple inheritance is introduced to give the programmer a way to keep using inheritance without introducing code duplication, keeping the class hierarchy simpler and cleaner. Eventually, everything we do in software design is to try and separate concerns, that is, to isolate features, and multiple inheritance can help to do this.

These are just examples and might be valid or not, depending on the concrete case, but they clearly show the issues that we can have even with a very simple hierarchy of 4 classes. Many of these problems clearly arise from the fact that we wanted to implement delegation only through inheritance, and I dare to say that 80% of the architectural errors in OOP projects come from using inheritance instead of composition and from using god objects, that is classes that have responsibilities over too many different parts of the system. Always remember that OOP was born with the idea of small objects interacting through messages, so the considerations we make for monolithic architectures are valid even here.

That said, as inheritance and composition implement two different types of delegation (to be and to have), they are both valuable, and multiple inheritance is the way to remove the single provider limitation that comes from having only one parent class.

Why is it controversial?

Given what I just said, multiple inheritance seems to be a blessing. When an object can inherit from multiple parents, we can easily spread responsibilities among different classes and use only the ones we need, promoting code reuse and avoiding god objects.

Unfortunately, things are not that simple. First of all, we face the issue that every microservice-oriented architecture faces, that is, the risk of going from god objects (the extreme monolithic architecture) to almost empty objects (the extreme distributed approach). This burdens the programmer with too fine-grained a control and eventually results in a system where relationships between objects are so complicated that it becomes impossible to grasp the effect of a change in the code.

There is a more immediate problem in multiple inheritance, though. As it happens with the natural inheritance, parents can provide the same "genetic trait" in two different flavours, but the resulting individual will have only one. Leaving aside genetics (which is incredibly more complicated than programming) and going back to OOP, we face a problem when an object inherits from two other objects that provide the same attribute.

So, if your class Child inherits from parents Parent1 and Parent2, and both provide the __init__ method, which one should your object use?

class Parent1():
    def __init__(self):
        [...]


class Parent2():
    def __init__(self):
        [...]


class Child(Parent1, Parent2):
    # This inherits from both Parent1 and Parent2, which __init__ does it use?
    pass

Things can even get worse, as parents can have different signatures of the common method, for example

class Parent1:
    # No common ancestor yet: __init__ requires a status
    def __init__(self, status):
        [...]


class Parent2:
    # No common ancestor yet: __init__ requires a name
    def __init__(self, name):
        [...]


class Child(Parent1, Parent2):
    # This inherits from both Parent1 and Parent2, which __init__ does it use?
    pass

The problem can be extended even further, introducing a common ancestor above Parent1 and Parent2.

class Ancestor:
    # The common ancestor, defines its own __init__ method
    def __init__(self):
        [...]


class Parent1(Ancestor):
    # This inherits from Ancestor but redefines __init__
    def __init__(self, status):
        [...]


class Parent2(Ancestor):
    # This inherits from Ancestor but redefines __init__
    def __init__(self, name):
        [...]


class Child(Parent1, Parent2):
    # This inherits from both Parent1 and Parent2, which __init__ does it use?
    pass

As you can see, we already have a problem when we introduce multiple parents, and a common ancestor just adds a new level of complexity. The ancestor class can clearly be at any point of the inheritance tree (grandparent, great-grandparent, etc.); the important part is that it is shared between Parent1 and Parent2. This is the so-called diamond problem, as the inheritance graph has the shape of a diamond:

      Ancestor
       ^   ^
      /     \
     /       \
Parent1     Parent2
    ^         ^
     \       /
      \     /
       Child

So, while with single-parent inheritance the rules are straightforward, with multiple inheritance we immediately have a more complex situation that doesn't have a trivial solution. Does all this prevent multiple inheritance from being implemented?

Not at all! There are solutions to this problem, as we will see shortly, but this further level of intricacy makes multiple inheritance something that doesn't fit easily in a design and has to be implemented carefully to avoid subtle bugs. Remember that inheritance is an automatic delegation mechanism, and this makes what happens in the code less evident. For these reasons, multiple inheritance is often depicted as scary and convoluted, and usually given some space only in the advanced OOP courses, at least in the Python world. I believe every Python programmer, instead, should familiarise themselves with it and learn how to take advantage of it.

Multiple inheritance: the Python way

Let's see how it is possible to solve the diamond problem. Unlike genetics, we programmers can't afford any level of uncertainty or randomness in our processes, so in the presence of a possible ambiguity such as the one created by multiple inheritance, we need to write down a rule that will be strictly followed in every case. In Python, this rule goes by the name of MRO (Method Resolution Order), which was introduced in Python 2.3 and is described in this document by Michele Simionato.

There is a lot to say about MRO and the underlying C3 linearisation algorithm, but for the scope of this post, it is enough to see how it solves the diamond problem. In case of multiple inheritance, Python follows the usual inheritance rules (automatic delegation to an ancestor if the attribute is not present locally), but the order followed to traverse the inheritance tree now includes all the classes that are specified in the class signature. In the example above, Python would look for attributes in the following order: Child, Parent1, Parent2, Ancestor, and finally object.

So, as in the case of standard inheritance, this means that the first class in the list that implements a specific attribute will be the selected provider for that resolution. An example might clarify the matter

class Ancestor:
    def rewind(self):
        [...]


class Parent1(Ancestor):
    def open(self):
        [...]


class Parent2(Ancestor):
    def open(self):
        [...]

    def close(self):
        [...]

    def flush(self):
        [...]


class Child(Parent1, Parent2):
    def flush(self):
        [...]

In this case an instance c of Child would provide rewind, open, close, and flush. When c.rewind is called, the code in Ancestor is executed, as this is the first class in the MRO list that provides that method. The method open is provided by Parent1, while close is provided by Parent2. If the method c.flush is called, the code is provided by the Child class itself, that redefines it overriding the one provided by Parent2.
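To check this resolution concretely, here is a minimal runnable version of the example above. The method bodies are elided in the original, so the return values naming the providing class are an assumption made purely for illustration.

```python
class Ancestor:
    def rewind(self):
        return "Ancestor.rewind"


class Parent1(Ancestor):
    def open(self):
        return "Parent1.open"


class Parent2(Ancestor):
    def open(self):
        return "Parent2.open"

    def close(self):
        return "Parent2.close"

    def flush(self):
        return "Parent2.flush"


class Child(Parent1, Parent2):
    def flush(self):
        return "Child.flush"


c = Child()

# The linearisation Python computed for Child
mro_names = [cls.__name__ for cls in Child.__mro__]
print(mro_names)

# Each call is served by the first class in the MRO that defines the method
provided_by = {
    "rewind": c.rewind(),
    "open": c.open(),
    "close": c.close(),
    "flush": c.flush(),
}
print(provided_by)
```

Note that Parent2.open and Parent2.flush are never reached: an earlier class in the MRO already provides a method with the same name.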

As we see with the flush method, Python doesn't change its behaviour when it comes to method overriding with multiple parents. The first implementation of a method with that name is executed, and the parent's implementation is not automatically called. As in the case of standard inheritance, then, it's up to us to design classes with matching method signatures.

Under the bonnet

How does multiple inheritance work internally? How does Python create the MRO list?

Python has a very simple approach to OOP (even though it ultimately ends with a mind-blowing ouroboros, see here). Classes are objects themselves, so they contain data structures that are used by the language to provide features, and delegation makes no exception. When we run a method on an object, Python silently uses the __getattribute__ method (provided by object), which uses __class__ to reach the class from the instance, and __bases__ to find the parent classes. The latter, in particular, is a tuple, so it is ordered, and it contains all the classes that the current class inherits from.

The MRO is created using only __bases__, but the underlying algorithm is not that trivial and has to do with the monotonicity of the resulting class linearisation. It is less scary than it sounds, but probably not something you want to read while suntanning. If you do, the aforementioned document by Michele Simionato contains all the gory details on class linearisation that you always wanted to explore while lying on the beach.
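The attributes mentioned above can be inspected directly. The following sketch reuses the class names of the diamond example and adds a hypothetical class Broken, whose bases cannot be linearised consistently; in that case Python refuses to create the class at all.

```python
class Ancestor:
    pass


class Parent1(Ancestor):
    pass


class Parent2(Ancestor):
    pass


class Child(Parent1, Parent2):
    pass


c = Child()

# From the instance to its class, and from the class to its direct parents
assert c.__class__ is Child
bases = Child.__bases__        # ordered tuple: (Parent1, Parent2)
linearisation = Child.mro()    # list computed from the bases

print(bases)
print(linearisation)

# Listing Ancestor before its own child Parent1 would require Ancestor to
# come both before and after Parent1, so no consistent order exists.
mro_rejected = False
try:
    class Broken(Ancestor, Parent1):
        pass
except TypeError as exc:
    mro_rejected = True
    print(exc)
```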

Inheritance and interfaces

To approach mixins, we need to discuss inheritance in detail, and specifically the role of method signatures.

In Python, when you override a method provided by an ancestor class, you have to decide if and when to call its original implementation. This gives the programmer the freedom to decide whether they need to just augment a method or to replace it completely. Remember that the only thing Python does when a class inherits from another is to automatically delegate methods that are not implemented.

When a class inherits from another we are ideally creating objects that keep the backward compatibility with the interface of the parent class, to allow a polymorphic use of them. This means that when we inherit from a class and override a method changing its signature we are doing something that is dangerous and, at least from the point of view of polymorphism, wrong. Have a look at this example

class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y

    def move(self, pos_x, pos_y):
        self.pos_x = pos_x
        self.pos_y = pos_y

    def resize(self, size_x, size_y):
        self.size_x = size_x
        self.size_y = size_y


class Rectangle(GraphicalEntity):
    pass


class Square(GraphicalEntity):
    def __init__(self, pos_x, pos_y, size):
        super().__init__(pos_x, pos_y, size, size)

    def resize(self, size):
        super().resize(size, size)

Please note that Square changes the signature of both __init__ and resize. Now, when we instantiate those classes we need to keep in mind the different signature of __init__ in Square

r1 = Rectangle(100, 200, 15, 30)
r2 = Rectangle(150, 280, 23, 55)
q1 = Square(300, 400, 50)

We usually accept that an enhanced version of a class accepts more parameters when it is initialized, as we do not expect it to be polymorphic on __init__. Problems arise when we try to leverage polymorphism on other methods, for example resizing all GraphicalEntity objects in a list

for shape in [r1, r2, q1]:
    size_x = shape.size_x
    size_y = shape.size_y
    shape.resize(size_x*2, size_y*2)

Since r1, r2, and q1 are all objects that inherit from GraphicalEntity we expect them to provide the interface provided by that class, but this fails, because Square changed the signature of resize. The same would happen if we instantiated them in a for loop from a list of classes, but as I said it is generally accepted that child classes change the signature of the __init__ method. This is not true, for example, in a plugin-based system, where all plugins shall be initialized the same way.
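A minimal reproduction of the failure follows. It repeats the relevant parts of the classes defined above and catches the exception instead of letting the loop crash; the list failures is a name introduced here only to record what went wrong.

```python
class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y

    def resize(self, size_x, size_y):
        self.size_x = size_x
        self.size_y = size_y


class Rectangle(GraphicalEntity):
    pass


class Square(GraphicalEntity):
    def __init__(self, pos_x, pos_y, size):
        super().__init__(pos_x, pos_y, size, size)

    def resize(self, size):
        super().resize(size, size)


shapes = [Rectangle(100, 200, 15, 30), Square(300, 400, 50)]
failures = []

for shape in shapes:
    try:
        # The call every GraphicalEntity is expected to support
        shape.resize(shape.size_x * 2, shape.size_y * 2)
    except TypeError as exc:
        failures.append((type(shape).__name__, str(exc)))

print(failures)
```

The Rectangle is resized as expected, while the Square raises a TypeError because its resize accepts a single size.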

This is a classic problem in OOP. While we, as humans, perceive a square just as a slightly special rectangle, from the interface point of view the two classes are different, and thus should not be in the same inheritance tree when we are dealing with dimensions. This is an important consideration: Rectangle and Square are polymorphic on the move method, but not on __init__ and resize. So, the question is if we could somehow separate the two natures of being movable and resizable.

Now, discussing interfaces, polymorphism, and the reasons behind them would require an entirely separate post, so in the following sections, I'm going to ignore the matter and just consider the object interface optional. You will thus find examples of objects that break the interface of the parent, and objects that keep it. Just remember: whenever you change the signature of a method you change the (implicit) interface of the object, and thus you stop polymorphism. I'll discuss another time if I consider this right or wrong.

Mixin classes

MRO is a good solution that prevents ambiguity, but it leaves programmers with the responsibility of creating sensible inheritance trees. The algorithm helps to resolve complicated situations, but this doesn't mean we should create them in the first place. So, how can we leverage multiple inheritance without creating systems that are too complicated to grasp? Moreover, is it possible to use multiple inheritance to solve the problem of managing the double (or multiple) nature of an object, as in the previous example of a movable and resizable shape?

The solution comes from mixin classes: those are small classes that provide attributes but are not included in the standard inheritance tree, working more as "additions" to the current class than as proper ancestors. Mixins originate in the LISP programming language, and specifically in what could be considered the first version of the Common Lisp Object System, the Flavors extension. Modern OOP languages implement mixins in many different ways: Scala, for example, has a feature called traits, which live in their own space with a specific hierarchy that doesn't interfere with the proper class inheritance.

Mixin classes in Python

Python doesn't provide support for mixins with any dedicated language feature, so we use multiple inheritance to implement them. This clearly requires great discipline from the programmer, as it violates one of the main assumptions for mixins: their orthogonality to the inheritance tree. In Python, so-called mixins are classes that live in the normal inheritance tree, but they are kept small to avoid creating hierarchies that are too complicated for the programmer to grasp. In particular, mixins shouldn't have common ancestors other than object with the other parent classes.

Let's have a look at a simple example

class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y


class ResizableMixin:
    def resize(self, size_x, size_y):
        self.size_x = size_x
        self.size_y = size_y


class ResizableGraphicalEntity(GraphicalEntity, ResizableMixin):
    pass

Here, the class ResizableMixin doesn't inherit from GraphicalEntity, but directly from object, so ResizableGraphicalEntity gets from it just the resize method. As we said before, this simplifies the inheritance tree of ResizableGraphicalEntity and helps to reduce the risk of the diamond problem. It leaves us free to use GraphicalEntity as a parent for other classes without having to inherit methods that we don't want. Please remember that this happens because the classes are designed to avoid it, and not because of language features: the MRO algorithm just ensures that there will always be an unambiguous choice in case of multiple ancestors.
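As a quick check, the sketch below repeats the three classes and inspects the result: the mixin appears in the MRO after GraphicalEntity, and GraphicalEntity itself still has no resize method. The variable names are introduced here only for illustration.

```python
class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y


class ResizableMixin:
    def resize(self, size_x, size_y):
        self.size_x = size_x
        self.size_y = size_y


class ResizableGraphicalEntity(GraphicalEntity, ResizableMixin):
    pass


entity = ResizableGraphicalEntity(10, 20, 200, 100)
entity.resize(300, 150)

mro_names = [cls.__name__ for cls in ResizableGraphicalEntity.__mro__]
print(mro_names)

# The plain parent is untouched and can be reused without the resize feature
plain_has_resize = hasattr(GraphicalEntity, "resize")
print(plain_has_resize)
```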

Mixins cannot usually be too generic. After all, they are designed to add features to classes, but these new features often interact with other pre-existing features of the augmented class. In this case, the resize method interacts with the attributes size_x and size_y that have to be present in the object. There are obviously examples of pure mixins, but since they would require no initialization their scope is definitely limited.

Using mixins to hijack inheritance

Thanks to the MRO, Python programmers can leverage multiple inheritance to override methods that objects inherit from their parents, allowing them to customise classes without code duplication. Let's have a look at this example

class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y

class Button(GraphicalEntity):
    def __init__(self, pos_x, pos_y, size_x, size_y):
        super().__init__(pos_x, pos_y, size_x, size_y)
        self.status = False

    def toggle(self):
        self.status = not self.status

b = Button(10, 20, 200, 100)

As you can see, the Button class extends the GraphicalEntity one in a classic way, using super to call the parent's __init__ method before adding the new status attribute. Now, if I want to create a SquareButton class I have two choices.

I might just override __init__ in the new class

class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y


class Button(GraphicalEntity):
    def __init__(self, pos_x, pos_y, size_x, size_y):
        super().__init__(pos_x, pos_y, size_x, size_y)
        self.status = False

    def toggle(self):
        self.status = not self.status


class SquareButton(Button):
    def __init__(self, pos_x, pos_y, size):
        super().__init__(pos_x, pos_y, size, size)

b = SquareButton(10, 20, 200)

which performs the requested job, but strongly connects the feature of having a single dimension with the Button nature. If we wanted to create a circular image we could not inherit from SquareButton, as the image has a different nature.

The second option is that of isolating the features connected with having a single dimension in a mixin class, and adding it as a parent for the new class

class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y


class Button(GraphicalEntity):
    def __init__(self, pos_x, pos_y, size_x, size_y):
        super().__init__(pos_x, pos_y, size_x, size_y)
        self.status = False

    def toggle(self):
        self.status = not self.status


class SingleDimensionMixin:
    def __init__(self, pos_x, pos_y, size):
        super().__init__(pos_x, pos_y, size, size)


class SquareButton(SingleDimensionMixin, Button):
    pass

b = SquareButton(10, 20, 200)

The second solution gives the same final result, but promotes code reuse, as now the SingleDimensionMixin class can be applied to other classes derived from GraphicalEntity and make them accept only one size, while in the first solution that feature was tightly connected with the Button ancestor class.

Please note that the position of the mixin is important. As super follows the MRO, the called method is dispatched to the nearest class in the linearisation. If you put SingleDimensionMixin after Button in the definition of SquareButton, Python would complain. In that case the call b = SquareButton(10, 20, 200) and the method signature __init__(self, pos_x, pos_y, size_x, size_y) would not match.
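The following sketch shows both orders side by side. WrongOrderSquareButton is a hypothetical name for the class defined with the mixin in the wrong position.

```python
class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y


class Button(GraphicalEntity):
    def __init__(self, pos_x, pos_y, size_x, size_y):
        super().__init__(pos_x, pos_y, size_x, size_y)
        self.status = False


class SingleDimensionMixin:
    def __init__(self, pos_x, pos_y, size):
        super().__init__(pos_x, pos_y, size, size)


class SquareButton(SingleDimensionMixin, Button):
    pass


class WrongOrderSquareButton(Button, SingleDimensionMixin):
    pass


# Mixin first: its three-argument __init__ is found before Button's
good = SquareButton(10, 20, 200)

# Mixin last: Button.__init__ is found first and expects four arguments
error = None
try:
    WrongOrderSquareButton(10, 20, 200)
except TypeError as exc:
    error = str(exc)

print(error)
```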

Mixins are not used only when you want to change the object's interface, though. Leveraging super we can achieve interesting designs like

class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y


class Button(GraphicalEntity):
    def __init__(self, pos_x, pos_y, size_x, size_y):
        super().__init__(pos_x, pos_y, size_x, size_y)
        self.status = False

    def toggle(self):
        self.status = not self.status


class LimitSizeMixin:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        # Cap the requested size before passing it on along the MRO
        size_x = min(size_x, 500)
        size_y = min(size_y, 400)
        super().__init__(pos_x, pos_y, size_x, size_y)


class LimitSizeButton(LimitSizeMixin, Button):
    pass

b = LimitSizeButton(10, 20, 200, 100)

Here, LimitSizeButton calls __init__ of its first parent, which is LimitSizeMixin. The mixin first operates some changes, capping the size, and then delegates the call to the next class in the MRO, Button. This in turn dispatches it to the original recipient, GraphicalEntity, before initialising self.status. Once again the position of the mixin matters: GraphicalEntity.__init__ doesn't call super, so a mixin listed after Button would never be reached.

Remember that in Python, you are never forced to call the parent's implementation of a method, so the mixin here might also stop the dispatching mechanism if that is the requirement of the business logic of the new object.
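A small sketch can make the difference between cooperating and stopping visible. All the class names below are hypothetical and unrelated to the graphical classes above; each describe method returns the list of classes that took part in the call.

```python
class Base:
    def describe(self):
        return ["Base"]


class Widget(Base):
    def describe(self):
        return ["Widget"] + super().describe()


class LoudMixin:
    def describe(self):
        # Cooperates: passes the call on to the next class in the MRO
        return ["LoudMixin"] + super().describe()


class SilentMixin:
    def describe(self):
        # Stops the dispatching: no call to super()
        return ["SilentMixin"]


class LoudWidget(LoudMixin, Widget):
    pass


class SilentWidget(SilentMixin, Widget):
    pass


loud_chain = LoudWidget().describe()
silent_chain = SilentWidget().describe()

print(loud_chain)
print(silent_chain)
```

Whether stopping the chain is acceptable depends on the design: here SilentWidget never runs the code in Widget and Base, which is fine only if that is really what the new object requires.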

A real example: Django class-based views

Finally, let's get to the original source of inspiration for this post: the Django codebase. I will show you here how the Django programmers used multiple inheritance and mixin classes to promote code reuse, and you will now hopefully grasp all the reasons behind them.

The example I chose can be found in the code of generic views, and in particular in two classes: TemplateResponseMixin and TemplateView.

As you might know, Django's View class is the ancestor of all class-based views and provides a dispatch method that converts HTTP request methods into Python function calls (CODE). Now, the TemplateView is a view that answers a GET request by rendering a template with the data coming from a context passed when the view is called. Given the mechanism behind Django views, then, TemplateView should implement a get method and return the content of the HTTP response. The code of the class is

class TemplateView(TemplateResponseMixin, ContextMixin, View):
    """
    Render a template. Pass keyword arguments from the URLconf to the context.
    """
    def get(self, request, *args, **kwargs):
        context = self.get_context_data(**kwargs)
        return self.render_to_response(context)

As you can see TemplateView is a View, but it uses two mixins to inject features. Let's have a look at TemplateResponseMixin

class TemplateResponseMixin:
    [...]

    def render_to_response(self, context, **response_kwargs):
        [...]

    def get_template_names(self):
        [...]

[I removed the code of the class as it is not crucial for the present discussion, you can see the full class here]

It is clear that TemplateResponseMixin just adds to any class the two methods get_template_names and render_to_response. The latter is called in the get method of TemplateView to create the response. Let's have a look at a simplified schema of the calls:

GET request --> TemplateView.dispatch --> View.dispatch --> TemplateView.get --> TemplateResponseMixin.render_to_response

It might look complicated, but try to follow the code a couple of times and the whole picture will start to make sense. The important thing I want to stress is that the code in TemplateResponseMixin is available for any class that wants to have the feature of rendering a template, for example DetailView (CODE), which receives the feature of showing the details of a single object by SingleObjectTemplateResponseMixin, which inherits from TemplateResponseMixin, overriding its method get_template_names (CODE).

As we discussed before, mixins cannot be too generic, and here we see a good example of a mixin designed to work on specific classes. TemplateResponseMixin has to be applied to classes that contain self.request (CODE), and while this doesn't mean exclusively classes derived from View, it is clear that it has been designed to augment that specific type.
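The call chain described in this section can be imitated without Django. The sketch below is not Django's code: the classes only borrow the names of the real ones, the request is a plain dictionary, and rendering is reduced to string formatting. AboutView and its template name are hypothetical.

```python
class View:
    """Simplified stand-in: maps the HTTP verb to a method of the same name."""

    def dispatch(self, request, *args, **kwargs):
        handler = getattr(self, request["method"].lower(), None)
        if handler is None:
            return "405 Method Not Allowed"
        return handler(request, *args, **kwargs)


class ContextMixin:
    def get_context_data(self, **kwargs):
        return kwargs


class TemplateResponseMixin:
    template_name = None

    def get_template_names(self):
        return [self.template_name]

    def render_to_response(self, context, **response_kwargs):
        # A real implementation renders a template; here we format a string
        return f"{self.get_template_names()[0]} rendered with {context}"


class TemplateView(TemplateResponseMixin, ContextMixin, View):
    def get(self, request, *args, **kwargs):
        context = self.get_context_data(**kwargs)
        return self.render_to_response(context)


class AboutView(TemplateView):
    template_name = "about.html"


response = AboutView().dispatch({"method": "GET"}, title="About")
rejected = AboutView().dispatch({"method": "POST"})

print(response)
print(rejected)
```

The view itself only defines get; dispatching comes from View and rendering from the mixin, which is the division of responsibilities discussed above.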

Takeaway points

Inheritance is designed to promote code reuse but can lead to the opposite result
Multiple inheritance allows us to keep the inheritance tree simple
Multiple inheritance leads to possible problems that are solved in Python through the MRO
Interfaces (either implicit or explicit) should be part of your design
Mixin classes are used to add simple changes to classes
Mixins are implemented in Python using multiple inheritance: they have great expressive power but require careful design.

Final words

I hope this post helped you to understand a bit more how multiple inheritance works, and to be less scared by it. I also hope I managed to show you that classes have to be carefully designed and that there is a lot to consider when you create a class system. Once again, please don't forget composition, it's a powerful and too often forgotten tool.

This article was originally published on The Digital Cat.

Digging up Django class-based views - 3

Leonardo Giordani — Wed, 18 Mar 2020 08:19:36 +0000

This post was originally posted on The Digital Cat

In the first two issues of this short series we discussed the basic concepts of class-based views in Django, and started understanding and using two of the basic generic views Django makes available to you: ListView and DetailView. Both are views that read some data from the database and show them on a rendered template. We also briefly reviewed the base views that allow us to build heavily customised views, and date-based views.

This third issue will introduce the reader to the class-based version of Django forms. This post is not meant to be a full introduction to the Django form library; rather, I want to show how class-based generic views implement the CUD part of the CRUD operations (Create, Read, Update, Delete), the Read one being implemented by "standard" generic views.

A very basic example

To start working with CBFs (class-based forms) let's consider a simple example. We have a StickyNote class which represents a simple text note with a date:

from django.db import models


class StickyNote(models.Model):
    timestamp = models.DateTimeField()
    text = models.TextField(blank=True, null=True)

One of the first things we usually want to do is to build a form that allows the user to create a new entry in the database, in this case a new sticky note. We can create a page that allows us to input data for a new StickyNote simply by creating the following view

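class NoteAdd(CreateView):
    model = StickyNote
    # Recent Django versions require an explicit list of editable fields
    fields = ["timestamp", "text"]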

It is no surprise that the class is mostly empty. Thanks to inheritance, as happened in the first two posts with standard views, the class contains a bunch of code that lives somewhere in the class hierarchy and works behind the scenes. Our mission is now to uncover that code to figure out how exactly CBFs work and how we can change them to perform what we need.

To make the post easier to follow, please always remember that "class-based form" is a short name for "class-based form view". That is, CBFs are views, so their job is to process incoming HTTP requests and return an HTTP response. Form views do this in a slightly different way than the standard ones, mostly due to the different nature of POST requests compared with GET ones. Let us take a look at this concept before moving on.

HTTP requests: GET and POST

Please note that this is a broad subject and that the present section is meant only as a very quick review of the main concepts related to Django CBFs.

HTTP requests come in different forms, depending on the method they carry. Those methods are called HTTP verbs and the two most used ones are GET and POST. The GET method tells the server that the client wants to retrieve a resource (the one connected with the relative URL) and shall have no side effects (such as changing the resource). The POST method is used to send some data to the server, the given URL being the resource that shall handle the data.

As you can see, the definition of POST is very broad: the server accepts the incoming data and is allowed to perform any type of action with it, such as creating a new entity, editing or deleting one or more of them, and so on.

Keep in mind that forms are not the same thing as POST requests. As a matter of fact, they are connected just incidentally: a form is a way to collect data from a user browsing an HTML page, while POST requests are the way that data is transmitted to the server. You do not need to have a form to make a POST request, you just need some data to send. HTML forms are just a useful way to send POST requests, but not the only one.
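To illustrate that a POST request needs only data, the sketch below builds two requests with Python's standard library and inspects them without sending anything. The URL and the payload are placeholders invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, used only to build the request objects
url = "https://example.com/notes/add/"

# Form-style payload, the encoding a browser uses for a simple HTML form
form_body = urlencode({"text": "Buy milk"}).encode("utf-8")
form_request = Request(url, data=form_body, method="POST")

# JSON payload: no HTML form is involved at all
json_body = json.dumps({"text": "Buy milk"}).encode("utf-8")
json_request = Request(
    url,
    data=json_body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(form_request.get_method(), form_request.data)
print(json_request.get_method(), json_request.get_header("Content-type"))
```

Both objects describe a POST to the same URL; only the first one carries data in the format an HTML form would produce.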

Form views

Why are form views different from standard views? The answer can be found looking at the flow of a typical data submission on a Web site:

The user browses a web page (GET)
The server answers the GET request with a page containing a form
The user fills the form and submits it (POST)
The server receives and processes data

As you can see the procedure involves a double interaction with the server: the first request GETs the page, the second POSTs the data. So you need to build a view that answers the GET request and a view that answers the POST one.

Since most of the time the URL we use to POST data is the same URL we used to GET the page, we need to build a view that accepts both methods. It is time to dig into the class-based forms that Django provides to understand how they deal with this double interaction.

Let us start with the CreateView class we used in our simple example (CODE). It is an almost empty class that inherits from SingleObjectTemplateResponseMixin and BaseCreateView. The first class deals with the template selected to render the response and we can leave it aside for the moment. The second class (CODE), on the other hand, is the one we are interested in now, as it implements two methods whose names are self-explanatory, get and post.

Processing GET and POST requests

We already met the get method in the previous article when we talked about the dispatch method of the View class. A quick recap of its purpose: this method is used to process an incoming HTTP request, and is called when the HTTP method is GET. Unsurprisingly, the post method is called when the incoming request is a POST one. The two methods are already defined by an ancestor of the BaseCreateView class, namely ProcessFormView (CODE), so it is useful to have a look at the source code of this last class:

class ProcessFormView(View):
    """Render a form on GET and processes it on POST."""
    def get(self, request, *args, **kwargs):
        """Handle GET requests: instantiate a blank version of the form."""
        return self.render_to_response(self.get_context_data())

    def post(self, request, *args, **kwargs):
        """
        Handle POST requests: instantiate a form instance with the passed
        POST variables and then check if it's valid.
        """
        form = self.get_form()
        if form.is_valid():
            return self.form_valid(form)
        else:
            return self.form_invalid(form)

As you can see the two methods are pretty straightforward, but it's clear that a lot is going on under the hood.

The form workflow

Let's start with get, which apparently doesn't do much. It just calls render_to_response passing the result of get_context_data, so we need to track the latter to see what the template will get. ProcessFormView or its ancestors don't provide any method called get_context_data; instead, the BaseCreateView class receives it from ModelFormMixin, which in turn receives it from FormMixin (CODE).

The class hierarchy is pretty complex, but don't be scared, the important part is that the method get_context_data provided by FormMixin injects a 'form' value into the context (CODE), and the form is provided by the get_form method defined in the same class (CODE), and this eventually uses the form_class attribute to instantiate the form (CODE). As you can see there are plenty of steps, which means plenty of chances to customise the behaviour, if we should need to provide a personalised solution.

It is interesting to have an even more in-depth look at the form creation mechanism, though, as this is the crucial point of the whole GET/POST difference. Once the method get_form has retrieved the form class, it instantiates it to create the form itself, and the parameters passed to the class are provided by get_form_kwargs (CODE). When the HTTP method is GET, get_form_kwargs returns a dictionary with the initial and prefix keys, which are taken from the attributes with the same names. I don't want to dig too much into forms now, as they are out of the scope of the post, but if you read the definition of BaseForm (CODE) you will notice that its __init__ method accepts the same two arguments, initial and prefix. Pay attention that this is a simplification of the whole process, as the ModelFormMixin class injects a slightly more complicated version of both get_form_class and get_form_kwargs to provide naming conventions related to the Django model in use.

Back to ProcessFormView, the post method does not directly render the template since it has to process incoming data before doing that last step. The method, thus, calls get_form directly and then runs the validation process on it, calling then either form_valid or form_invalid, depending on the result of the test. See the official documentation for more information about form validation.

This time, get_form_kwargs adds two keys to the form when it is instantiated, namely data and files. These come directly from the POST and FILES attributes of the request, and contain the data the user is sending to the server.
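The behaviour just described can be imitated in a few lines. This is a simplified sketch, not Django's implementation: FormKwargsSketch is a hypothetical class, and the requests are simple namespaces that carry only the attributes mentioned in the text.

```python
from types import SimpleNamespace


class FormKwargsSketch:
    """Simplified imitation of the behaviour described above, not Django code."""

    initial = {"text": "Write here"}
    prefix = None

    def __init__(self, request):
        self.request = request

    def get_form_kwargs(self):
        # Always present: the values taken from the attributes of the view
        kwargs = {"initial": dict(self.initial), "prefix": self.prefix}
        if self.request.method == "POST":
            # Submitted values and uploaded files are added only on POST
            kwargs["data"] = self.request.POST
            kwargs["files"] = self.request.FILES
        return kwargs


get_request = SimpleNamespace(method="GET", POST={}, FILES={})
post_request = SimpleNamespace(method="POST", POST={"text": "Buy milk"}, FILES={})

get_kwargs = FormKwargsSketch(get_request).get_form_kwargs()
post_kwargs = FormKwargsSketch(post_request).get_form_kwargs()

print(sorted(get_kwargs))
print(sorted(post_kwargs))
```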

Last, let's have a look at form_valid and form_invalid. Both methods are provided by FormMixin (CODE), but the former is augmented by ModelFormMixin (CODE). The base version of form_invalid calls render_to_response passing the context data initialised with the form itself. This way it is possible to fill the template with the form values and error messages for the wrong ones, while form_valid, in its base form, just returns an HttpResponseRedirect to the success_url. As I said, form_valid is overridden by ModelFormMixin, which first saves the form, and then calls the base version of the method.

Let's recap the process so far.

The URL dispatcher requests a page containing a form with GET.
The get method of ProcessFormView finds the form class of choice through get_form_class.
The form class is instantiated by get_form with the values contained in the self.initial dictionary.
At this point a template is rendered with a context returned by get_context_data as usual. The context contains the form.
When the user submits the form the URL dispatcher requests the page with a POST that contains the data.
The post method of ProcessFormView validates the form and acts accordingly, rendering the page again if the data is invalid, or processing it and redirecting to the success URL if it is valid.
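The steps above can be sketched as a tiny state machine in plain Python. This is only an illustration under stated assumptions: TitleForm and FormFlowSketch are hypothetical names, responses are plain tuples, and nothing here is Django's actual implementation.

```python
class TitleForm:
    """Hypothetical form: valid only when a non-empty title is submitted."""

    def __init__(self, initial=None, data=None):
        self.initial = initial or {}
        self.data = data
        self.errors = {}

    def is_valid(self):
        if not (self.data or {}).get("title"):
            self.errors["title"] = "This field is required."
        return not self.errors


class FormFlowSketch:
    """Stand-in for the get/post flow; responses are plain tuples."""

    form_class = TitleForm
    initial = {"title": ""}
    success_url = "/thanks/"

    def get_form(self, data=None):
        return self.form_class(initial=self.initial, data=data)

    def get(self):
        # GET: build an unbound form and render it.
        return ("render", self.get_form())

    def post(self, data):
        # POST: bind the submitted data, validate, then branch.
        form = self.get_form(data=data)
        if form.is_valid():
            return self.form_valid(form)
        return self.form_invalid(form)

    def form_valid(self, form):
        return ("redirect", self.success_url)

    def form_invalid(self, form):
        # The same form goes back to the template, now carrying errors.
        return ("render", form)


flow = FormFlowSketch()
valid_response = flow.post({"title": "Dune"})
invalid_kind, invalid_form = flow.post({})
```

Overriding form_valid in a subclass is then the natural place to add behaviour such as saving an object before redirecting.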

Update and Delete operations

This rather rich code tour unveiled the inner mechanism of the CreateView class, which can be used to create a new object in the database. The UpdateView and DeleteView classes follow a similar path, with minor changes to perform the different action they are implementing.

UpdateView wants to show the form already filled with values, so it instantiates an object before processing the request (CODE). This makes the object available in the keywords dictionary under the instance key (CODE), which is used by model forms to initialize the data (CODE). The save method of BaseModelForm is smart enough to understand if the object has been created or just changed (CODE), so the post method of UpdateView works just like the one of CreateView.

DeleteView is a bit different from CreateView and UpdateView. As the official documentation states, if called with a GET method it shows a confirmation page that POSTs to the same URL. So, as for the GET requests, DeleteView just uses the get method defined by its ancestor BaseDetailView (CODE), which renders the template putting the object in the context. When called with a POST request, the view uses the post method defined by DeletionMixin (CODE), which just calls the delete method of the same class (CODE). This performs the deletion on the database and redirects to the success URL.
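The deletion flow can be sketched in a few lines of plain Python. DeletionSketch is a hypothetical stand-in, the dictionary plays the role of the database, and responses are plain tuples; it only mirrors the sequence of calls described above.

```python
class DeletionSketch:
    """Stand-in for the delete flow; `storage` plays the role of the database."""

    success_url = "/books/"

    def __init__(self, storage, pk):
        self.storage = storage
        self.pk = pk

    def get_object(self):
        return self.storage[self.pk]

    def get(self):
        # GET: only show a confirmation page for the object.
        return ("render", {"object": self.get_object()})

    def delete(self):
        # Remove the object, then send the user to the success URL.
        del self.storage[self.pk]
        return ("redirect", self.success_url)

    def post(self):
        # POST simply delegates to delete.
        return self.delete()


books = {1: "Dune", 2: "Emma"}
deletion = DeletionSketch(books, pk=1)
confirmation = deletion.get()      # nothing is deleted yet
redirect = deletion.post()         # now the object is gone
```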

Final words

As you can see, the structure behind the current implementation of Django class-based form views is rather complex. This allows the user to achieve complex behaviours like the CUD operations just by defining a couple of classes as I did in the simple example at the beginning of the post. Most of the time, however, such a simplification makes it difficult for the programmer to understand how to achieve the desired changes to the class behaviour. So, the purpose of this big tour I made inside the Django source code was to give an insight into what methods are called in the life cycle of your HTTP request so that you can better identify what methods you need to override.

When performing special actions that fall outside the standard CUD operations you had better inherit from FormView (CODE). The first thing to do is to check if and how you need to customize the get and post methods; remember that you either need to implement the full behaviour of those methods or make your changes and call the parent implementation. If this is not enough for your application consider overriding one of the more dedicated methods, such as get_form_kwargs or form_valid.

Read more posts like this on The Digital Cat

Digging up Django class-based views - 2

Leonardo Giordani — Tue, 17 Mar 2020 07:58:59 +0000

This post was originally posted on The Digital Cat

In the first instalment of this short series, I introduced the theory behind Django class-based views and the reason why in this context classes are more powerful than pure functions. I also introduced one of the generic views Django provides out of the box, which is ListView.

In this second post I want to talk about the second most used generic view, DetailView, and about custom querysets and arguments. Last, I'm going to introduce unspecialised class-based views that allow you to build more complex Web pages. To fully understand DetailView, however, you need to grasp two essential concepts, namely querysets and view parameters. So I'm sorry for the learn-by-doing readers, but this time too I'm going to start with some pure programming topics.

QuerySets or the art of extracting information

One of the most important parts of Django is the ORM (Object Relational Mapper), which allows you to access the underlying database just like a collection of Python objects. As you know, Django provides tools to simplify the construction of DB queries; they are managers (the .objects attribute of any model, for example) and query methods (get, filter, and so on). Pay attention because things here are slightly more complicated than you might think at first glance.

When you use one of the methods of a manager you get as a result a QuerySet, which most of the time is used as a list, but is more than this. You can find the documentation about queries here and the documentation about QuerySet here. Both are highly recommended readings.

What I want to stress here is that querysets are not evaluated until you perform an action that accesses their content, like iterating over them (plain slicing without a step, on the other hand, still returns an unevaluated queryset). This means that we can build querysets, pass them to functions, store them, and even build them programmatically or through metaprogramming without the DB being hit. If you think of querysets as recipes you are not far from the truth: they are objects that store how you want to retrieve the data of your interest. Actually retrieving them is another part of the game. This separation between the definition of something and its execution is called lazy evaluation.

Let me give you a very trivial example to show why the lazy evaluation of querysets is important.

[...]

def get_oldest_three(queryset):
    # Order by id and keep the first three rows (the slice end is exclusive)
    return queryset.order_by('id')[0:3]

old_books = get_oldest_three(Book.objects.all())
old_hardcover_books = \
    get_oldest_three(Book.objects.filter(type=Book.HARDCOVER))

As you can see the get_oldest_three function is just refining an incoming QuerySet (which can be of any type); it simply orders the objects and gets the first three inserted in the DB. The important thing here is that we are using querysets like pure algorithms, or descriptions of a procedure. When creating the old_books variable we are just telling the get_oldest_three function "Hey, this is the way I extract the data I'm interested in. Could you please refine it and return the actual data?"
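If you want to see the recipe idea at work without a database, the following toy class may help. It is a rough sketch, not how Django's QuerySet is implemented: LazyQuery, fake_db and the first method are hypothetical, and the steps are applied in Python instead of being translated to SQL.

```python
class LazyQuery:
    """Toy recipe object: it records steps and applies them only when iterated."""

    def __init__(self, source, steps=()):
        self._source = source  # callable standing in for the database
        self._steps = steps    # refinements recorded so far, not yet applied

    def _then(self, step):
        # Return a new recipe; the source is not touched here.
        return LazyQuery(self._source, self._steps + (step,))

    def filter(self, predicate):
        return self._then(lambda rows: [r for r in rows if predicate(r)])

    def order_by(self, key):
        return self._then(lambda rows: sorted(rows, key=lambda r: r[key]))

    def first(self, count):
        return self._then(lambda rows: rows[:count])

    def __iter__(self):
        # Only now is the source read and the recorded steps applied.
        rows = self._source()
        for step in self._steps:
            rows = step(rows)
        return iter(rows)


hits = []  # records every access to the fake database


def fake_db():
    hits.append("query")
    return [
        {"id": 3, "type": "hard"},
        {"id": 1, "type": "soft"},
        {"id": 2, "type": "hard"},
        {"id": 4, "type": "hard"},
    ]


def get_oldest_three(query):
    return query.order_by("id").first(3)


recipe = get_oldest_three(LazyQuery(fake_db).filter(lambda r: r["type"] == "hard"))
hits_before_iteration = len(hits)          # the source has not been read yet
oldest_ids = [row["id"] for row in recipe]  # iterating triggers the evaluation
```

The recipe travels through get_oldest_three untouched, and the fake database is read exactly once, when the result is finally iterated.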

Being such flexible objects, querysets are an important part of generic views, so keep them warm for the upcoming banquet.

Being flexible: parametric views

URLs are the API of our Web site or service. This can be more or less evident for the user that browses through the pages, but from the programmer's point of view, URLs are the entry points of a Web-based service. As such, they are not very different from the API of a library: here, static pages are just like constants, or functions that always return the same value (such as a configuration parameter), while dynamic pages are like functions that process incoming data (parameters) and return a result.

So URLs can accept parameters, and our underlying view shall do the same. You basically have two methods to convey parameters from the browser to your server using HTTP. The first method is named query string and lists parameters directly in the URL through a universal syntax. The second method is storing parameters in the HTTP request body, which is what POST requests do. We will discuss this method in a later post about forms.

The first method has one big drawback: most of the time URLs are long (and sometimes too long), and difficult to use as a real API. To soften this effect the concept of clean URL arose, and this is the way Django follows natively (though, if you want, you can also stick to the query string method).

Now, the Django official documentation on URL dispatcher tells you how you can collect parameters contained in the URL parsing it with a regular expression; what we need to discover is how class-based views receive and process them.

In the previous post we already discussed the as_view method, which instantiates the class and returns the result of dispatch (CODE).

    def as_view(cls, **initkwargs):
        """
        Main entry point for a request-response process.
        """
        # sanitize keyword arguments
        for key in initkwargs:
            if key in cls.http_method_names:
                raise TypeError("You tried to pass in the %s method name as a "
                                "keyword argument to %s(). Don't do that."
                                % (key, cls.__name__))
            if not hasattr(cls, key):
                raise TypeError("%s() received an invalid keyword %r. as_view "
                                "only accepts arguments that are already "
                                "attributes of the class." % (cls.__name__, key))

        def view(request, *args, **kwargs):
            self = cls(**initkwargs)
            if hasattr(self, 'get') and not hasattr(self, 'head'):
                self.head = self.get
            self.request = request
            self.args = args
            self.kwargs = kwargs
            return self.dispatch(request, *args, **kwargs)

        # take name and docstring from class
        update_wrapper(view, cls, updated=())

        # and possible attributes set by decorators
        # like csrf_exempt from dispatch
        update_wrapper(view, cls.dispatch, assigned=())
        return view

Now look at what the view wrapper function actually does with the instantiated class (CODE); not surprisingly it takes the request, args and kwargs passed by the URLconf and stores them as instance attributes with the same names. Remember that the URLconf is given this function itself, not the result of the call, which is the result of dispatch.

This means that anywhere in our CBVs we can access the original call parameters simply reading request, args and kwargs, where *args and **kwargs are the unnamed and named values extracted by the URLconf regular expression.
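A stripped-down version of this mechanism, written in plain Python, might look like the following. MiniView and BookPage are hypothetical classes and the request is just a string; the sketch leaves out all the sanity checks shown in the real as_view.

```python
class MiniView:
    """Minimal stand-in for a class-based view, not Django's View."""

    @classmethod
    def as_view(cls):
        def view(request, *args, **kwargs):
            # A fresh instance is created for every request.
            self = cls()
            self.request = request
            self.args = args
            self.kwargs = kwargs
            return self.dispatch(request, *args, **kwargs)

        return view

    def dispatch(self, request, *args, **kwargs):
        # The real dispatch selects a handler; here we always use get.
        return self.get(request, *args, **kwargs)


class BookPage(MiniView):
    def get(self, request, *args, **kwargs):
        # The captured value is available on self as well as in kwargs.
        return "book %s via %s" % (self.kwargs["pk"], self.request)


view = BookPage.as_view()            # what the URLconf stores
response = view("GET /7/", pk="7")   # what happens on each request
```

Note that as_view is called once, when the URLconf is loaded, while the inner function runs at every request and builds a new instance each time.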

Getting details

Just after listing things, one of the most useful things a Web site does is giving details about objects. Obviously any e-commerce site is made for the most part by pages that list products and show product details, but also a blog is made of one or more pages with a list of posts and a page for each of them. So building a detailed view of the content of our database is worth learning.

To help us in this task Django provides DetailView, which indeed deals, as the name suggests, with the details of what we get from the DB. While ListView's basic behaviour is to extract the list of all objects with a given model, DetailView extracts a single object. How does it know what object shall be extracted?

When dispatch is called on an incoming HTTP request the only thing it does is to look at the method attribute, which for HttpRequest objects contains the name of the HTTP verb used (e.g. 'GET'); then dispatch looks for a method of the class with the lowercase name of the verb (e.g. 'GET' becomes get) (CODE). This handler is then called with the same parameters of dispatch, namely the request itself, *args and **kwargs (CODE).

DetailView has no body and inherits everything from two classes, just as happened for ListView; the first parent class is the template mixin, while the second one, BaseDetailView, implements the get method (CODE).

    def get(self, request, *args, **kwargs):
        self.object = self.get_object()
        context = self.get_context_data(object=self.object)
        return self.render_to_response(context)

As you can see, this method extracts the single object that it shall represent calling get_object, then calls get_context_data (that we already met in the previous post) and last the familiar render_to_response. The method get_object is provided by BaseDetailView's ancestor SingleObjectMixin (CODE): the most important parts of its code, for the sake of our present topic are

def get_object(self, queryset=None):

    [...]

    if queryset is None:
        queryset = self.get_queryset()

    pk = self.kwargs.get(self.pk_url_kwarg, None)

    [...]

    if pk is not None:
        queryset = queryset.filter(pk=pk)

    [...]

    try:
        obj = queryset.get()

    [...]

    return obj

Warning: I removed many lines from the previous function to improve readability; please check the original source code for the complete implementation.

The code shows where DetailView gets the queryset from; the get_queryset method is provided by SingleObjectMixin itself and basically returns queryset if present, otherwise returns all objects of the given model (acting just like ListView does). This queryset is then refined by a filter and last by a get. Here get is not used directly (I think) to manage the different error cases and raise the correct exceptions.

The parameter pk used in filter comes directly from kwargs, so it is taken directly from the URL. Since this is a core concept of views in general I want to look at this part with some extra care.

The DetailView class is called by a URLconf that provides a regular expression to parse the URL, for example url(r'^(?P<pk>\d+)/$',. This regex extracts a parameter and gives it the name pk, so kwargs of the view will contain pk as key and the value found in the URL, as a string, as value. For example the URL 123/ will result in {'pk': '123'}. The default behaviour of DetailView is to look for a pk key and use it to perform the filtering of the queryset, since pk_url_kwarg is 'pk' (CODE).

So if we want to change the name of the parameter we can simply define the pk_url_kwarg of our class and provide a regex that extracts the primary key with the new name. For example url(r'^(?P<key>\d+)/$', extracts it with the name key, so we should define pk_url_kwarg = 'key' in our class.
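The link between the named group of the regex and pk_url_kwarg can be tried out with the standard re module alone. In this sketch SingleObjectSketch and KeyedSketch are hypothetical classes, and a list of dictionaries stands in for the queryset.

```python
import re


class SingleObjectSketch:
    """Stand-in for the lookup logic; `objects` replaces the queryset."""

    pk_url_kwarg = "pk"

    def __init__(self, objects, kwargs):
        self.objects = objects
        self.kwargs = kwargs  # named groups captured from the URL

    def get_object(self):
        pk = self.kwargs.get(self.pk_url_kwarg)
        # Values captured by a regex are strings, so compare as strings.
        matches = [obj for obj in self.objects if str(obj["pk"]) == pk]
        if len(matches) != 1:
            raise LookupError("no single object for %r" % pk)
        return matches[0]


class KeyedSketch(SingleObjectSketch):
    # Same behaviour, but the URL group is called "key".
    pk_url_kwarg = "key"


books = [{"pk": 123, "title": "Dune"}, {"pk": 124, "title": "Emma"}]

default_kwargs = re.match(r"^(?P<pk>\d+)/$", "123/").groupdict()
custom_kwargs = re.match(r"^(?P<key>\d+)/$", "124/").groupdict()

default_book = SingleObjectSketch(books, default_kwargs).get_object()
custom_book = KeyedSketch(books, custom_kwargs).get_object()
```

Changing a single class attribute is enough to follow a different URL naming scheme, which is exactly the kind of customisation CBVs are designed for.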

From this quick exploration we learned that a class inheriting from DetailView:

provides a context with the object key initialized to a single object
must be configured with a model class attribute, to know what objects to extract
can be configured with a queryset class attribute, to refine the set of objects where the single object is extracted from
must be called from a URL that includes a regexp that extracts the primary key of the searched object as pk
can be configured to use a different name for the primary key through the pk_url_kwarg class attribute

The basic use of DetailView is exemplified by the following code.

class BookDetail(DetailView):
    model = Book

urlpatterns = patterns('',
    url(r'^(?P<pk>\d+)/$',
        BookDetail.as_view(),
        name='detail'),
    )

The view extracts a single object with the Book model; the regex is configured with the standard pk name.

As shown for ListView in the previous post, any CBV uses get_context_data to return the context dictionary to the rendering engine. So views that inherit from DetailView can add data to the context following the same pattern. Suppose we have a function get_similar_books that, given a book, returns similar ones according to some criteria.

class BookDetail(DetailView):
    model = Book

    def get_context_data(self, **kwargs):
        context = super(BookDetail, self).get_context_data(**kwargs)
        context['similar'] = get_similar_books(self.object)
        return context

urlpatterns = patterns('',
    url(r'^(?P<pk>\d+)/$',
        BookDetail.as_view(),
        name='detail'),
    )

As explained before, you can access the object being shown through object, which in the above example is passed to a service function we implemented somewhere in our code.

Using the base views

Sometimes, when dealing with complex pages, the generic display CBVs that Django provides are not the right choice. This usually becomes evident when you start overriding methods to prevent the view from performing its standard behaviour. For instance, say that you want to show detailed information about more than one object: DetailView will probably soon show its limits, having been built to show only one object.

In all those cases that cannot be easily solved by one of the generic display CBVs, you have to build your own starting from one of the base views: RedirectView, TemplateView, or View (DOCS, CODE).

I'm not going to fully describe those views; I want however to briefly point out some peculiarities.

View is by now an old friend of ours; we met it when we discussed the as_view and dispatch methods. It is the most generic view class and can be leveraged to perform very specialized tasks such as rendering pages without templates (for example when returning JSON data).

TemplateView is the best choice to render pages from a template, maintaining a great level of freedom when it comes to the content of the context dictionary. Chances are that this is going to be the view you will use the most after ListView and DetailView. Basically you just need to inherit from it and define the get_context_data method. As you can see from the CODE, TemplateView answers to GET requests only.

RedirectView, as the name implies, is used to redirect a request. The redirection mechanism is very simple: its get method returns an HttpResponseRedirect to the URL defined by the url class attribute. The class exhibits a very interesting behaviour (CODE) when called with HTTP methods other than GET (namely HEAD, POST, OPTIONS, DELETE, PUT, or PATCH): it "converts" the method to GET simply calling get from the respective method (head, post, and so on). In the next post I'll show how to leverage this simple technique to show the user a pre-filled form.
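The "conversion" trick is nothing more than plain delegation between methods of the same class. Here is a minimal sketch of the idea; RedirectSketch is a hypothetical class, the response is a tuple, and only two of the extra verbs are shown.

```python
class RedirectSketch:
    """Stand-in showing how other verbs can be routed to get."""

    url = "/new-home/"

    def get(self, request):
        return ("redirect", self.url)

    # Every other verb is "converted" to GET by delegating to get.
    def head(self, request):
        return self.get(request)

    def post(self, request):
        return self.get(request)


redirector = RedirectSketch()
get_response = redirector.get("request")
post_response = redirector.post("request")
head_response = redirector.head("request")
```

The same technique works in any view of yours: if post should behave like get, make post call get and return its result.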

Date-based views

Django provides other class-based views that simplify dealing with objects extracted or ordered by date. As a programmer, you know that sometimes dealing with dates is awkward, to say the least; views such as YearArchiveView or DayArchiveView (CODE) aim to help you to tame your date-based objects; any object that contains a date (e.g. post date for articles, birth date for people, log date for messages, etc.) can be processed by these views. You can find the official documentation here.

Remember that date-based views are CBVs, so they are based on View, just like ListView or TemplateView. So, apart from their specialization on date processing, they behave the same (using get_context_data, get, dispatch, and so on).

Final words

In this post we covered DetailView in depth and, more superficially, all the remaining base and date-based views. I showed you how DetailView uses the given model and the parameters extracted from the URL to find the requested object, and how you can change its default behaviour. In the next post we will step into the rich (and strange) world of forms.

Read more posts like this on The Digital Cat

Digging up Django class-based views - 1

Leonardo Giordani — Mon, 16 Mar 2020 09:37:44 +0000

This post was originally posted on The Digital Cat

Django 3 was released at the end of 2019, so I think it is high time I revisited my successful series of posts about class-based views in Django. Those posts date back to 2013 and have been written with Django 1.5 in mind, and with examples from that code base. Now Django is already two versions older, but class-based views are still a big part of the framework, so I believe it makes sense to refresh the content of those posts. Moreover, I didn't have a chance to study Django 3 yet, so as per tradition of this blog, I will make my personal investigation available to everyone.

If you are a novice Python programmer, and just approached Django to start your career in web development, chances are that you were puzzled by many things, and class-based views (CBVs) are definitely among those. CBVs are apparently very easy to use, in the simple cases, but it might not be clear how to extend them to match more complicated use cases, as the development of a project proceeds. The official documentation is very good, but to master CBVs you need to understand object-oriented concepts like classes (well, obviously), delegation, and method overriding.

If you need to brush up on these concepts you might find it useful to read the following posts on my blog:

What are CBVs?

Class-based views are Django views based on Python classes. This means that, to master them, you need to understand both Django views and Python classes, so let's give a quick definition of them.

A Django view is a piece of code that processes an incoming HTTP request and returns an HTTP response, nothing more, nothing less. A Python class is the implementation of the Object-Oriented concept of class in the Python language.

So, a view needs to be a callable, and this includes functions and classes. Thus, to understand the advantages of class-based views over function-based views we shall discuss the merits of classes over functions. The latter sentence could be the title of a 10-volume book on programming (followed by another 10-volume book titled "Merits of functions over classes"), so I am just going to scratch the surface of the matter here. If you want to dig more into the subject, please read the series on Python 3 OOP that I linked above, where you will find all the gory details that you are craving.

Starting off with Python classes

The main point of classes is to implement encapsulation: they represent a way of coupling data and functions. Doing this, a class loses the dynamic essence of a procedure, which exists only while it is running, and becomes a living entity, something that sits there, caring for its data, and reacts when we call its functions (methods).

A good analogy for a class is a finite-state machine: once the class has been initialized, methods are what we use to make the machine move between states. If we do not call methods, the class simply waits there without complaining.

As an example, let's look at a very simple procedure that extracts the even numbers from an iterable like a list

def extract_even_numbers(alist):
    return [i for i in alist if i%2 == 0]

The example is very trivial, but, as code naturally tends to become more complicated, it's better to start with simple examples. A class version of this function could be written as

class EvenExtractor:
    def __init__(self, alist):
        self.l = alist

    def extract(self):
        return [i for i in self.l if i%2 == 0]

The two are very similar, and it might look like we haven't changed anything. Indeed, the difference is subtle but remarkable. Now the EvenExtractor class has two parts, the first being the initialization and the second being the actual extraction, and we can have the class in one of three states: before initialization (EvenExtractor), after initialization (e = EvenExtractor([1,4,5,7,12])), and after extraction (l = e.extract()).

Converting the procedure to a class, then, we obtained a rich tool that can execute its job step by step and, in general, can work in a non linear way, as we might add further methods, and thus more states.

Delegation is the key

The real power of classes used as finite-state machines lies in the concept of delegation. This is a mechanism through which a class can delegate some work to another class, avoiding code duplication, and thus favouring code reuse and generalisation.

(You might notice that I don't mention inheritance, but delegation, which is implemented by both composition and inheritance. I am a strong supporter of an OO design principle that states "Favour composition over inheritance". I keep reading too many introductions to object-oriented programming that put too much stress on the inheritance mechanism and leave composition aside, raising a generation of OOP programmers that, instead of building systems populated by many small collaborating objects, create nightmares infested by giant all-purpose things that sometimes resemble more an operating system than a system component.)

Let's continue the above example, improving the __init__ method of the EvenExtractor class:

class EvenExtractor:
    def __init__(self, alist):
        self.l = [int(elem) for elem in alist]

    def extract(self):
        return [i for i in self.l if i%2 == 0]

Now the class performs an important action in its initialization phase, converting all elements of the input to integers. Some days after this change, however, we might realise that we could also profitably use a class that extracts odd elements from a list. Being responsible object oriented programmers we write

class OddExtractor(EvenExtractor):
    def extract(self):
        return [i for i in self.l if i%2 != 0]

and call it a day. Through the inheritance mechanism expressed by that (EvenExtractor) signature of the new class, we first defined something that is exactly the same thing as EvenExtractor, with the same methods and attributes, but with a different name. Then we changed the behaviour of the new class but only for the extraction part by overriding the method.
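To check that the child class really reuses the initialization of its parent, you can run both extractors on the same input. The classes are repeated here so that the snippet is self-contained; the sample data is just an arbitrary choice of mine.

```python
class EvenExtractor:
    def __init__(self, alist):
        # Shared initialization: every element is converted to an integer.
        self.l = [int(elem) for elem in alist]

    def extract(self):
        return [i for i in self.l if i % 2 == 0]


class OddExtractor(EvenExtractor):
    # Only the extraction step is overridden; __init__ is inherited.
    def extract(self):
        return [i for i in self.l if i % 2 != 0]


data = ["1", 4, "5", 7.0, 12]
evens = EvenExtractor(data).extract()
odds = OddExtractor(data).extract()
```

Both classes accept strings and floats because the conversion lives in the inherited __init__, so with this input evens is [4, 12] and odds is [1, 5, 7].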

To summarise the lesson: using classes and delegation you can build finite-state machines that are easily customizable to suit your exact needs. This obviously is just one of the many points of view under which you can consider classes, but it is the one we need to understand Django CBVs.

Back to Django

Let's start discussing a practical use of what we learned so far, reviewing how Django uses Python classes and delegation to provide views.

A Django view is a perfect example of a finite-state machine. It takes an incoming request and makes it flow through different processing steps until a final response is produced, which is then sent back to the user. CBVs are a way for the programmer to write their views leveraging the object-oriented paradigm. In this context Class-based Generic Views are the "batteries included" of Django views, the building blocks that the framework provides out of the box.

Let's dig into one of the examples of the official Django docs; here you find the API of the beloved ListView, a generic view to deal with a list of things (extracted from the database). I slightly simplified the example provided by the documentation to avoid having too much on our plate.

from django.views.generic.list import ListView

from articles.models import Article

class ArticleListView(ListView):

    model = Article

This example assumes that articles is your application and Article is one of its models.

You can see here the full power of inheritance. We just derived ArticleListView from ListView, and changed the model class attribute. How can this work? How can this class process incoming requests and what are the outputs? The official documentation states "While this view is executing, object_list will contain the list of objects (usually, but not necessarily a queryset) that the view is operating upon."; this leaves many dark corners, however, and if you are a novice, chances are that you are already lost.

Since ArticleListView derives from ListView, the latter is the class we have to analyse to understand how incoming data is processed. To do this you need to look at the documentation, and if something is still unclear you can freely look at the source code. In the following paragraphs I will summarise what happens when Django calls the sample ArticleListView class shown above, and you will find links called "DOCS" for the official documentation, and "CODE" for the relevant source code, if you want to read it by yourself.

URL dispatchers and views

A CBV cannot directly be used in your URL dispatcher; instead you have to give the result of the as_view method (CODE), which defines a function that instantiates the class (CODE) and calls the dispatch method (CODE); then the function is returned (CODE) to be used in the URL dispatcher. As a user, we are interested only in the fact that the entry point of the class (the method called when a request hits the URL linked with it) is dispatch.

Let's use this knowledge to print out a string on the console each time a request is served by our CBV. I will run through this simple task step by step, since it shows exactly how you have to deal with CBVs when solving real problems.

If we define the ArticleListView class this way

from django.views.generic.list import ListView

from articles.models import Article

class ArticleListView(ListView):

    model = Article

    def dispatch(self, request, *args, **kwargs):
        return super().dispatch(request, *args, **kwargs)

the class does not change its behaviour. What we did was to override the dispatch method with a call to the parent's method, i.e. we explicitly wrote what Python does by default. You can find detailed information about super in the official documentation and in this post on the blog. Please be sure you understand the star and double star notation to define variable number of arguments; the official documentation is here.

Since views are automatically called by the framework, the latter expects them to comply with a very specific API, so when overriding a method you have to provide the same signature of the original one. The signature of dispatch can be found here.

The dispatch method receives a request argument, whose type is HttpRequest (documentation), so we can print it on the console with the standard print function

from django.views.generic.list import ListView

from articles.models import Article

class ArticleListView(ListView):

    model = Article

    def dispatch(self, request, *args, **kwargs):
        print(request)
        return super().dispatch(request, *args, **kwargs)

This prints the content of the request object on the standard output of the server that is running the Django project. If you are running the Django development server, you will find the output on the text console where you issued the command python manage.py runserver.

This, in a nutshell, is the standard way of dealing with Django CBGVs: inherit from a predefined class, identify which methods you need to change, override them complying with their signature and calling the parent's code somewhere in the new code.

The full list of methods ListView uses when processing incoming requests is listed on its official documentation page in the "Method Flowchart" section; in the "Ancestors (MRO)" section you can see that ListView inherits from a good number of other classes. MRO stands for Method Resolution Order and has to do with multiple inheritance: if you are eager to deal with one of the most intricate Python topics feel free to read this.

Incoming GET requests

Back to our ArticleListView. The dispatch method of the parent reads the method attribute of the request object and selects a handler to process the request itself (CODE): this means that if request.method is 'GET', which is the HTTP way to say that we are reading a resource, dispatch will call the get method of the class.
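The handler selection can be reproduced in a few lines of plain Python. This is a simplified sketch and not Django's code: DispatchSketch and ReadOnlyPage are hypothetical, the request is reduced to the name of the HTTP verb, and responses are bare status codes.

```python
class DispatchSketch:
    """Stand-in for the handler selection performed by dispatch."""

    http_method_names = ["get", "post", "head"]

    def dispatch(self, request_method, *args, **kwargs):
        name = request_method.lower()
        if name in self.http_method_names:
            # Use the method with the lowercase verb name, if the class has one.
            handler = getattr(self, name, self.http_method_not_allowed)
        else:
            handler = self.http_method_not_allowed
        return handler(*args, **kwargs)

    def http_method_not_allowed(self, *args, **kwargs):
        return 405


class ReadOnlyPage(DispatchSketch):
    def get(self):
        return 200


page = ReadOnlyPage()
get_status = page.dispatch("GET")     # handled by get
post_status = page.dispatch("POST")   # known verb, but no post method
brew_status = page.dispatch("BREW")   # unknown verb
```

A view therefore supports exactly the verbs for which it defines a method, and everything else falls back to the "method not allowed" handler.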

The get method of ListView comes from its BaseListView ancestor (DOCS, CODE). As you can see, the function basically initializes the attribute object_list with the result of the call get_queryset(), creates a context calling the method get_context_data and calls render_to_response.

Are you still with me? Don't give up, we are almost done, at least with ListView. The method get_queryset comes from the MultipleObjectMixin ancestor of ListView (DOCS, CODE) and simply gets all objects of a given model (CODE) running queryset = self.model._default_manager.all(). The value of model is what we configured in our class when we wrote model = Article. I hope at this point something starts to make sense in your head.

That's all, actually. Our ArticleListView class extracts all Article objects from the database, and calls a template passing a context that contains a single variable, object_list, initialised with the list of extracted objects.
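The three steps performed by get can be mimicked with a small stand-in class. ListFlowSketch and ArticleListSketch are hypothetical, a tuple of strings replaces the database table, and the "template" is a simple string; the point is only the order of the calls.

```python
class ListFlowSketch:
    """Stand-in for the get flow of a list view; no database, no templates."""

    model_rows = ()  # plays the role of the rows of the configured model

    def get_queryset(self):
        return list(self.model_rows)

    def get_context_data(self, **kwargs):
        context = {"object_list": self.object_list}
        context.update(kwargs)
        return context

    def render_to_response(self, context):
        return "rendered %d objects" % len(context["object_list"])

    def get(self):
        # 1. fetch the objects, 2. build the context, 3. render.
        self.object_list = self.get_queryset()
        context = self.get_context_data()
        return self.render_to_response(context)


class ArticleListSketch(ListFlowSketch):
    # The only configuration needed, like model = Article in the real view.
    model_rows = ("First post", "Second post")


listing = ArticleListSketch()
page = listing.get()
```

Each of the three methods is a separate hook, so a subclass can change where the objects come from, what goes in the context, or how the response is rendered, without touching the other two.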

Templates and contexts

Are you satisfied? I'm actually still curious about the template and the context. Let's see what we can find about these topics. First of all, when the class calls render_to_response it uses the code that comes from its TemplateResponseMixin ancestor (DOCS, CODE); the method initialises the class TemplateResponse passing a template and a context. The template, through a series of calls which you can follow by yourself, comes from template_name (CODE); while TemplateResponseMixin initializes it as None (CODE), ListView performs some magic tricks through ancestors (CODE) to return a template whose name derives from the given model. In short, our ArticleListView, defining an Article model, automatically uses a template that is called article_list.html, looked up in a directory named after the application (articles/article_list.html).

Can we change this behaviour? Of course! This is, after all, the point of using classes instead of functions: easily customisable behaviour. We can change the definition of our class to be

from django.views.generic.list import ListView

from articles.models import Article

class ArticleListView(ListView):
    model = Article
    template_name = 'sometemplate.html'

Let's review what this does step by step. When the response is created, Django runs the code of render_to_response (CODE), which in turn calls get_template_names. Note that this method returns a list of names, as Django scans them in order and uses the first one that is available. ListView inherits an overridden version of this method from its ancestor MultipleObjectTemplateResponseMixin (CODE). This version calls the same method of its own superclass TemplateResponseMixin (CODE), which returns a list containing the attribute we set in the ArticleListView class (CODE). The mixin then appends to the list the template file name derived from the model (CODE), which includes the application label as a directory, and finally returns the list, which at this point is ['sometemplate.html', 'articles/article_list.html'].
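The chain of calls can be reproduced with a short sketch in plain Python. The class names and the attributes app_label and model_name are hypothetical: Django reads those values from the model's metadata rather than from the view, and it raises its own exception type instead of ValueError. The derived name is built as application label, slash, model name and suffix, so for our example it is articles/article_list.html.

```python
class TemplateResponseSketch:
    template_name = None

    def get_template_names(self):
        if self.template_name is None:
            raise ValueError("no template_name defined")
        return [self.template_name]


class MultipleObjectTemplateSketch(TemplateResponseSketch):
    # Hypothetical attributes replacing the model metadata that Django reads
    app_label = None
    model_name = None
    template_name_suffix = "_list"

    def get_template_names(self):
        try:
            names = super().get_template_names()
        except ValueError:
            # No explicit template: start from an empty list
            names = []
        if self.model_name is not None:
            # Append the name derived from the model as a fallback
            names.append(
                f"{self.app_label}/{self.model_name}{self.template_name_suffix}.html"
            )
        return names


class ArticleListSketch(MultipleObjectTemplateSketch):
    app_label = "articles"
    model_name = "article"
    template_name = "sometemplate.html"


print(ArticleListSketch().get_template_names())
```

The explicit name comes first in the list, so the derived one is used only when sometemplate.html cannot be found.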

As for the context, remember that it is only a dictionary of values you want to be able to access when rendering the template. Variable names inside the context, data format, and data content are completely up to you. When using CBGVs, however, you will find in your context some variables that have been created by the ancestors of your view, as happens for object_list. What if you want to show a page with the list of all articles, but you want to add a value to the context?

Easy task: you just need to override the function that produces the context and change its behaviour. Say, for example, that we want to show the number of total readers of our site, along with the list of articles. Assuming that a Reader model is available we can write

from django.views.generic.list import ListView

from articles.models import Article, Reader

class ArticleListView(ListView):
    model = Article

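    def get_context_data(self, **kwargs):
        # Build the default context first (it already contains object_list)
        context = super().get_context_data(**kwargs)
        # Add the total number of readers under the key 'readers'
        context['readers'] = Reader.objects.count()
        return context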

As always, when overriding a method we need to ask ourselves if we need to call the original method. In this case, we merely want to augment the content of the context and not replace it, so we call super().get_context_data(**kwargs) first, and we add the value that we need to the result. Keep in mind that this might not always be the case, as it depends on the logic of your override.
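The difference between augmenting and replacing can be seen in a small sketch in plain Python. All the class names are hypothetical, the parent returns a hard-coded list, and the number 42 is a placeholder for the value that Reader.objects.count() would return.

```python
class BaseSketch:
    def get_context_data(self, **kwargs):
        context = {"object_list": ["first article", "second article"]}
        context.update(kwargs)
        return context


class AugmentingView(BaseSketch):
    def get_context_data(self, **kwargs):
        # Start from the context built by the parent, then add to it
        context = super().get_context_data(**kwargs)
        context["readers"] = 42
        return context


class ReplacingView(BaseSketch):
    def get_context_data(self, **kwargs):
        # No call to super(): the variables created by the parent are lost
        return {"readers": 42}


print(sorted(AugmentingView().get_context_data()))
print(sorted(ReplacingView().get_context_data()))
```

The first view keeps object_list and adds readers, while the second exposes readers only, so a template that loops over object_list would find nothing to show.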

Final words

In this first post I tried to uncover some of the mysteries behind CBVs and CBGVs in Django, showing exactly what happens to a GET request that hits a class-based view. Hopefully the matter has now been demystified a little! In the next posts I will discuss DetailView, the generic view to show detail about an object, how to create custom CBVs, and how to use CBVs to process forms, i.e. accept POST requests.

Read more posts like this on The Digital Cat