Abhiyan Dhakal

Docker Explained: How It Really Works Under the Hood

Docker is a well-known containerization tool that most of us have either used or at least heard of. Yet even though Docker is so widely used, there seem to be a lot of misconceptions floating around about it.

Please note that this is not a Docker tutorial, but a dive into what Docker is and how it works under the hood.

Before learning more about Docker itself, let's first learn what a container is.


What is a Container?

A container is an isolated environment that contains the application code, required packages, tools, configuration and other requirements, so that the packaged software can run anywhere and be perfectly recreated.

So what's wrong with traditional methods? Due to differences in environments, such as the operating system (Windows, Linux, macOS, etc.), binary versions, dependencies and other external factors, the exact environment might not be recreatable everywhere. For example, a C application written for Linux might not work on Windows. Sometimes the make version causes issues. Linking libraries has caused headaches for ages.

What if we could recreate the same environment on every machine? That is the basic theory behind a container. A container creates a perfectly replicable environment in which the code can run.

Here is a meme I found in r/ProgrammerHumor:
Works on my machine
This applies not only to Docker but to every container.

The working mechanism of a container is well documented by the OCI (Open Container Initiative). The OCI defines standards for container runtimes and container images to ensure portability and interoperability across different tools and platforms. It consists of three specifications:

  1. Runtime specification
  2. Image specification
  3. Distribution specification

Runtime specification

The runtime specification describes how to run a container. A configuration file (config.json in the OCI spec) tells the runtime how the container should be created and run. At the Docker level you express the same intent in a Dockerfile: which base image to use, which files to copy, what commands to run, which TCP port to expose (e.g. 3000), and so on; see the sketch below.
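
For instance, here is a minimal sketch of such a Dockerfile, written from the shell. The base image, port and commands are illustrative assumptions, not something prescribed by the spec:

    # Write a minimal Dockerfile (image name, port and app files are hypothetical)
    cat > Dockerfile <<'EOF'
    # which image to use
    FROM node:20-slim
    # which files to copy
    WORKDIR /app
    COPY . /app
    # what commands to run at build time
    RUN npm install
    # which TCP port the app listens on
    EXPOSE 3000
    # what to run when the container starts
    CMD ["node", "server.js"]
    EOF
    # Build an image from the specification
    docker build -t my-app .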

Image specification

A container image is a packaged piece of software that includes everything required by a container. This specification describes how container images are built, packaged and distributed (more about images later, in the context of Docker).

Distribution specification

This specification describes how container images are distributed over HTTP. For instance, Docker distributes its images through Docker Hub. Images are pushed to and pulled from a registry to redistribute them: distribution is handled by a central registry (such as Docker Hub), image authors push into it, and users of the image pull from it.
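
As a hedged illustration (the image names are placeholders), the distribution workflow boils down to a few commands:

    # Pull layers and metadata over HTTP(S) from Docker Hub
    docker pull nginx:1.27
    # Re-tag the image under your own namespace (hypothetical user "myuser")
    docker tag nginx:1.27 myuser/nginx:custom
    # Push the layers and manifest back to the registry
    docker push myuser/nginx:custom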


Now What is Docker?

Suppose you have a medium-scale website with a lot of setup: a frontend, a Go API backend, a scheduler in Python, a Postgres database and a headless CMS. Even in a fairly simple medium-scale website, there are many moving parts and many things for developers to set up. Setting it all up in a developer's environment is a pain, and software versions can differ across peers' environments. So Docker doesn't just create an isolated environment; it also automates the process.

You write a specification file called a Dockerfile, based on which Docker pulls an image from Docker Hub, creates a container and performs the actions defined in the file. For instance, in a Debian image you can install packages and interact with the filesystem just like in a Debian operating system, which lets you automate your tasks.

In Docker, you can also go into an interactive mode, where you interact with the container through a shell such as bash and do most of the things you would in a virtual machine running Debian, with the container getting its own network, process tree, and RAM and CPU limits.
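
A small sketch of that interactive mode (the image tag is illustrative):

    # Start an interactive Debian container and drop into bash
    docker run -it --rm debian:bookworm bash
    # Inside, it feels like a small Debian machine:
    #   apt-get update && apt-get install -y curl
    #   exit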

With the feature set available in Docker, being able to essentially run a Linux operating system, you might have drawn a conclusion.

Docker is a mini virtual machine, right?


Actually, even though Docker on the surface does what a virtual machine does, Docker is NOT a virtual machine. So what is it? Let's discuss.


How does Docker Work?

To understand how Docker works, we first need to know how Linux (GNU/Linux) works. The Linux kernel acts as a bridge between hardware and software and handles process scheduling, context switching, load balancing and so on. On top of it sits user space: programs, configuration, security machinery, all structured as files. So, in conclusion,

GNU/Linux = Linux kernel + user space (structured as a filesystem)


Docker Image

So, a Docker image is essentially just a snapshot of a lightweight Linux filesystem, meaning it has the basic essential files, packages and configuration of a Linux system.

Is it fair to say that a Docker image is just a zipped Linux filesystem? Well... it's a lot more nuanced than that, but roughly, yes.

The filesystem is broken down into layers so that the same layer can be reused across multiple images (Docker achieves this using OverlayFS; more about that later). Each layer is compressed using a tool like gzip. When you run docker pull, any layers of the requested image that don't already exist on the system are downloaded over HTTP from Docker Hub. A JSON metadata file (the manifest) states the order in which the layers are stacked to create a complete filesystem.

In short, a Docker image = compressed layers + metadata.
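
Both halves of that equation are visible from the CLI. A quick sketch (the image tag is just an example; output shapes vary by Docker version):

    # Download an image's layers and metadata
    docker pull alpine:3.20
    # One row per layer that makes up the image
    docker history alpine:3.20
    # The layer digests recorded in the image's metadata
    docker inspect alpine:3.20 --format '{{json .RootFS.Layers}}'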

OverlayFS

OverlayFS is a union mount filesystem implementation for Linux. You have one or more base layers (directories) on top of which a new layer is introduced, and OverlayFS makes them act like a single filesystem. The lower layers are read-only and the upper layer is writable.

I think it's easiest to understand with an example. Let's create two directories: layer1 (read-only lower layer) and layer2 (read-write upper layer). layer1 has some files, like file1.txt. In a merged1 directory mounted using OverlayFS, you can interact with file1.txt even though layer1 is read-only, because the actual change happens inside layer2. Now think of reusing layer1 in another image: you don't need to copy the contents of layer1, a new upper layer layer3 can simply be created on top of it. You can also visualize how it works from the image below:
Layered Architecture
This enables reuse of layers and helps save disk space.
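
Here is that experiment as an actual mount command, a minimal sketch assuming a Linux machine with root access (directory names mirror the example above):

    # Build the layers
    mkdir -p layer1 layer2 work merged1
    echo "hello" > layer1/file1.txt
    # Mount them as a single view (OverlayFS also requires a work directory)
    sudo mount -t overlay overlay \
        -o lowerdir=layer1,upperdir=layer2,workdir=work merged1
    # Edit through the merged view...
    echo "edited" >> merged1/file1.txt
    cat layer2/file1.txt   # ...and the change is copied up into the writable layer
    ls layer1              # the read-only lower layer is untouched
    sudo umount merged1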

Docker Container

Now we have files and filesystems, and we have the host kernel. How does it all actually work together cohesively?

What do we need to achieve to create a functional container? An isolated environment. This is done through Linux namespaces. Here is the exact extract from the namespaces(7) man page, which should make clear why namespaces are used:

    A namespace wraps a global system resource in an abstraction that
    makes it appear to the processes within the namespace that they
    have their own isolated instance of the global resource. Changes
    to the global resource are visible to other processes that are
    members of the namespace, but are invisible to other processes.
    One use of namespaces is to implement containers.

Now, let's go through the namespaces one by one.

1. Cgroup Namespace

Control groups, usually referred to as cgroups, are a Linux kernel feature that allows processes to be organized into hierarchical groups whose usage of various types of resources (such as CPU, memory, disk I/O and network I/O) can be limited and monitored. The cgroup namespace virtualizes a container's view of this hierarchy, so the container only sees its own cgroup subtree. If you give a container a memory limit, the kernel enforces it through cgroups: the container can only use its allotted share, even though it is still running on the host's RAM and CPUs.
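
A hedged sketch of what this looks like in practice, assuming a cgroup v2 host (image tag and limits are illustrative):

    # Cap the container at 256 MB of RAM and half a CPU
    docker run -it --rm --memory=256m --cpus=0.5 debian:bookworm bash
    # Inside the container, the limit shows up in its own cgroup subtree:
    #   cat /sys/fs/cgroup/memory.max   # prints 268435456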

2. User Namespace

The user namespace isolates users and groups by mapping UIDs and GIDs inside the container to different UIDs and GIDs on the host. My host user is abhiyan, with the groups abhiyan, wheel and docker. If the container got access to the wheel group, it could run sudo (which grants root access) and cause unwanted problems on the host machine. So this namespace is extremely important for isolating container users from the host.
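
You can try a user namespace directly with the unshare tool from util-linux; a minimal sketch:

    # Become "root" inside a new user namespace while staying unprivileged on the host
    unshare --user --map-root-user bash
    whoami   # prints "root"
    id -u    # prints 0, but it maps back to your ordinary UID on the host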

3. Mount Namespace

We have our Docker image, let's say abhiyan-img. With the help of OverlayFS, we have assembled a filesystem. Now we need to mount that image, or in simpler words, make its filesystem the root directory for the container. Conceptually this is what the chroot utility does (container runtimes actually combine a mount namespace with pivot_root, but chroot is the simplest mental model).
Chroot Jail
chroot makes a directory the root directory for a process and doesn't let the user access anything outside of it. That's why it is commonly known as a chroot jail. This is exactly the kind of filesystem isolation Docker needs.
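
A toy chroot jail as a hedged sketch (requires root, and assumes busybox is installed and statically linked, which is common but not guaranteed):

    # Build a tiny root filesystem around a single static binary
    mkdir -p jail/bin
    cp /bin/busybox jail/bin/
    # Enter the jail: "/" now refers to ./jail
    sudo chroot jail /bin/busybox sh
    # Inside: `ls /` shows only "bin"; the host filesystem is unreachable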

4. Network Namespace

When you are using Docker, you make a container port visible to the host by mapping it to an unallocated port on the host machine. What does this mean? Each container has its own network stack: its own virtual network interface (e.g. eth0), IP address and ports through which data is accepted and sent. This allows container ports not to conflict with host ports.
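
A sketch of that port mapping (image tag and ports are illustrative):

    # Map host port 8080 to port 80 inside the container's network namespace
    docker run -d --rm -p 8080:80 nginx:1.27
    # The host reaches the container only through the mapped port
    curl http://localhost:8080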

5. PID (Process Identifier) Namespace

If you run ps aux on a Linux host machine, you can see a number of processes. The user inside a container shouldn't get hold of processes outside the container. So what the PID namespace does is create a new process tree, with PIDs starting from 1 inside the container.
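
You can see this from any container. A small sketch (busybox is used here because it ships its own ps):

    # List processes from inside a fresh PID namespace
    docker run --rm busybox ps aux
    # PID 1 is the container's own command, not the host's init,
    # and none of the host's processes are visible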

6. IPC (Inter-Process Communication) Namespace

In Docker, containers are isolated environments that share the host OS kernel, but each container can still interact with other containers or the host through defined communication channels, primarily Inter-Process Communication (IPC) objects such as shared memory segments, semaphores and message queues. The IPC namespace isolates these: processes within a container cannot interfere with or access IPC resources used by processes in another container (or on the host system), unless you explicitly share them.
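
Docker exposes that explicit sharing through the --ipc flag; a hedged sketch (container names are hypothetical):

    # One container makes its IPC namespace shareable...
    docker run -d --name producer --ipc=shareable busybox sleep 3600
    # ...and another explicitly joins it, seeing the same IPC objects
    docker run --rm --ipc=container:producer busybox ipcs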

7. UTS (Unix Time-sharing System) Namespace

This one is quite simple to understand. If you use bash, the default prompt shows user@hostname [/current/path], or you can run hostname to print the hostname. The UTS namespace isolates the hostname: if your host's hostname is myhost, inside a Docker container you won't see myhost but some different identifier (by default, the container ID).
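
A quick sketch of the effect (the custom hostname is illustrative):

    hostname                            # e.g. myhost, on the host
    docker run --rm busybox hostname    # prints a container ID instead
    docker run --rm --hostname demo busybox hostname   # prints "demo"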

8. Time Namespace

It lets containers have their own view of elapsed time, independent of the host's. More precisely, the time namespace (added in Linux 5.6) virtualizes the monotonic and boot-time clocks, not the wall-clock time.
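
Outside of Docker, you can experiment with it using unshare; a hedged sketch, assuming Linux 5.6+ and util-linux 2.36+:

    # Pretend the system booted one day earlier inside a new time namespace
    sudo unshare --fork --time --boottime 86400 cat /proc/uptime
    # The first number (seconds since boot) appears 86400 seconds larger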

This is not everything about namespaces, and I'm not close to understanding all of them, but it should provide a general view of how Docker creates its isolated environment. If we take a step back, we can see that everything a Linux machine does is replicated in a Docker container. That's where the misconception that Docker is a lightweight VM comes into play: a VM boots its own kernel on virtualized hardware, while a Docker container is just a group of isolated processes sharing the host's kernel.


I hope you got the gist of how Docker works under the hood. If you have anything to correct me on, or any new information to share, I'm happy to hear it. I'd also be happy to help if you want to know anything else. Thank you for reading!
