Kube Explained (2 Part Series)
In our previous post, Kube Explained: Part 1, I described how the introduction of the cloud resulted in CI/CD, Microservices, and a massive amount of pressure to standardize backend infrastructure tooling.
In this post, we’ll cover the first, and most important, domino to fall in this wave of standardization: the container. For the first time, containers standardized the packaging, distribution, and lifecycle of backend services and in doing so, paved the way for container orchestration, Kubernetes, and the explosion of innovation that’s surrounded that ecosystem.
There are plenty of technical deep dives, step-by-step tutorials, and vendor pitches floating around the internet. In this post, Part 2 of our Kubernetes Explained series, I’ll instead attempt to explain the context surrounding the rise of containers – what they are, why they happened, and the key pieces of the ecosystem one must contend with.
Containers combine two key innovations:
- Light-weight virtualization. Containers implement the illusion of an isolated and dedicated Linux server when, in reality, they may be running alongside other containers on a host.
- Packaging and distribution. Containers standardize the packaging and distribution of backend software in a container image. Container images are to backend software what an app is to mobile. An image is a bucket containing everything a process needs to run (binary, configuration, data …) that's standardized and thus portable.
Typically when people speak of virtualization, they mean x86 virtual machines as popularized originally by VMware, and later by cloud services like Amazon EC2. Virtual machines virtualize the CPU, RAM, and disk of a server, parceling these resources out amongst one or more virtual machines (VMs) and providing each of them the illusion of running on its own dedicated hardware.
Virtual machines have four key features that proved crucial in supporting their adoption, and enabling the development of the cloud:
- Programmatic Control. Virtual machines are in essence software, and thus can be controlled by software APIs. This allows fleets of VMs to be booted in minutes, instead of the months it would take to physically rack new servers.
- Multiplexing. Multiple VMs can run on a single physical server. This allows large servers to be used efficiently even when individual workloads don’t need all that power.
- Heterogeneity. Virtual machines can run any operating system and any software. This greatly eased the initial adoption of virtualization as it allowed enterprise IT teams and cloud providers to support any workload their customers needed without requiring modification.
- Multi-tenancy. Virtual machines are secure enough that cloud providers can run VMs from different customers on the same physical hardware. This allows cloud providers to be efficient in their placement of customer VMs, keeping costs low and making the cloud economically viable.
The key innovation of containers was to maintain the programmatic control and multiplexing capabilities of virtual machines while relaxing the heterogeneity and multi-tenancy features. As a result, they kept the flexibility VMs provided while delivering much better performance.
Cloud providers need heterogeneity and multi-tenancy because they’re serving many customers who have different requirements, and who don’t trust one another. However, an individual software engineering organization building out their backend infrastructure doesn’t need either of these features. They don’t need multi-tenancy, because they are definitionally a single tenant. And they don’t need heterogeneity, because they can standardize on a single operating system.
Furthermore, virtual machines come with costs. They’re heavy, boot slowly, and require RAM and CPU resources for the OS. These costs are worth it if you need all of the features VMs provide, but if you don’t need heterogeneity and multi-tenancy, they’re not worth paying.
VMs work by virtualizing the CPU, RAM, and disk of a server. Containers work by virtualizing the system call interface of an operating system (typically Linux).
Over time, and long before the introduction of Docker, the Linux Kernel added three key features:
- Chroot allows the root of the filesystem (/) for a particular process to be changed. In effect, this allows one to isolate parts of the filesystem to separate processes.
- Cgroups allow the kernel to limit the CPU and memory resources available to a particular process.
- Network Namespaces allow the networking subsystem to be parceled into separate isolated namespaces, each with their own interfaces, IP addresses, routing tables, firewall rules, etc.
These features are the basic building blocks of Linux containers. However, they’re extremely low level and difficult to use, which limited their popularity. The key innovation of Docker was to wrap these features in a simplified abstraction, the container, and provide a high-level and intuitive set of tools for manipulating them. Thus, a container is simply a chroot, cgroup, and network namespace that was allocated by a container runtime like Docker.
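To make the three building blocks concrete, here is a minimal dry-run sketch in Python. It executes nothing; it only lists the shell-level steps a runtime might roughly perform. `ContainerSpec`, `isolation_plan`, and the example paths are hypothetical names invented for illustration, the cgroup file names assume cgroup v2, and this is not a description of how Docker itself is implemented.

```python
from dataclasses import dataclass


@dataclass
class ContainerSpec:
    """Hypothetical description of one container (illustrative only)."""
    name: str
    rootfs: str          # directory that becomes "/" inside the container
    command: list[str]   # process to run inside the container
    memory_bytes: int    # memory limit enforced through a cgroup
    cpu_percent: int     # share of one CPU, 1-100


def isolation_plan(spec: ContainerSpec) -> list[str]:
    """Return the steps as strings; nothing here is actually executed."""
    cgroup = f"/sys/fs/cgroup/{spec.name}"   # assumes a cgroup v2 mount
    period_us = 100_000
    quota_us = period_us * spec.cpu_percent // 100
    return [
        # Cgroup: cap memory and CPU, then move the current shell into it.
        f"mkdir {cgroup}",
        f"echo {spec.memory_bytes} > {cgroup}/memory.max",
        f"echo '{quota_us} {period_us}' > {cgroup}/cpu.max",
        f"echo $$ > {cgroup}/cgroup.procs",
        # Network namespace (unshare --net) plus a new filesystem root (chroot).
        f"unshare --net chroot {spec.rootfs} {' '.join(spec.command)}",
    ]


spec = ContainerSpec("web", "/srv/web-rootfs", ["/bin/sh"], 256 * 1024 * 1024, 50)
for step in isolation_plan(spec):
    print(step)
```

Running these steps for real would require root privileges and a prepared root filesystem, which is exactly the kind of low-level work a container runtime hides.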
Just like VMs, modern container runtimes provide a programmatic API, allowing containers to be started/stopped and generally managed with software. They also allow for multiplexing, in that many containers can run on a single Linux server. However, unlike VMs, they don’t have their own operating system and thus don't support the same degree of heterogeneity. Additionally, many consider containers less secure than virtual machines (I’m personally not 100% convinced on this point, but it’s the conventional wisdom and this discussion is best left for a future post).
A second key innovation in containers, less discussed but no less important, was the standardization of packaging and distribution.
Virtual machines don’t really have a practical, universal, and shareable packaging mechanism. Of course, each hypervisor has some way of storing configuration and disk images. However, in practice, VM images are so large, and so inconsistent between hypervisors, that VM image sharing never took off in the way one might have expected (… Vagrant being a notable exception).
Instead of shipping around VM images, the standard approach before the development of containers was to boot a clean new Linux VM and then use a tool like Chef or Puppet to install the software, libraries, configuration, and data needed on that VM. So many package managers, configuration managers, Linux distributions, libraries, programming languages, and other components are required to make this work, each with its own idiosyncrasies and incompatibilities, that this approach never quite worked as smoothly as one would hope.
Docker fixed the issue by standardizing the container image. A container image is simply a Linux file hierarchy, containing everything a particular container needs to run. This includes everything from core OS tools (/bin/sh, etc.), to the libraries that the software needs, the actual application code, and its configuration.
Container images are both completely standard (an image that runs in one place will run everywhere), and much smaller than virtual machine images. This makes them much easier to store and move around. For the first time, the standardization of the container image allows a developer to build a container on their laptop and have complete confidence that it will work as expected in each step of the CI/CD process, from testing to production.
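The idea of an image as a packaged file hierarchy can be sketched in a few lines of Python. This is a simplified illustration, not the real image format: it packs a handful of placeholder files into a single tar archive and identifies the result by its SHA-256 digest. Real image formats add manifests, multiple layers, and compression on top of this. The file names and contents below are made up for the example.

```python
import hashlib
import io
import tarfile


def build_image_layer(files: dict[str, bytes]) -> bytes:
    """Pack a file hierarchy into one uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(files):        # stable order gives stable bytes
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = 0                # fixed timestamp for reproducibility
            info.mode = 0o755
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob: bytes) -> str:
    """Content address: the same bytes always produce the same identifier."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Placeholder contents standing in for OS tools, libraries, code, and config.
rootfs = {
    "bin/sh": b"<shell binary placeholder>",
    "usr/lib/libexample.so": b"<library placeholder>",
    "app/server.py": b"print('hello from the container')\n",
    "etc/app/config.yaml": b"port: 8080\n",
}
layer = build_image_layer(rootfs)
print(digest(layer))
```

Because the identifier is derived from the bytes themselves, a laptop, a CI job, and a production host that hold the same digest are holding the same files.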
In this section I’ll briefly cover the various components of the ecosystem that’s grown up around containers. This is a massive and complex topic with many projects fulfilling various niches. My goal in this post is not to be comprehensive, but instead, sketch out the major components, and suggest good places to start for those new to the area.
A container registry is a (usually SaaS) service that stores and distributes container images. In the standard cloud-native architecture, code is packaged into a container by CI/CD and then pushed to a registry where it sits until pulled by various test or production environments that may want to run it.
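The push-then-pull flow can be modeled with a toy in-memory registry. `ToyRegistry` and the `shop/web:1.4.2` reference are hypothetical and exist only for this sketch; a real registry is a network service with authentication, layered storage, and its own API, none of which is shown here.

```python
import hashlib


class ToyRegistry:
    """In-memory stand-in for a registry; not a real registry API."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}   # digest -> image bytes
        self._tags: dict[str, str] = {}      # "repository:tag" -> digest

    def push(self, ref: str, image: bytes) -> str:
        """Store the image and point the tag at it; return its digest."""
        image_digest = "sha256:" + hashlib.sha256(image).hexdigest()
        self._blobs[image_digest] = image
        self._tags[ref] = image_digest
        return image_digest

    def pull(self, ref: str) -> bytes:
        """Look up the tag and return the stored image bytes."""
        return self._blobs[self._tags[ref]]


registry = ToyRegistry()

# CI/CD builds an image and pushes it under a version tag.
pushed_digest = registry.push("shop/web:1.4.2", b"<image bytes built by CI>")

# Later, test and production environments pull the same reference.
staging_copy = registry.pull("shop/web:1.4.2")
production_copy = registry.pull("shop/web:1.4.2")
print(staging_copy == production_copy)
```

The point of the model is that every environment pulls the same bytes the CI pipeline pushed, rather than rebuilding the software in place.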
Everyone running containers uses a registry, so if you’re thinking about containerization, this is a key decision you’ll have to make. There are tons of registries out there, but I’ll specifically call out two options that most teams should investigate first.
- Docker Hub is the original container registry built by Docker as the container ecosystem was bootstrapped. It works well, has a pleasant UX, and is supported by Docker.
- Cloud Provider. All of the major cloud providers have a hosted registry as one of the many services they offer. It tends to be a bit less feature-rich than the third-party choices, but in many cases works just fine, and doesn’t require a relationship with another vendor.
As mentioned, a container is a collection of operating system features that allow the system call interface to be virtualized. Those features, however, are extremely low level, and as a result software is required to wrap them in a more usable abstraction. There are a large number of container runtimes (or runtime-related projects) that perform this function: Docker, Rkt, LXD, Podman, Containerd, etc. Most of these projects have their place, but are mainly of interest to hard-core container enthusiasts. For most people, vanilla open-source Docker is likely the right place to start.
Once containers arrived, a secondary problem of great importance immediately revealed itself. One needs software to manage containers, i.e., to start them, stop them, upgrade them, restart them when they fail, and scale them when more are needed. Software that performs these tasks is called a container orchestrator.
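One common way to frame these tasks is as a loop that compares the desired state with the actual state and corrects the difference. The sketch below is a toy model of that idea, assuming a hypothetical in-memory `ToyCluster`; it is not how any particular orchestrator is implemented, and it ignores upgrades, scheduling across machines, and networking.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCluster:
    """Hypothetical in-memory cluster: container id -> healthy flag."""
    containers: dict[int, bool] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def start(self) -> int:
        cid = next(self._ids)
        self.containers[cid] = True
        return cid

    def stop(self, cid: int) -> None:
        del self.containers[cid]


def reconcile(cluster: ToyCluster, desired_replicas: int) -> None:
    """Drive the cluster toward the desired number of healthy containers."""
    # Remove containers that have failed.
    failed = [cid for cid, healthy in cluster.containers.items() if not healthy]
    for cid in failed:
        cluster.stop(cid)
    # Start replacements or extra replicas if too few remain.
    while len(cluster.containers) < desired_replicas:
        cluster.start()
    # Stop the newest containers if there are too many.
    while len(cluster.containers) > desired_replicas:
        cluster.stop(max(cluster.containers))


cluster = ToyCluster()
reconcile(cluster, desired_replicas=3)   # initial rollout
cluster.containers[2] = False            # simulate a crash
reconcile(cluster, desired_replicas=3)   # the failed container is replaced
reconcile(cluster, desired_replicas=5)   # scale up
reconcile(cluster, desired_replicas=2)   # scale down
print(sorted(cluster.containers))
```

Calling `reconcile` repeatedly is safe: when the cluster already matches the desired state, the function changes nothing.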
In the early days of containers, it looked like there would be a number of popular container orchestrators fighting for market share over the long term. In a relatively short period of time, Docker Swarm, Mesos DCOS, Nomad (Hashicorp), ECS (Amazon), and of course Kubernetes all appeared, solving essentially this problem.
Each of these options has advantages, disadvantages, and a significant install base. In recent years, however, it’s become clear that Kubernetes will emerge as the industry-standard container orchestrator and, over time, will run the majority of container workloads. Kubernetes isn’t perfect, but because it is the standard, I recommend it for new deployments.
Containers are basically light-weight virtual machines that virtualize the Linux system call interface instead of the lower-level x86 architecture. There’s really no reason not to use them, even for your monolith. They simplify the deployment process at very little cost. When evaluating containers, start with Docker, Docker Hub or your cloud provider’s registry, and Kubernetes.
In future posts I will dig into Kubernetes itself, container networking, storage, and development. Stay tuned.
By: Ethan J. Jackson