
Vivian Voss

Posted on • Originally published at vivianvoss.net

Why We Containerise Everything

Cover illustration: a pen-and-ink editorial drawing in the style of Heinrich Zille, dense cross-hatching, monochrome except for the young developer in subtle colour.

On Second Thought — Episode 08

A new service is started. The README is written, then the Dockerfile. Within the hour the team is discussing the registry, the orchestrator, the sidecar and the Helm chart. Nobody quite remembers when this became the second decision after the first commit. The container is the unit; everything else is paperwork.

This essay is about how the second decision became the first one.

The Axiom

The reflex is universal. Every new service ships in its own container. "Where does this run?" has one answer now: in its container. The container is no longer a deployment artefact but the unit of thought. Architecture diagrams are drawn in rectangles labelled with image names. Job postings list Docker and Kubernetes alongside the programming language, as if they were peers of equal weight.

The reflex is so well-trained that the question "could this run as a single process on a host" sounds almost rude. It sounds, to a sprint-shaped team in 2026, the way "could we just use a CSV file" sounds to a database team. Technically defensible, professionally suspect.

That feeling is the topic.

The Origin

Three currents converged. None of them, on its own, demanded that everything be containerised. Their convergence did.

The Isolation Current

The isolation question was answered, rather elegantly, in 1999. Poul-Henning Kamp, working for R&D Associates in Denmark, needed safe multi-tenancy on a single FreeBSD host. The traditional Unix answer was chroot, which constrains the filesystem view but not much else: a determined process inside a chroot can still see other processes, bind privileged ports, manipulate the network stack, and find its way out through any number of documented escapes. Kamp wrote Jails: kernel-native isolation that confined not only the filesystem view but the process namespace, the network stack, the user IDs, and the system calls available. He published the paper "Jails: Confining the omnipotent root" at SANE 2000 and shipped the implementation in FreeBSD 4.0 in March of the same year.

The design has aged extraordinarily well. No daemon, no image layers, no registry, no orchestrator. The kernel handles isolation directly. A jail starts in milliseconds because it is, fundamentally, a process. A FreeBSD host can comfortably run a thousand jails on four gigabytes of RAM, because the additional memory per jail is essentially zero: they share the base system.
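To make the "no daemon" point concrete: creating a jail is, at bottom, one system call. The sketch below uses the classic jail(2) interface; in practice one would reach for jail(8) and a jail.conf, and the paths and names here are hypothetical.

```c
/*
 * Minimal sketch of the jail(2) interface (FreeBSD). A hedged
 * illustration, not production code: real deployments use jail(8)
 * with jail.conf or jail_set(2). Paths and names are hypothetical.
 */
#include <sys/param.h>
#include <sys/jail.h>
#include <err.h>
#include <unistd.h>

int
main(void)
{
	struct jail j = {
		.version  = JAIL_API_VERSION,
		.path     = "/jails/web",   /* filesystem root the jail sees */
		.hostname = "web",
		.jailname = "web",
	};

	/* Create the jail and place the calling process inside it. */
	if (jail(&j) < 0)
		err(1, "jail");

	/* From here on, this process sees only /jails/web, its own
	 * process namespace, and whatever network it was granted. */
	chdir("/");
	execl("/usr/local/bin/myservice", "myservice", (char *)NULL);
	err(1, "execl");
}
```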

The pattern was reproduced, less coherently. Solaris Zones (2004) followed similar principles, with a more elaborate administrative wrapping. Linux gained namespaces and cgroups piecemeal, over more than a decade of kernel releases, and the design choice was the opposite of Jails: rather than a single coherent abstraction, Linux split isolation into eight separate namespace types (mount, UTS, IPC, PID, network, user, cgroup, time), each introduced in a different kernel release, each requiring the caller to compose them correctly. The user namespace in particular has produced a long sequence of CVEs, several of them giving container escape to unprivileged users, because the abstraction crosses the kernel privilege boundary in ways the original FreeBSD design carefully avoided. LXC (2008) composed the namespaces into something resembling Jails; Docker (2013) wrapped LXC, then replaced it with libcontainer, and gave the resulting pattern a brand, a CLI, an image format, a registry, and the marketing budget to make all of it the new normal. The technique was fourteen years old by the time the world learnt to spell its name, and considerably more careful in the original.
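For contrast, a minimal sketch of what "composing them correctly" means on Linux: the caller picks the namespace flags by hand, and nothing in the interface insists on a complete or consistent set. This illustrates the shape of the API, not what Docker or LXC actually do internally.

```c
/*
 * Composing Linux namespaces by hand with unshare(2). Each CLONE_NEW*
 * flag is a separate, independently introduced namespace type; the
 * caller decides which isolation dimensions exist. Run as root on Linux.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
	int flags = CLONE_NEWNS   /* mount table   */
	          | CLONE_NEWUTS  /* hostname      */
	          | CLONE_NEWIPC  /* SysV IPC      */
	          | CLONE_NEWPID  /* process IDs   */
	          | CLONE_NEWNET; /* network stack */

	if (unshare(flags) != 0) {
		perror("unshare");
		return 1;
	}

	/* CLONE_NEWPID only takes effect for children, so fork before
	 * exec: the child becomes PID 1 of the new PID namespace. */
	pid_t pid = fork();
	if (pid == 0) {
		execlp("sh", "sh", (char *)NULL);
		perror("execlp");
		_exit(1);
	}
	waitpid(pid, NULL, 0);
	return 0;
}
```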

This is the first current. The Jails version of it would have produced a useful tool. The Linux version of it, less coherent, more error-prone, dependent on a daemon, would have produced something more or less the same, given enough time to absorb the operational lessons. Neither alone would have produced "containerise everything".

The Org-Shape Current

The second current came from a different room entirely. Scrum was codified between 1995 (Sutherland and Schwaber's OOPSLA paper) and 2001 (the Agile Manifesto). It split the organisation into sprint-shaped teams: own backlog, own velocity, own deploy. The unit of organisational existence became the team that fits in a standup.

Melvin Conway, in 1968, had described the consequence in advance: any organisation that designs a system will produce a design whose structure is a copy of the organisation's communication structure. Sprint-shaped teams want service-shaped architecture. Each team needs its own deployable, its own release cadence, its own runtime, its own database, because anything shared becomes a synchronisation point and synchronisation points slow down sprints.

The container became the natural envelope. One service per team, one container per service. The Dockerfile became the contract between team and platform. Conway's law, originally a description of how organisations leak into their architecture, became a contract: the architecture is the org chart, rendered in YAML.

This is the second current. It does not require containers, technically. It requires a deployment boundary that matches the team boundary. Containers happened to be the available boundary.

The Runtime Current

The third current came from a third room. Node.js was released in 2009 by Ryan Dahl. The runtime was single-threaded by design: one event loop, asynchronous I/O, no shared-memory concurrency. For the use cases Dahl had in mind, that was a feature: server-side I/O without the threading complexity of Apache or Java.

Multi-core hardware did not fit the runtime. A box with eight cores running Node.js could, at best, keep one of them busy. The fix was not to redesign the runtime. The fix was to start more processes. Cluster mode in Node.js, then process supervisors, then containers, then container orchestrators. The entire cloud-native scheduling stack absorbed, in part, a workaround for a runtime that did not know what to do with a second core.
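Stripped of the Node.js specifics, the workaround has a very small shape: one worker process per core, plus something to supervise them. A hedged sketch of that shape follows; it is not Node's actual cluster module, just the move it makes.

```c
/*
 * The generic shape of the workaround: if the runtime can keep only one
 * core busy, start one worker process per core and supervise them.
 */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void serve_forever(int worker)
{
	/* Imagine a single-threaded event loop here, pinned by design
	 * to one core's worth of work. */
	printf("worker %d: pid %d\n", worker, getpid());
	for (;;)
		pause();
}

int main(void)
{
	long cores = sysconf(_SC_NPROCESSORS_ONLN);

	for (long i = 0; i < cores; i++) {
		pid_t pid = fork();
		if (pid == 0)
			serve_forever((int)i);   /* child never returns */
	}

	/* The parent is now a process supervisor in miniature. */
	while (wait(NULL) > 0)
		;
	return 0;
}
```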

The same logic applied to Python (the GIL), to Ruby (likewise), and to a generation of interpreted languages whose concurrency story ended at the process boundary. The container became the only practical unit of multi-core deployment for languages designed before multi-core was assumed.

This is the third current. It does not, on its own, require containerisation. It requires multiple processes. Containers happened to be how teams already did that.

The Convergence

Three currents, one architecture. Each could have produced something modest on its own: a useful isolation tool, a team-deployment boundary, a multi-process runtime strategy. Together, they produced the default that "everything ships in a container" and the secondary default that "everything needs an orchestrator to schedule the containers". By the time anyone thought to question the convergence, the convergence had become the architecture, the curriculum, the conference circuit, and the hiring funnel.

The honest description of what the three currents produced, taken together, is a stack of workarounds presented as architecture. The isolation current produced a useful primitive, then Linux reproduced it less carefully, then Docker wrapped the result in a daemon nobody asked for. The org-shape current produced a deployment-boundary requirement that pre-existing tooling happened to satisfy, badly. The runtime current produced a multi-process strategy because the runtime could not be fixed in the place that needed fixing. Each layer answers the layer beneath it, and each layer needs the next layer to answer back. None of the three currents, alone, demanded a stack of any height. Together, they make a workaround feel like architecture.

The Cost

The bill arrives in four layers, each justified by the previous.

Image Bloat

A minimal Node.js container image, based on Alpine Linux, is around 150 megabytes. The convenient default, node:22 without the -alpine qualifier, exceeds a gigabyte. Most of the gigabyte is build tools the application will not need at runtime: compilers, package managers, debug symbols, full GNU userland. Every unused package is, in security terms, an unpatched CVE waiting for someone to notice.

The application itself is a footnote in its own deployment. A 50-megabyte Node.js service ships in a 1-gigabyte image, which is then pushed to a registry, pulled to nodes, cached, layered, garbage-collected and replicated. The application is one to five per cent of the bytes moved.

The Supply Chain

The gigabyte travels with its npm tree. Every container that runs Node.js carries hundreds of transitive dependencies that the runtime, by design, loads at full privilege at start-up. The container's promise of reproducibility delivers, with the same fidelity, a reproducible attack surface.

In spring 2026 this stopped being theoretical. A six-week wave of supply-chain compromises ran through npm: the axios incident in March, attributed to Sapphire Sleet (a North Korean cluster), in which a package with roughly 100 million weekly downloads carried a RAT through a plain-crypto-js postinstall; the SAP-namespace mini-Shai-Hulud incident in late April; the TanStack mini-Shai-Hulud in mid-May, reaching 84 versions in the first wave and spreading within 48 hours to 172 packages and 403 versions across npm and PyPI, totalling around 518 million cumulative downloads. In May, vx-underground reported the full Shai-Hulud source code in public circulation, lowering the cost of the next wave to the cost of one fork.

Every container pulled in that window inherited the blast radius. The Dockerfile's FROM node:22 is an instruction to absorb whichever supply-chain state happens to be current at build time. The reproducibility model contains no signal for whether that state is healthy. A Helm chart applied in good faith on a Wednesday morning in May 2026 could, in principle, ship a credential-stealer into production by Wednesday afternoon, and the deployment record would describe the operation as nominal.

The image is not the problem on its own; the image plus a runtime that loads its dependencies at start-up plus a public registry with no enforced provenance is the problem. The container's contribution is to make the problem deployable.

Daemon Overhead

Docker requires a persistent daemon, dockerd, which mediates between the container CLI and the kernel isolation primitives. The daemon is not free. A documented case on Docker's own forums shows dockerd consuming over five gigabytes of virtual memory while supervising 183 containers. The runtime that supervises the containers competes with the containers for the same memory budget.

The deeper architectural point is that the daemon is structurally unnecessary. FreeBSD Jails do not have a daemon. The kernel handles isolation directly; the CLI talks to the kernel; there is nothing in between to keep alive. Linux namespaces likewise do not strictly require a daemon: Podman demonstrated that a daemonless, rootless container runtime is possible while remaining Docker-compatible. The daemon is a historical accident of how Docker happened to be built, not a requirement of the isolation it provides.

Network Tax

A function call within a single process costs on the order of nanoseconds; call it a microsecond to be generous. A network call between two services in the same datacentre, even on a fast network, costs one to five milliseconds: at least a factor of one thousand more expensive. The cost is mostly fixed overhead: DNS resolution, TCP handshake, TLS, JSON serialisation and parsing.

Chain ten services in a typical microservices request graph and the request has paid thirty milliseconds of pure infrastructure latency before any business logic executes. At ten thousand requests per second, benchmarks show the microservices version of an application incurring around 140% higher p99 latency, 300% higher memory consumption, and 260% higher network I/O than the equivalent monolith. The bill is paid first in resources, then in operational complexity, then again in the engineering hours spent diagnosing tail-latency spikes.

The Fix for the Fix

The latency the architecture introduced required service meshes to manage. Istio became the most prominent example: a control plane plus sidecar proxies that handle routing, retries, circuit breaking, observability and authentication between services. Istio's sidecar adds approximately 2.5 milliseconds of latency at p90 and costs around 0.20 vCPU and 60 megabytes of RAM per proxy. Ten services with two proxies per hop consume four vCPU and over a gigabyte of memory for routing alone, paid before the application does anything.

Istio itself, telling on the architecture, quietly merged its own control plane from microservices back into a single binary, istiod, in 2020. The microservices framework concluded that microservices were the wrong shape for its own implementation. The lesson did not generalise.

The Structural Cost

The deepest cost is not bytes or milliseconds. It is that sprint-shaped teams produce service-shaped architecture and never see another shape. The container makes the org chart load-bearing. Reorganising the team now requires reorganising the system. Conway's law, originally a description, became a contract.

When the team is split, the service is split. When the service is split, the runtime is split. When the runtime is split, the orchestrator becomes necessary. When the orchestrator becomes necessary, the platform team becomes necessary. When the platform team becomes necessary, the abstraction over the orchestrator becomes necessary, which is what the next Helm chart will manage. The architecture is no longer designed; it is grown, in the shape of the standups.

The Question

The Retreat, Documented

Before the alternatives, the retreat. Documented, and rarely cited.

Amazon Prime Video published, in 2023, an account of moving their video monitoring service from a distributed microservices architecture, deployed on AWS Lambda and Step Functions, back to a monolithic application running on EC2. The reported cost reduction was approximately ninety per cent. The article was written by the engineers who did the work and approved by Amazon. The Hacker News thread that followed contained the predictable defences ("they were using Lambda wrong", "they should have used containers properly"), which rather illustrated the point: the alternatives to the distributed default were treated as misconfiguration of the default, not as legitimate alternatives.

Segment, the customer data platform now part of Twilio, consolidated 140 microservices into a single monolith in 2017. Their engineering blog described the operational burden of the distributed version as untenable: testing went from hours to milliseconds, deployments simplified, on-call load dropped.

Istio, as noted, consolidated its own control plane. The microservices framework reached the conclusion that for its own implementation, a single binary was better.

37signals (Basecamp, HEY) left AWS entirely in 2023, returning to on-premises hardware running their own stack. Reported savings: around seven million dollars over five years, with hardware recouped inside a year. The architecture stayed in containers but left the orchestrator behind: a small fleet of dedicated servers running Docker, no Kubernetes, no cloud control plane.

The pattern is consistent. Teams reach the operational and financial ceiling of the distributed default and quietly walk back. The walk-back is presented in each case as an exception, not as evidence about the default.

The Alternatives

The alternatives have been quietly working all along.

On FreeBSD: Jails provide kernel-native process isolation with no daemon, no image layers, no registry, no orchestrator. They start in milliseconds because they are processes. Capsicum (FreeBSD 9.0, 2012) provides capability-based isolation at the process boundary: a process can voluntarily restrict its own access to specific file descriptors, syscalls and resources, each capability granted only where it earns its keep. The combination is what process isolation looks like when designed by people who think about the kernel for a living.
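A minimal sketch of what that voluntary restriction looks like with Capsicum, assuming a service that only ever needs to read one file it opened at start-up; the path is hypothetical.

```c
/*
 * Minimal Capsicum sketch (FreeBSD). The process opens the one file it
 * needs, limits that descriptor to reads, then enters capability mode;
 * from that point it cannot open new paths, sockets or devices at all.
 */
#include <sys/capsicum.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int fd = open("/var/db/app/data.log", O_RDONLY);
	if (fd < 0)
		err(1, "open");

	/* This descriptor may now only be read. */
	cap_rights_t rights;
	cap_rights_init(&rights, CAP_READ);
	if (cap_rights_limit(fd, &rights) < 0)
		err(1, "cap_rights_limit");

	/* Enter capability mode: global namespaces disappear. */
	if (cap_enter() < 0)
		err(1, "cap_enter");

	char buf[512];
	ssize_t n = read(fd, buf, sizeof buf);   /* still allowed */
	printf("read %zd bytes inside the sandbox\n", n);

	/* open("/etc/passwd", O_RDONLY) here would fail with ECAPMODE. */
	return 0;
}
```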

On OpenBSD: pledge (5.9, 2016) lets a process declare which syscall categories it will use, and the kernel will kill it if it tries to use any others. unveil (6.4, 2018) lets a process declare which filesystem paths it can see, narrowing the filesystem view to exactly the paths the application needs. The pair is what process-level least-privilege looks like when implemented by people who consider security a property of the operating system rather than a layer above it.
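The same idea, in roughly a dozen lines, as a hedged sketch rather than anyone's production code; the directory is hypothetical.

```c
/*
 * Minimal pledge/unveil sketch (OpenBSD). The process narrows its view
 * of the filesystem to one directory, then promises to use only stdio
 * and read-only file syscalls; anything outside the promise kills it.
 */
#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* Only /var/www/data remains visible, read-only. */
	if (unveil("/var/www/data", "r") == -1)
		err(1, "unveil");
	if (unveil(NULL, NULL) == -1)          /* lock the unveil list */
		err(1, "unveil lock");

	/* Promise: standard I/O plus read-only filesystem access. */
	if (pledge("stdio rpath", NULL) == -1)
		err(1, "pledge");

	FILE *f = fopen("/var/www/data/index.html", "r");   /* allowed */
	if (f == NULL)
		err(1, "fopen");
	fclose(f);

	/* A later socket(2), or an fopen() for writing, would terminate
	 * the process. */
	return 0;
}
```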

On the runtime side: Go and Rust use all CPU cores by default, ship as single static binaries of one to fifteen megabytes, and require no orchestrator to find a second thread. A Go binary running on FreeBSD inside a jail is, structurally, what containerised microservices were trying to be: isolated, lightweight, fast to start, simple to deploy.

None of these are new. None of these require migration to a different operating system, in most cases: pledge and unveil are OpenBSD-specific, but the principle of capability-based process isolation has Linux equivalents (seccomp, Landlock). The point is not that one stack is correct. The point is that the convergence of three currents produced a default that none of the three currents, individually, demanded.

The default arrived, in practice, on a marketing budget. Docker (the company) raised over 270 million dollars in venture funding. The Cloud Native Computing Foundation, the body that stewards Kubernetes, lists more than 200 corporate members, several of them with platform businesses whose strategy depends on the container being the unit of deployment. The conferences, the certifications, the cloud-native landscape diagram with its hundreds of competing logos: each is a piece of advertisement that pays for itself by reinforcing the architecture. The technical case for containerising everything has rarely been the loudest case in the room. The case won the room on advertisement, not on parts removed.

The alternatives listed above did not arrive on similar budgets. FreeBSD's marketing department is, generously, the FreeBSD Foundation's annual report. The OpenBSD project's marketing department is Theo de Raadt's mailing-list voice. Go and Rust each had corporate sponsors, but neither sponsor sold a cloud orchestrator that depended on the runtime being too weak to find its own cores. The alternatives are not better-marketed; they are differently funded, which is most of the explanation for why they sound, in 2026, like nostalgia.

Conway's law can be read forwards or backwards. Forwards: organisational structure shapes architecture. Backwards: architecture shapes which organisational structures remain possible. A sprint-shaped team will produce service-shaped architecture; service-shaped architecture, once built, will tend to enforce sprint-shaped teams. The cycle is reinforcing.

The honest question is not which orchestrator. It is whether the isolation question, the runtime question and the team question were ever the same question. They became the same question because the same default answered all three. The default arrived without a vote.

What if isolation were a property of a process, granted where it earns its keep, rather than a shipping container we wrap around everything we no longer wish to think about? What if the runtime knew its own cores? What if the team shape did not need to be the system shape?

The Dockerfile is so reflexive that the alternatives feel like nostalgia. Twenty-five years of working alternatives suggest the nostalgia is misplaced. They are simply the path not taken, still running, still patient, still answering the question the convergence forgot was three.

