Krish

Building a Rootless Runtime Faster Than Docker

GitHub: https://github.com/krish561/J-Container/
We often treat Docker or Podman as magic boxes: you put code in, and an isolated machine comes out. But as a systems engineer, "magic" is just code I haven't read yet.

To understand how containers really work, I built J-Container, a rootless container runtime written from scratch in C and Java. My goal wasn't to replace Docker, but to show that the heavy lifting of isolation is actually lightweight.

The result? A runtime that starts a container in about 149 ms, roughly 5x faster than Docker in a clean benchmark. Here is how it works.

The Architecture: Java Meets C

Most container tooling is written in Go (Docker, Podman, Kubernetes). I chose a hybrid approach to separate concerns:

The Brain (Java): Handles user arguments, path validation, and orchestration.

The Muscle (C): A tiny native shim that talks directly to the Linux Kernel.

The "Magic" Trick: Namespaces

Containers are essentially just Linux processes with a restricted view of the system. I used clone() with specific flags:

CLONE_NEWPID: The container thinks it is PID 1.

CLONE_NEWUSER: The container thinks it is Root (UID 0), even though I'm a standard user on the host.
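
To make the two flags concrete, here is a minimal sketch of a clone() call that creates both namespaces and reports the PID the child sees for itself. It is an illustration, not J-Container's actual shim: the function name pid_seen_in_new_namespaces, the 1 MiB stack size, and the pipe used for reporting are all assumptions made for this example.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024) /* assumed size for the child's stack */

/* Runs inside the new namespaces: send back the PID this process sees. */
static int child_fn(void *arg) {
    int fd = *(int *)arg;
    long pid = (long)getpid();
    return write(fd, &pid, sizeof pid) == (ssize_t)sizeof pid ? 0 : 1;
}

/* Returns the PID the child observes for itself, or -1 on failure
 * (for example when unprivileged user namespaces are disabled). */
long pid_seen_in_new_namespaces(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The stack grows downward on common architectures, so pass its top. */
    pid_t child = clone(child_fn, stack + STACK_SIZE,
                        CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, &fds[1]);
    close(fds[1]); /* the parent only reads */

    long seen = -1;
    if (child != -1) {
        if (read(fds[0], &seen, sizeof seen) != (ssize_t)sizeof seen) seen = -1;
        waitpid(child, NULL, 0);
    }
    close(fds[0]);
    free(stack);
    return seen;
}
```

On a host that allows unprivileged user namespaces, the value returned should be 1, even though the host sees the child under an ordinary PID.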

The Challenge: The Race Condition

The hardest bug I solved was the "Rootless Mapping Race." When you create a new User Namespace, the child process cannot perform any privileged operations (like mounting file systems) until its UID map is set. But the UID map must be written by the parent process from the outside.

If the child tries to run mount before the parent writes the map, it crashes.
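
For reference, the parent's side of the mapping might look like the sketch below. It assumes the simplest possible mapping (root inside the container maps to the invoking user, one ID wide); the helper names format_id_map, write_file, and write_id_maps are hypothetical and not taken from the J-Container source.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Formats one mapping line: "<id inside> <id outside> <count>\n". */
int format_id_map(char *buf, size_t size,
                  unsigned inside, unsigned outside, unsigned count) {
    int n = snprintf(buf, size, "%u %u %u\n", inside, outside, count);
    return (n > 0 && (size_t)n < size) ? n : -1;
}

/* Writes text to an existing file in a single write(); returns 0 on success. */
static int write_file(const char *path, const char *text) {
    int fd = open(path, O_WRONLY);
    if (fd == -1) return -1;
    ssize_t len = (ssize_t)strlen(text);
    int ok = write(fd, text, (size_t)len) == len;
    close(fd);
    return ok ? 0 : -1;
}

/* Parent side: map UID/GID 0 inside the child's namespace to the caller. */
int write_id_maps(pid_t child) {
    char path[64], line[64];

    if (format_id_map(line, sizeof line, 0, (unsigned)getuid(), 1) < 0) return -1;
    snprintf(path, sizeof path, "/proc/%ld/uid_map", (long)child);
    if (write_file(path, line) != 0) return -1;

    /* An unprivileged writer must deny setgroups before writing gid_map. */
    snprintf(path, sizeof path, "/proc/%ld/setgroups", (long)child);
    if (write_file(path, "deny") != 0) return -1;

    if (format_id_map(line, sizeof line, 0, (unsigned)getgid(), 1) < 0) return -1;
    snprintf(path, sizeof path, "/proc/%ld/gid_map", (long)child);
    return write_file(path, line);
}
```

For a host user with UID 1000, the line written to uid_map would be "0 1000 1". Because these files live under the child's /proc entry, the writes can only happen once the child already exists, which is what opens the window for the race.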

The Fix: I implemented a synchronization barrier using a standard Linux pipe.

Parent: Creates a pipe and clones the child.

Child: Closes the write end and attempts to read() from the pipe. This blocks execution.

Parent: Writes the uid_map and gid_map.

Parent: Writes a byte to the pipe.

Child: Unblocks, detects the map is ready, and proceeds to pivot_root.
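
The steps above can be sketched in a few lines of C. To keep the example runnable without any namespace support, it uses plain fork() as a stand-in for clone() and leaves the map writes as a comment; run_pipe_barrier is a hypothetical name, and the real shim may be structured differently.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the child received the parent's signal byte before
 * continuing, -1 otherwise. */
int run_pipe_barrier(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t child = fork(); /* stand-in for clone() with namespace flags */
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        char byte;
        close(fds[1]);                      /* keep only the read end */
        ssize_t n = read(fds[0], &byte, 1); /* blocks until the parent writes */
        /* n == 1: the parent signalled. n == 0: the parent closed the pipe
         * without signalling, so setup must not continue. */
        _exit(n == 1 ? 0 : 1); /* a real runtime would mount and pivot_root here */
    }

    close(fds[0]);
    /* ... parent writes /proc/<child>/uid_map and gid_map here ... */
    int sent = write(fds[1], "x", 1) == 1;
    close(fds[1]);

    int status = 0;
    if (waitpid(child, &status, 0) == -1) return -1;
    return (sent && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Closing the write end in the child matters: if the parent dies before signalling, the child's read() returns 0 (end of file) instead of blocking forever, and the child can bail out.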

Benchmarks: The Cost of Abstraction

I benchmarked J-Container against Docker and Podman running a minimal Alpine image (/bin/true) over 100 iterations.

J-Container: 149ms (± 16ms)

Docker: 778ms (± 102ms)

Podman: 988ms (± 135ms)

J-Container is faster not because my code is better, but because it does less. Docker sets up networking bridges, overlay filesystems, and security profiles. J-Container shows that the Linux kernel primitives themselves are nearly instantaneous (<10 ms); the remaining ~140 ms is almost entirely JVM startup time.

Conclusion

Building J-Container taught me that containers aren't magic: they are a clever composition of Linux syscalls. If you want to understand your tools, try building a toy version of them.

Benchmark 1: java JContainer run ../rootfs /bin/true
  Time (mean ± σ):     138.6 ms ±  11.5 ms    [User: 151.6 ms, System: 43.5 ms]
  Range (min … max):   124.7 ms … 204.1 ms    100 runs

Benchmark 2: docker run --rm alpine:latest /bin/true
  Time (mean ± σ):     754.4 ms ±  96.1 ms    [User: 31.6 ms, System: 35.7 ms]
  Range (min … max):   636.9 ms … 1120.9 ms    100 runs

Benchmark 3: podman run --rm alpine:latest /bin/true
  Time (mean ± σ):      1.011 s ±  0.152 s    [User: 0.173 s, System: 0.146 s]
  Range (min … max):    0.715 s …  1.341 s    100 runs

Summary:
  java JContainer run ../rootfs /bin/true ran
    5.44 ± 0.83 times faster than docker run --rm alpine:latest /bin/true
    7.29 ± 1.25 times faster than podman run --rm alpine:latest /bin/true