Mohamed Zrouga
I Got Tired of Docker Eating My Raspberry Pi's RAM — So I Built My Own Container Orchestrator

A few months ago I noticed something that genuinely annoyed me.

I was running tiny services across my Raspberry Pi lab:

  • webhook workers
  • monitoring agents
  • lightweight APIs
  • ETL processing tasks

Small, focused workloads. The kind of things a Pi is actually good at.

But the infrastructure stack underneath them? It looked like this:

Docker Engine
 └─ containerd
     └─ runc
         └─ CNI plugins (all of them, even the ones I'd never touch)
             └─ orchestration layer
                 └─ service networking
                     └─ monitoring sidecars
                         └─ my actual 8MB process

At some point I sat down and ran ps aux and free -h on a freshly booted node before deploying anything.

The infrastructure was already using more RAM than my applications.

That felt wrong. So I started pulling threads.
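
If you want to reproduce that check on a node of your own, it's two commands:

# Idle memory before deploying any workloads
free -h

# Largest resident processes; on a stock Docker host this is mostly the container stack
ps aux --sort=-rss | head -n 15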


What Actually Runs a Container?

Strip everything back. What does "running a container" actually require?

  1. An OCI image unpacked to disk
  2. A rootfs (overlayfs layers)
  3. Linux namespaces (pid, net, mount, uts, ipc)
  4. cgroup resource limits (cgroup v2)
  5. A process in that environment

That's it. That's the whole thing.
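
You can prove that to yourself with nothing but an OCI runtime and a rootfs tarball. Here's a rough sketch, assuming crun is installed; the tarball name is a placeholder (any Alpine mini-rootfs or a docker export will do):

# 1. Unpack an image's filesystem into a bundle
mkdir -p bundle/rootfs
tar -xf alpine-minirootfs.tar.gz -C bundle/rootfs   # placeholder tarball name

# 2. Generate a default OCI config (namespaces, mounts, capabilities)
cd bundle
crun spec

# 3. Run a process in that environment
crun run demo   # drops you into sh inside fresh pid/net/mount/uts/ipc namespaces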

Everything else — the daemon, the socket, the plugin ecosystem, the CNI chain, the abstraction layers — exists to make that easier to manage at scale.

Scale I don't have on a Raspberry Pi.

So I asked: what's the absolute minimum secure OCI stack that can do this properly?

That question became nyxd (https://github.com/zrougamed/nyxd.git).

nyx in action


nyxd: What It Is and What It Isn't

nyxd is a lightweight container daemon I built in Go. It uses:

  • crun as the OCI runtime (not runc — more on this shortly)
  • overlayfs mounted directly via syscall for rootfs (the equivalent mount is sketched below)
  • no CNI plugins
  • nftables for NAT and port mapping
  • seccomp + NoNewPrivileges enforced on every container

It is not:

  • Kubernetes
  • another Docker replacement
  • "another Podman"
  • trying to be any of those things

The philosophy is reduction. Every component you remove is an attack surface that disappears, a dependency you don't have to update, a CVE you'll never have to patch.
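
To make the overlayfs bullet above concrete: each container rootfs is one mount built from a lower directory (the image layers), an upper directory (the container's writes), and a work directory. nyxd issues that mount from Go via the mount(2) syscall; the hand-rolled equivalent is a single command:

mkdir -p lower upper work merged
sudo mount -t overlay overlay \
  -o lowerdir=lower,upperdir=upper,workdir=work \
  merged
# 'merged' is now the writable rootfs view a container pivots into
sudo umount merged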


Why crun Instead of runc

This is the first question people ask.

Here's the honest answer: crun has a remarkably clean security history.

crun CVE history (complete, as of 16-05-2026):

| CVE | Impact / Type | Affected | Fixed in |
| --- | --- | --- | --- |
| CVE-2026-30892 | Local Privilege Escalation via crun exec -u 1 root parsing flaw | 1.19 – 1.26 | 1.27 |
| CVE-2025-24965 | Host Filesystem Escape via the krun architecture handler | < 1.20 | 1.20 |
| CVE-2022-27650 | Privilege jump inside container via leaked Inheritable Capabilities | < 1.4.4 | 1.4.4 |
| CVE-2019-18837 | Host Directory Traversal via malicious symlinks in a crafted image | < 0.10.5 | 0.10.5 |

Very few CVEs. In its entire history.

Compare that to November 2025, when runc had three container-escape vulnerabilities disclosed in a single week: CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881. All critical. All allowing container escape to host.

Beyond security, crun has practical advantages for edge/ARM workloads:

  • Lower memory footprint per container
  • Faster startup (C runtime, not Go)
  • Excellent cgroup v2 support from day one
  • Smaller binary (~800KB vs runc's several MB)

On a Pi with 1-4GB RAM running 20+ containers, that adds up fast.
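
The binary-size claim is easy to check on a node that has both runtimes installed:

# Compare runtime binary sizes on your own node
ls -lh "$(command -v crun)" "$(command -v runc)" 2>/dev/null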


The CVE Surface Nobody Benchmarks

Here's something I started thinking about that I hadn't seen anyone measure properly:

People benchmark CPU and RAM constantly. Almost nobody benchmarks security surface area.

But when you're running edge infrastructure — systems with limited ops coverage, longer uptimes, harder recovery paths — the CVE surface is arguably the most important operational metric.

So let me be direct about the current state as of 16-05-2026:

containernetworking/plugins CVE history (recent):

| CVE | Plugin | Issue | Fixed |
| --- | --- | --- | --- |
| CVE-2025-67499 | portmap | nftables backend intercepts unintended traffic | v1.9.0 |
| CVE-2025-52881 | selinux dep | container escape via procfs write misdirection | v1.9.1 |
| CVE-2024-34156 | all | Go stdlib encoding/gob | v1.4.0-6 |
| CVE-2023-45290 | all | Go stdlib net/http | v1.4.0-3 |

And this is just the CNI plugin layer — the binaries a lot of stacks still exec on every container ADD. nyxd's default path doesn't run those plugins at all; I still think the table matters as a reminder of what you inherit when you opt into the full CNI distribution. Add Docker Engine, containerd, runc, BuildKit, and their respective dependency trees and you're tracking dozens of CVEs per year across a stack that most people never fully audit.

The rough security surface comparison:

| Stack | Binary count | External deps | Rough CVE exposure/yr |
| --- | --- | --- | --- |
| nyxd + crun | 2 | 3 Go modules | Very low |
| Podman + crun | ~8 | Large | Low-medium |
| Docker Engine (full) | 15+ | Very large | Medium-high |
| containerd + runc + full CNI | 20+ | Massive | High |
| Kubernetes node | 30+ | Enormous | Very high |

The numbers aren't precise science. But the trend is real.

Simplicity is a security feature. I genuinely believe that now.


The Networking Decision

Most of the container networking ecosystem exists to solve problems at scale. VXLAN overlays, BGP route propagation, multi-cluster service meshes.

None of that applies to a Raspberry Pi lab.

nyxd's default networking is implemented inside the daemon, in Go, using raw netlink — not by exec'ing the usual CNI plugin chain under /opt/cni/bin.

What that stack actually does:

  • Bridge + veth — brings up nyxbr0, creates the pair, moves the peer into the container netns, names it eth0
  • File-backed IPAM — hands out addresses from the same 10.88.0.0/16-style slice you'd expect from a tiny lab bridge
  • Loopback — brings lo up inside the netns
  • Port publishing — host→container maps via nftables (we shell out to nft for the rules; that's the one small networking helper we deliberately keep external). A hand-rolled equivalent is sketched just below.
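
What the nftables side of that boils down to is one DNAT rule per published port plus a masquerade rule for the container subnet. A hand-rolled equivalent (table and chain names here are illustrative, not nyxd's actual ones) looks like this:

# NAT table and hook chains (names are illustrative)
sudo nft add table ip nyx_nat
sudo nft 'add chain ip nyx_nat prerouting  { type nat hook prerouting  priority -100 ; }'
sudo nft 'add chain ip nyx_nat postrouting { type nat hook postrouting priority  100 ; }'

# Publish host port 8080 to a container at 10.88.0.2:80
sudo nft add rule ip nyx_nat prerouting tcp dport 8080 dnat to 10.88.0.2:80

# Masquerade outbound traffic from the container subnet
sudo nft add rule ip nyx_nat postrouting ip saddr 10.88.0.0/16 masquerade

One honest caveat if you copy this: traffic generated on the host itself never traverses the prerouting hook, so host-local access to a published port needs an extra rule in an output chain. That's exactly the kind of detail it's nice to let the daemon own.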

What we don't use (on that default path):

  • The bridge / host-local / loopback / portmap binaries from containernetworking/plugins
  • Flannel
  • Calico
  • kube-proxy
  • Weave
  • Cilium (for this use case)
  • Any overlay mesh networking

If you want the traditional CNI exec path — because you already ship a conflist or you're mirroring another environment — nyxd -net-driver=cni is still there. Same supervisor and API; you bring /opt/cni/bin and your plugins. That's optional complexity, not what you get out of the box.

Zero CNI plugin binaries on the default path means:

  • Nothing to install under /opt/cni/bin on a fresh Pi
  • Nothing to keep updated in that directory for homelab-sized deployments
  • No binary-level CVE surface in that layer for the stack I'm actually running day to day
  • Faster container startup (no fork/exec chain per CNI ADD)

Kernel Requirements (the honest list)

Running nyxd requires these kernel modules:

overlay          # overlayfs for container rootfs
bridge           # nyxbr0 bridge interface
veth             # virtual ethernet pairs
br_netfilter     # iptables/nftables sees bridged traffic
ip_tables        # iptables core
iptable_nat      # NAT table
nf_nat           # connection tracking NAT
nf_conntrack     # stateful packet tracking
nft_masq         # nftables masquerade
seccomp          # syscall filtering

And these sysctls:

net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.conf.all.rp_filter = 0

All of this works on a standard Raspberry Pi OS kernel (6.1 LTS). No custom kernel needed.
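
If you're provisioning a fresh node, the standard way to make both stick across reboots is a modules-load.d entry plus a sysctl.d drop-in (the file names below are just my convention):

# Load the modules now
sudo modprobe -a overlay bridge veth br_netfilter nf_nat nf_conntrack

# Load them on every boot
printf '%s\n' overlay bridge veth br_netfilter | sudo tee /etc/modules-load.d/nyxd.conf

# Persist the sysctls
sudo tee /etc/sysctl.d/99-nyxd.conf <<'EOF'
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.conf.all.rp_filter = 0
EOF
sudo sysctl --system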


Benchmarking This Honestly

Here's how I'm measuring whether nyxd actually delivers on its promises.

Startup latency:

hyperfine --warmup 3 \
  'docker run --rm alpine true' \
  'nyx run alpine true'

On my Pi 5: nyx vs Docker (benchmark chart)

Attack surface check:

# On a Docker host: how many CNI plugins are sitting on disk
ls -la /opt/cni/bin/ 2>/dev/null | wc -l

# Compare to nyxd default: no plugin dir required
lsof -p $(pgrep -x nyxd) | wc -l   # open file descriptors
ss -tlnp | grep nyxd                 # listening sockets (usually just the Unix control socket)

Docker opens more sockets, more file descriptors, and maintains more persistent background connections than a Pi workload typically justifies.
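
Daemon memory footprint: the resident memory of the long-running daemons, before any container starts, is the overhead you pay just to be able to run containers later.

# RSS (in KB) of the long-running daemons before any containers start
ps -o pid,rss,comm -C dockerd,containerd 2>/dev/null
ps -o pid,rss,comm -C nyxd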

Dependency tree:

# nyxd go.mod external deps
cat go.mod | grep -v "^//"
# 3 modules: image-spec, runtime-spec, golang.org/x/sys

Three. External. Modules. That's the entire dependency graph for the runtime and networking layer.


What the Raspberry Pi Actually Taught Me

The Pi has an interesting property: it forces you to care about things cloud infrastructure lets you ignore.

Thermal throttling — when your CPU is running hot because your container daemon is doing background work, your actual workloads slow down. Less daemon overhead means cooler, more consistent performance.

SD card / SSD wear — fewer writes from logging, state management, and plugin communication extends storage life meaningfully on embedded deployments.

Boot time — when a power cut hits an edge node, boot-to-operational time matters. A lighter stack comes up faster and can rejoin the network sooner.

Debugging under pressure — when something breaks at 2AM on a remote node you can't physically access, a simpler stack is dramatically easier to reason about. Fewer layers means fewer places to look.

Power consumption — I've measured ~1.5W difference in idle power between a full Docker stack and nyxd on a Pi 5. Across 10 nodes running 24/7, that's ~130kWh/year (1.5 W × 10 nodes × 8,760 hours). Not enormous, but real.


Where nyxd Is Right Now

This section separates what the daemon and nyx client exercise end-to-end from what exists mainly as libraries, stubs, or unfinished wiring.

Working (end-to-end in the daemon + nyx client)

  • OCI distribution pull for public images: raw HTTP against registries, layer blobs, and SHA-256 digest verification on ingest (digest mismatch fails the pull).
  • Overlayfs upper/work/merged layout and teardown on container exit paths.
  • Container lifecycle via crun from the supervisor: create/run (including the foreground attach path), stop, kill, delete, and state polling; the control API supports nyx exec.
  • Restart policies: always, on-failure, unless-stopped, and never (empty or unrecognized API values are normalized to conservative defaults in the control layer).
  • Structured JSON lines per container under the daemon data directory (log collector plus GET /v1/containers/{id}/logs), with optional plain-text decoding for attached clients.
  • Default in-process networking (-net-driver=native): bridge, veth, file-locked IPAM, nftables-based -p / publish, and host-side NAT — no CNI plugin binaries on disk.
  • Optional CNI exec path: -net-driver=cni with -cni-bin-dir, -cni-conf-dir, and -network for a traditional plugin-driven stack instead of native.

Implemented but not product-complete (nuance)

  • Health checks (exec / HTTP / TCP) — internal/health implements a full checker (intervals, timeouts, retries, start period), but it is not hooked into the supervisor: no checker goroutine is started with each container, and there is no unhealthy → stop/restart policy wiring. The code is library-ready, not operator-ready.
  • Prometheus-style metrics — internal/telemetry implements a text OpenMetrics/Prometheus-style /metrics handler, but cmd/nyxd does not call ServeMetrics, so counters and gauges are not exposed unless another binary wires the package in. The format exists; the default daemon does not listen for scrape traffic.

Still being hardened / incomplete

  • Registry auth beyond anonymous public pulls — the current path centers on the Docker Hub anonymous token flow; private registries, stored credentials, and arbitrary OAuth/OCI auth flows are not first-class yet.
  • Compose — YAML parsing is real (gopkg.in/yaml.v3): internal/compose reads a strict subset of compose-shaped fields, validates stacks, and can compute dependency order. What is missing is daemon-side orchestration (no nyx compose up-style command in the shipped mains): a parser is not a multi-service scheduler.
  • Control API and nyx CLI — Unix-socket HTTP for run, ps, logs, stop, remove, pull, images, and exec is usable, but error messages, edge cases, and long-term API stability are still evolving.
  • Seccomp — generated bundles set capabilities, noNewPrivileges, masked paths, and related hardening fields; there is no curated, versioned seccomp JSON checked into the bundle generator yet. Anything beyond the explicit JSON is whatever crun and the host apply by default.
  • systemd-notify — example units may use Type=notify, but nyxd does not emit READY=1 (or reload state) via sd_notify. Production units should use Type=simple until sd_notify support is added to the daemon.
  • NFT / port publishing on unusual host sysctl values, exotic dual-stack setups, and odd bridge topologies — the native backend is real code, but it benefits from more soak time outside typical laptop and homelab bridges.

Bottom line

nyxd is not positioned as a drop-in, production-grade Docker replacement today. The largest gaps between documentation and reality are health automation (implemented package, not integrated) and metrics (handler exists, not served by default). Registry authentication and compose orchestration remain intentionally narrow.

The stack does run real workloads in lab settings along the paths above: pull, overlay, crun, native networking, structured logs, and restart policies. The architectural bet — native networking by default, optional CNI when operators want plugin ecosystems — remains coherent even where polish and operational breadth still lag.


The Bigger Question

I keep coming back to this:

Are we over-engineering edge infrastructure?

The modern container ecosystem was designed to solve orchestration at Google-scale. Kubernetes, CNI, CRI, OCI — these are all excellent standards that solved real problems.

But those standards got adopted at every layer of the stack, including layers where the complexity isn't warranted.

A Raspberry Pi running a Go API doesn't need the same infrastructure as a 10,000-node Kubernetes cluster.

An industrial IoT gateway doesn't need BuildKit.

A homelab monitoring stack doesn't need containerd's full plugin system.

The standards are fine. The problem is using the full weight of the enterprise implementation everywhere, including at the edge where resource constraints and operational simplicity matter most.

nyxd is my attempt to find where the floor is. How small can a correct, secure, production-capable OCI runtime stack actually be?

I don't think we've found it yet.


Try It / Follow Along

nyxd is being developed openly. The codebase is Go 1.26.

If you're running:

  • Raspberry Pi clusters
  • ARM64 edge nodes
  • Self-hosted systems
  • VMs, QEMU, Proxmox ...
  • Any environment where RAM and attack surface actually matter

I'd genuinely love to hear what you're running and what constraints you're working within.

A few questions for the comments:

  • What does your container stack look like on ARM systems today?
  • What's your idle memory baseline before deploying any workloads?
  • Have you ever audited the CVE history of your CNI plugins?
  • Would you trade orchestration features for a meaningfully smaller attack surface?
  • Is your container stack heavier than your actual workloads?

If there's enough interest, follow-up posts will cover:

  • Full architecture walkthrough with the native network stack (and when I'd still flip on CNI)
  • Memory profiling methodology for container daemons
  • Security surface comparison methodology
  • Deploying nyxd on a Pi cluster from scratch
  • The case for writing your own IPAM in 200 lines of Go

nyxd is made by the community, for the community, and is free for personal use.
