DEV Community

Alexei Ledenev

Pumba v1.1.0: Native Podman Support, and What "Docker-Compatible API" Actually Means

Pumba — a container chaos CLI I've maintained since 2016 — just shipped v1.1.0 with native Podman runtime support alongside Docker and containerd.

I'd expected this to be the quiet release. Podman advertises a Docker-compatible API. The Docker SDK connects to its socket and most calls work unchanged. That part turned out to be true.

What I didn't expect: the 10% that doesn't match is exactly the set of calls a chaos tool lives on.

The landmines

1. ContainerExecStart with empty options

Docker accepts an empty ExecStartOptions{}: no AttachStdout, no AttachStderr, no Detach. Podman rejects it outright with "must provide at least one stream to attach to." Four call sites in pumba (tc exec, iptables exec, exec-on-container, command-existence check) had to switch from ContainerExecStart to ContainerExecAttach, then drain the stream, then call ContainerExecInspect. About sixty test mocks needed updating to set flags Docker never required.

2. Cgroup path divergence

  • Docker: docker-<id>.scope
  • Podman: libpod-<id>.scope
  • Podman + systemd: often nests a libpod-<id>.scope/container/ leaf as libpod's init sub-cgroup
  • cgroup v2 forbids processes in internal nodes, so stress-ng sidecars must target the nested leaf when it exists
  • Podman's default cgroupns=private means /proc/self/cgroup inside the target is 0::/ — ancestry hidden

Pumba now reads /proc/<pid>/cgroup host-side, which means pumba must run on the same kernel as the targets. On macOS, that means inside the podman machine VM, the same pattern we already used for containerd-in-Colima.

3. Sidecar reap

tc sidecars run tail -f /dev/null as PID 1. As PID 1 of its own PID namespace, tail only receives signals it has installed a handler for, and it has none for SIGTERM, so the signal is dropped. Podman's DELETE-with-force sends SIGTERM, waits StopTimeout (default 10s), then SIGKILLs, so every netem call was paying 10s per cleanup. Fix: StopSignal: "SIGKILL" on the sidecar. Immediate reap.

4. Cleanup vs. caller cancellation

If pumba itself is SIGTERM'd between tc exec and sidecar removal, the deferred cleanup never takes effect: the sidecar leaks and the netem qdisc persists in the target's netns. Cleanup now uses context.WithoutCancel(ctx) with a 15s budget, so the deferred calls survive cancellation of the caller's context.

What's not in the release (honestly)

Rootless Podman support. Rootless mode is detected at client init from Info.SecurityOptions, and netem/iptables/stress fail fast with guidance (podman machine set --rootful or the rootful systemd unit). Doing rootless correctly needs slirp4netns/pasta netns handling and user-namespace cgroup math. That's a release of its own, not a marketing-shaped hack.
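One way such a check could look, as a sketch under a stated assumption: that rootless mode shows up in the security options as a name=rootless entry, the way Docker's rootless mode reports it. The exact marker pumba looks for is not given here, and isRootless is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// isRootless reports whether any security option carries name=rootless.
// Entries can look like "name=seccomp,profile=default", so each entry is
// split on commas before comparing. The marker itself is an assumption.
func isRootless(securityOptions []string) bool {
	for _, opt := range securityOptions {
		for _, kv := range strings.Split(opt, ",") {
			if strings.TrimSpace(kv) == "name=rootless" {
				return true
			}
		}
	}
	return false
}

func main() {
	opts := []string{"name=seccomp,profile=default", "name=rootless"}
	if isRootless(opts) {
		fmt.Println("rootless runtime detected: netem/iptables/stress need a rootful runtime")
	}
}
```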

The broader lesson

"Compatible API" is usually harder to integrate against than "completely different API." With a different API you build a fresh mental model and check every call. With a compatible one you assume parity and discover the exceptions empirically.

If you're running chaos tests on Podman and hit something I missed — open an issue.
