I'm building a hands-on DevOps learning platform. Users open a challenge, write a solution (an Ansible playbook, a Dockerfile, a config file) and run it against real infrastructure. Right now the platform covers two tracks, Ansible and Docker, with plans to expand both the number of tracks and their depth.
Sounds straightforward, but under the hood it's a pile of engineering problems. You need to give users the feel of a real server while isolating them so they can't harm the host or other users.
The entire sandbox service is written in Go: container management, output streaming, hot container pool, verification. Go fits this kind of work well: goroutines for parallel monitoring, a single binary to deploy, solid process management through os/exec.
This post focuses on the Ansible sandbox, because that's where the most non-trivial problem lives: making Ansible work with services inside a gVisor container that has no systemd.
Why this platform exists.
Learning DevOps right now goes something like this. You read the docs (or chat with your AI agent), watch courses, maybe pass a certification test. But as we all know, practice is the single most important part of learning a technology. You can always spin up one VM, then another, and eventually you hit resource limits or find yourself maintaining your own learning infrastructure.
But why build another KodeKloud? That's what I thought too, until I actually tried opening a lab. Three minutes to load the environment. One attempt, after which you have to recreate the whole thing. A time limit on execution. How well can you do a task in 45 minutes, test it, and get it right on the first try so you don't have to wait for a new environment? What if you get distracted, need to think, want to try a different approach?
I wanted something different. Open a challenge, write a solution, run it against real infrastructure, get real output. Mess up a config, see the error, fix it, run again. No timers, no waiting for environment recreation, no limit on attempts. Like a real server, but safe.
That's where the whole architecture described below came from.
Why gVisor, not just Docker?
The platform lets users execute arbitrary code. An Ansible playbook can install packages, create users, write to any file. A regular Docker container shares the host kernel, and a syscall vulnerability could mean a container escape.
gVisor (runsc runtime) intercepts syscalls and handles them in userspace. The container talks to gVisor's mini-kernel, not the host's. Even if someone finds a hole in a syscall, they end up inside gVisor, not on the host.
On top of that, every container gets hard limits: capped memory, a fraction of a CPU core, a process ceiling, tmpfs instead of real disk writes. But gVisor creates a new headache: there's no systemd inside the container. And Ansible won't manage services without it.
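As an illustration, launching such a sandbox might look like this (flag values are examples, not the platform's exact numbers):

```shell
# Illustrative sandbox launch under gVisor with hard limits.
#   --runtime=runsc     : gVisor handles syscalls in userspace
#   --pids-limit        : process ceiling, the fork bomb backstop
#   --read-only + tmpfs : writable tmpfs instead of real disk writes
docker run -d \
  --runtime=runsc \
  --memory=512m --cpus=0.5 --pids-limit=64 \
  --read-only --tmpfs /tmp --tmpfs /run \
  ubuntu:22.04 sleep infinity
```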
Ansible wants systemd, but there's nothing there.
When Ansible runs something like:
```yaml
- name: Enable and start nginx
  systemd:
    name: nginx
    state: started
    enabled: yes
```
the systemd module does a bunch of things under the hood. It checks whether /run/systemd/system exists, and bails immediately if it doesn't. It calls systemctl show <service> and parses key=value pairs: ActiveState, LoadState, SubState. It runs systemctl start/stop/enable/disable. And it checks systemctl --version to figure out what this systemd can do.
Without a real systemctl binary, all of this falls apart. And pulling real systemd into a gVisor container isn't an option: no proper cgroups, no PID 1 acting as init.
A shell script pretending to be systemctl.
I noticed something: Ansible doesn't check whether systemd is running. It checks whether systemctl responds the right way. So instead of real systemd, a shell script that gives the right answers is enough.
The script parses the first argument via case:
Management commands (start, stop, enable, disable, restart, daemon-reload and the rest) just return exit 0. In a learning sandbox we don't need to actually manage services, we need Ansible to think the command worked.
Status queries (is-active, is-enabled) output "active" / "enabled".
The most interesting case is systemctl show. The systemd module parses its output as Key=Value pairs and crashes if expected fields are missing. I had to dig through Ansible module source code and go through several rounds of "run, crash, check which field it wants, add it." The final list: ActiveState, LoadState, UnitFileState, SubState, FragmentPath, StatusErrno, CanReload. Seven fields total.
systemctl status gets called by some playbooks via command or shell, so the output has to look believable: the ● symbol, the Loaded: and Active: lines.
list-unit-files is the only command that does something non-trivial. The script scans actual systemd directories and lists real unit files it finds there. When a user's playbook installs nginx, the package drops nginx.service into one of those directories, and list-unit-files picks it up.
--version returns a fake systemd version with a full set of feature flags (+PAM +AUDIT +SELINUX ...). Some tools check these flags before doing anything.
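To make this concrete, here is a trimmed-down sketch of such a mock, written to a local file for demonstration (inside the container it lives at /usr/sbin/systemctl; the real script handles more commands, and the field values and FragmentPath here are illustrative):

```shell
# Minimal systemctl mock: answers the queries Ansible's systemd module makes.
cat > ./systemctl-mock <<'MOCK'
#!/bin/sh
cmd="$1"; unit="$2"

case "$cmd" in
  start|stop|restart|reload|enable|disable|daemon-reload|daemon-reexec)
    # Management commands only need to "succeed".
    exit 0 ;;
  is-active)  echo "active";  exit 0 ;;
  is-enabled) echo "enabled"; exit 0 ;;
  show)
    # The seven Key=Value fields Ansible's systemd module parses.
    cat <<FIELDS
ActiveState=active
LoadState=loaded
UnitFileState=enabled
SubState=running
FragmentPath=/etc/systemd/system/${unit}
StatusErrno=0
CanReload=yes
FIELDS
    exit 0 ;;
  status)
    # Believable output for playbooks that shell out to systemctl status.
    printf '%s %s - mocked unit\n' '●' "$unit"
    printf '   Loaded: loaded (/etc/systemd/system/%s; enabled)\n' "$unit"
    printf '   Active: active (running)\n'
    exit 0 ;;
  list-unit-files)
    # The one non-trivial command: list real unit files dropped by packages.
    for d in /etc/systemd/system /usr/lib/systemd/system /lib/systemd/system; do
      [ -d "$d" ] && ls "$d"
    done 2>/dev/null | sort -u | awk '{print $1, "enabled"}'
    exit 0 ;;
  --version)
    echo "systemd 252 (+PAM +AUDIT +SELINUX +IMA ...)"
    exit 0 ;;
  *) exit 0 ;;
esac
MOCK
chmod +x ./systemctl-mock
```

Ansible inspects command output and exit codes, never whether PID 1 is actually systemd, so this is all it takes for state: started to report ok.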
Four PATH interception points.
Dropping the script at /usr/bin/systemctl isn't enough. PATH varies by calling context, and there's another trap involving package managers.
The mock goes into /usr/sbin/systemctl as the primary location. Ansible uses a restricted PATH (/sbin:/bin:/usr/sbin:/usr/bin) where /usr/sbin comes before /usr/bin, so our mock wins.
It's duplicated to /usr/bin/systemctl for the case where the user hasn't installed anything yet. On CentOS/Rocky, dnf install nginx pulls systemd as a dependency and drops real systemctl at /usr/bin. But our /usr/sbin copy still wins by PATH order.
Copies at /usr/local/sbin/systemctl and /usr/local/bin/systemctl cover scripts that use a full PATH with /usr/local first.
One more detail that's easy to miss:
```shell
mkdir -p /run/systemd/system /etc/systemd/system
```
Without /run/systemd/system, the systemd module doesn't even try calling systemctl, it immediately decides systemd is unavailable. I spent a decent amount of time debugging before realizing Ansible checks for this directory before it attempts to run anything.
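Put together, the bootstrap can install the mock roughly like this (sketched as a function; the DESTROOT parameter exists only so the sketch is testable, the real bootstrap writes straight into the container's filesystem):

```shell
#!/bin/sh
# install_mock SRC DESTROOT: place the mock at every systemctl location.
# Pass "" as DESTROOT to install into the real filesystem.
install_mock() {
  src="$1"; root="$2"
  # /usr/sbin wins under Ansible's restricted PATH; the /usr/local copies
  # cover scripts that put /usr/local first.
  for dir in /usr/sbin /usr/bin /usr/local/sbin /usr/local/bin; do
    mkdir -p "${root}${dir}"
    install -m 0755 "$src" "${root}${dir}/systemctl"
  done
  # Without /run/systemd/system, Ansible's systemd module decides systemd
  # is unavailable and never even calls systemctl.
  mkdir -p "${root}/run/systemd/system" "${root}/etc/systemd/system"
}
```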
policy-rc.d on Debian.
On Debian-based distros, apt-get install nginx tries to start the service via invoke-rc.d after installation. In a container without an init system, this fails and breaks the package installation itself.
The fix: create /usr/sbin/policy-rc.d returning exit code 101, which means "policy forbids starting services." The package installs, the unit file gets created, but the service doesn't start.
CentOS/Rocky don't have this issue; Red Hat packaging uses a different mechanism. Neither does Alpine: apk add doesn't try to start services.
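A sketch of that guard, written to a local file for demonstration (the platform drops it at /usr/sbin/policy-rc.d during bootstrap):

```shell
# policy-rc.d is consulted by invoke-rc.d before starting a service.
# Exit code 101 means "action forbidden by policy": the package installs,
# the unit file lands on disk, but nothing tries to start.
cat > ./policy-rc.d <<'EOF'
#!/bin/sh
exit 101
EOF
chmod +x ./policy-rc.d
```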
The mock lies, and that's (mostly) fine.
The mock has an obvious limitation that shouldn't be swept under the rug. It always returns exit 0 for start and restart. So if a user writes a playbook that installs nginx, drops a config with a missing closing brace, and calls systemd: state=restarted, Ansible will say ok. In real life, nginx would crash on startup and Ansible would report an error.
This is a deliberate trade-off. Actually running services inside a gVisor container would mean pulling in a full init process, which breaks the entire isolation model.
Instead, correctness checking lives in a separate layer: a verification script (check.sh) that runs after each playbook pass. For an nginx config challenge, the verifier might run nginx -t to check syntax, confirm that required directives are present, verify file permissions.
The chain works like this: Ansible verifies that commands executed (via the mock), then the verifier checks that the result is actually correct (via real file and config inspections). A user with a broken config will get "Verification failed", not at the systemctl restart step, but at the verification step.
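A hypothetical check.sh for that nginx challenge might look like this (the path and the required directive are illustrative; each challenge ships its own checks):

```shell
#!/bin/sh
# Verifier sketch: Ansible already confirmed the commands ran; this
# checks that the result is actually correct.
verify_nginx_conf() {
  conf="$1"
  [ -f "$conf" ] || { echo "Verification failed: $conf missing"; return 1; }
  # In the sandbox the verifier would also run a real syntax check:
  #   nginx -t -c "$conf"
  # which catches exactly what the mocked restart cannot.
  grep -q 'worker_connections' "$conf" \
    || { echo "Verification failed: worker_connections missing"; return 1; }
  echo "Verification passed"
}
```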
Fork bomb guard.
Users run arbitrary playbooks. Someone might (accidentally or not) trigger a fork bomb. Docker's --pids-limit caps the process count, but it doesn't kill the container; the container just hangs at the limit.
A watchdog goroutine runs alongside every execution. Every few seconds it checks the process count via docker top. If it's suspiciously high, the container is killed via docker kill and the user gets a message. The threshold is set with margin: a normal playbook with apt-get and a couple of services runs 10-30 processes. A count near the pids-limit ceiling is almost certainly a fork bomb.
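The real watchdog is a Go goroutine; here is an illustrative shell sketch of the same loop (PIDS_LIMIT and the 80% threshold are assumed values, not the platform's exact numbers):

```shell
#!/bin/sh
# Poll a container's process count and kill it before it hits the hard
# pids ceiling. Illustrative values; the real service does this in Go.
PIDS_LIMIT=64
THRESHOLD=$((PIDS_LIMIT * 8 / 10))   # kill well before the hard ceiling

over_threshold() {   # over_threshold COUNT -> true if suspicious
  [ "$1" -ge "$THRESHOLD" ]
}

watch_container() {  # watch_container CONTAINER_ID
  while sleep 3; do
    # docker top prints one header line, then one line per process.
    count=$(( $(docker top "$1" 2>/dev/null | wc -l) - 1 ))
    if over_threshold "$count"; then
      docker kill "$1" >/dev/null
      echo "container $1 killed: $count processes (limit $PIDS_LIMIT)"
      return 1
    fi
  done
}
```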
Double run and lock files.
Idempotency is a core Ansible concept: re-running a playbook should change nothing. The platform tests this automatically: every playbook runs twice, with verification after each pass.
There's a catch between runs. The first pass installs packages. After Ansible finishes, the apt-get or dnf process might still be alive, writing cache, updating the package database. The second pass hits a lock file and fails.
I had to add a cleanup step between passes: kill leftover package manager processes, remove lock files. A small thing, but without it idempotency broke out of nowhere. A side note: pidof isn't installed on minimal CentOS images, so process discovery uses ps + awk instead.
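A sketch of that cleanup step (the lock file paths are the usual apt/dpkg/dnf locations; the optional fake-root argument exists only so the sketch is testable):

```shell
#!/bin/sh
# Between idempotency passes: kill leftover package-manager processes and
# remove their lock files. pidof is missing on minimal CentOS images, so
# processes are found with ps + awk instead.
cleanup_pkg_locks() {
  root="${1:-}"   # empty on the real container; a temp dir in tests
  for name in apt-get apt dpkg dnf yum; do
    ps -eo pid,comm 2>/dev/null | awk -v n="$name" '$2 == n {print $1}' |
      while read -r pid; do kill -9 "$pid" 2>/dev/null || true; done
  done
  rm -f "${root}/var/lib/apt/lists/lock" \
        "${root}/var/lib/dpkg/lock" \
        "${root}/var/lib/dpkg/lock-frontend" \
        "${root}/var/cache/apt/archives/lock" \
        "${root}/var/cache/dnf/metadata_lock.pid" 2>/dev/null || true
}
```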
Multi-platform challenges.
Some challenges require a playbook to work across multiple distros at once, Ubuntu and CentOS for example. A container is created per distro, a grouped Ansible inventory is assembled, and the same playbook runs across all groups. A challenge passes only if verification succeeds on every platform.
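An illustrative grouped inventory for such a run, sketched as a generator function (host names and connection settings are assumptions; the docker connection plugin is one way to reach the containers):

```shell
#!/bin/sh
# write_inventory FILE: emit a grouped inventory, one group per distro.
# The same playbook then runs against every group:
#   ansible-playbook -i inventory.ini playbook.yml
write_inventory() {
  cat > "$1" <<'EOF'
[ubuntu]
ubuntu-host ansible_connection=docker ansible_host=sandbox-ubuntu

[centos]
centos-host ansible_connection=docker ansible_host=sandbox-centos

[all:vars]
ansible_python_interpreter=auto_silent
EOF
}
```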
The fork bomb guard monitors all containers simultaneously during these runs, and lock file cleanup runs on each container before the second pass.
Hot container pool.
Bootstrapping a container means docker run + installing dependencies + the systemctl mock. On Ubuntu that's 8-15 seconds. For an interactive platform, that's too much.
On service startup, several ready containers are created for popular images (Ubuntu, Debian, CentOS). They've already been through bootstrap and just sit there waiting. When one is taken, a replacement immediately starts warming up in the background so the next user doesn't have to wait.
A few things I spent time on:
A pooled container can die: OOM, timeout, Docker bug. You can't just trust an in-memory list. Before handing out a container, each candidate gets checked via docker inspect. Dead ones are silently cleaned up, replacements kick off.
Rapid sequential requests could spawn a bunch of parallel bootstraps for the same image. To avoid this, there's a counter tracking how many containers are currently being created. If the total (ready + warming) is already at the target, nothing new starts.
When multiple distros are needed at once, the grab happens in a single lock. Otherwise two parallel requests could end up with the same container.
Rare images (Alpine, Rocky) get created on the fly, the full bootstrap cycle. Popular ones come out of the pool in milliseconds.
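A shell sketch of the hand-out check (the real pool is Go with a mutex around the in-memory list; container IDs here are whatever the pool recorded):

```shell
#!/bin/sh
# Verify a candidate container is actually alive before handing it out;
# dead ones are removed so a replacement can warm up in the background.
is_alive() {         # is_alive CONTAINER_ID
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

take_from_pool() {   # take_from_pool ID... -> prints first healthy ID
  for id in "$@"; do
    if is_alive "$id"; then
      echo "$id"
      return 0
    fi
    docker rm -f "$id" >/dev/null 2>&1   # silently clean up the dead one
  done
  return 1           # pool empty: fall back to a cold bootstrap
}
```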
Layers of defense.
The whole thing is several layers, each covering for the one before it.
gVisor intercepts syscalls and keeps users away from the host kernel. Resource limits prevent eating all the memory and CPU. A process ceiling caps things at the Docker level, and the fork bomb guard kills the container at the application level if the count gets suspicious. A context timeout kills everything if execution drags on. An image whitelist prevents running anything outside the approved list. The systemctl mock gives controlled emulation instead of real systemd. policy-rc.d blocks service autostart on Debian. And every run happens in a disposable container that gets destroyed afterward.
What's next?
This whole setup with gVisor + systemctl mock is a working but temporary solution. The main limitation: services don't actually run, and that has to be compensated for by verifiers. It works for a learning platform, but scaling it takes manual effort.
The next step is migrating to infrastructure with Kata Containers. Kata gives you lightweight virtual machines instead of containers. Inside you get a real kernel, real systemd, services can actually start and fail. The overhead compared to gVisor is moderate, and the isolation level is actually higher since it's real hardware virtualization.
With Kata, the systemctl mock becomes unnecessary, idempotency checks become more precise (Ansible will see real service startup failures), and it opens the door for more complex challenges that depend on actual service state.
For now, gVisor + mock keeps the platform running on two servers with a total infrastructure budget under $150 a month. For a startup that hasn't hit monetization yet, that's a workable setup.
The platform is being developed every day, even when it's not visible from the outside. Most of the work happens under the hood: sandbox optimization, new challenges, better verifiers. When you're building a product solo, visible progress doesn't always match actual progress.
Try the platform: infrapath.io.