MrViK

Posted on Jan 5, 2021

Writing an init with Go (part 3, running it!)

#go #linux #systemsprogramming #showdev

Hey there. Sorry for the delay since parts 1 and 2. I've been busy making some changes, and I'll write a 4th article about them.

As mentioned before, we'll be running this init on a QEMU VM.

While running it (with only getty), I noticed that we're used to udev, where everything Just Works™️ when plugged in. systemd has taken over all of those things, which has contributed to the mystique around PID 1 and what it does.

No more theory, let's dive right in.
For testing I used a VM with its root on 9p (NFS would also work; this was just another experiment), so the development and test environments are shared and I can push changes and simply reboot to pick them up (at first I had to power off, mount, copy, unmount and boot again).

I tested on 2 chroots, one with Arch Linux and another with Void (Void is init-agnostic, which makes things easier). I also compiled a kernel with all the virtio modules built in (the Arch chroot has no working udev under this init and I didn't bother getting one, so this is the brute-force solution).

Compiling

Source is here: https://gitlab.com/mrvik/go-pid1

Go has one of the fastest compilers, so you can't slack off.
We use make to manage all the targets, so you need:

make (tested with GNU Make)
go toolchain (package is called go or golang depending on the distro)

Then run make all (or simply make) and you're done: you get an out dir with all the utilities, statically linked (they're pure Go).

Copy them to your VM under /usr/local/bin to avoid conflicts, and link init and service-launcher into the root directory (the service-launcher path is hardcoded; this should be changed).

The project has a folder called examples; copy it to /etc/yml-services so we have something to talk about.

Void w/ go on PID 1

Here is my qemu cmdline:

qemu-system-x86_64 -accel kvm \
  -kernel ../linux-5.10.4/arch/x86/boot/bzImage \
  -initrd root/boot/initramfs-linux-fallback.img \
  -append "root=192.168.1.10 rootfstype=9p rootflags=aname=$PWD/void,msize=16000,version=9p2000.L,uname=root,access=user rw init=/init ip=dhcp loglevel=3" \
  -m 1024 -nic user,model=virtio-net-pci -vga qxl

The kernel comes from the compilation dir (the latest stable release when I prepared this). I also tested with my own kernel (the vanilla linux package from Arch) and it worked, so Void's should work too.
The initrd comes from my system (yes, the kernel version doesn't match, but it doesn't matter since all the needed modules are built in). It was created with mkinitcpio using the net hook to enable networking in early userspace. This could probably be done with dracut too, but whatever, it works.
The append line is where the fun is. I'm using v9fs as root, so most of those arguments are for that, and ip=dhcp does what you'd expect (kernel IP autoconfiguration via DHCP). The most important one is init=/init, because that's where I copied the init binary. Setting loglevel keeps the console from being flooded with kernel logs.
The remaining options are the memory size, a virtio network device (needed because the 9p root is mounted over the network) and the VGA model.
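Everything passed with append also ends up in /proc/cmdline, where an init (or any other userspace program) can read it. A minimal sketch of splitting that line into key/value pairs; parseCmdline is a hypothetical helper, and it deliberately ignores quoted values with spaces, which the kernel does accept.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseCmdline splits a kernel command line into a map.
// Bare flags such as "rw" map to an empty string. Only the first '='
// separates key and value, so "rootflags=aname=/x,msize=16000" keeps its
// value intact. Repeated keys: the last one wins. Quoting is NOT handled.
func parseCmdline(cmdline string) map[string]string {
	params := make(map[string]string)
	for _, field := range strings.Fields(cmdline) {
		key, value, _ := strings.Cut(field, "=")
		params[key] = value
	}
	return params
}

func main() {
	raw, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	params := parseCmdline(string(raw))
	fmt.Println("init =", params["init"])
	fmt.Println("rootfstype =", params["rootfstype"])
}
```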

And here we are! All lines printed by go-pid1 share the same format (from Go's log package). We can log in and see what's going on:

htop shows the current hierarchy: every process is a child of service-launcher, which started everything in /etc/yml-services ending in .yml or .yaml. dhclient forks, so it's now a child of PID 1 (see the next article for how we handle this now).

The control socket

In part 2 we mentioned that service-launcher exposes a socket for controlling daemons, plus a utility called slc to connect to it without needing openbsd-netcat installed (for its -U option).

The list command shows the currently loaded services and their state. Note that dhclient shows as not started (not running) because the process forked and the main process exited. We cannot track its descendants yet (see the next article for progress on this).

The restart command (as well as start and stop) takes a service name and performs the corresponding action.
The journal command shows the logs for the specified service (with root on 9p we hit a lot of "operation not permitted" errors).

And it works. The socket could potentially expose any function of service-launcher.

Run init systems for fun and... fun

Since we're not dropping capabilities or doing any kind of confinement, we can run virtually any process.
The only limitation is the PID: an init expects to be PID 1, but that one is already taken.

pid_namespaces(7) to the rescue! Look at cmd/run-on-pid-ns. It creates a new PID namespace, time namespace and mount namespace (see pid_namespaces(7), time_namespaces(7) and mount_namespaces(7)). The time namespace is only there to isolate the uptime inside the namespace. The mount namespace lets us have a separate procfs and a separate view of whatever the other init system mounts.

Let's try it. The examples/systemd.yml.disabled service starts /sbin/init using run-on-pid-ns. It's disabled by default, and you may want to disable all the other services (at least agetty, to avoid clashes).

We're now on Void so let's try runit first.

Here we can see the full process hierarchy, and /sbin/init is not PID 1. That's because htop reads /proc, which hasn't been remounted, so we see the whole hierarchy (but we can't kill those processes, since their PIDs aren't reachable from inside the namespace, heh!).

If we mount procfs again, we only see the processes in the current namespace, and htop (and the others) work properly.

Another caveat is the reboot/poweroff logic. The namespaced init system can't reboot the machine by itself, because inside a PID namespace reboot(2) only affects that namespace (see pid_namespaces(7)). We can still use slc reboot or slc poweroff, though.

Now let's try systemd. It's more complex and makes use of a lot of things, but it works surprisingly well.
I'm using Arch Linux for this test, as it uses systemd by default.

As you can see, it works, but again we see all processes until we mount procfs.

This init is isolated in a mount namespace, so launching htop from inside systemd and from outside shows the effect.

Outside of the namespace

Inside of the namespace

The /run dir is not remounted, so the socket exposed by service-launcher is there and we can connect to it.

What's next

We're missing some useful features systemd has (ok, we're missing a lot).

Templates: very useful for things like agetty@<tty> or dhclient@<interface>. It's on the list, but not planned yet.
Follow forking processes. Some processes fork(2) and we can't follow them. We're currently working on this; stay tuned for the 4th article.
Better handling of mounts. Also being worked on, but not a priority.
Set the machine hostname at startup from /etc/hostname.
Document the socket commands.
You may have noticed there's no reload command (something like systemd's daemon-reload). It's not implemented yet, but it's planned.
Fix race conditions in service-launcher, most likely around the process state. This may be addressed together with the reload feature.
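Setting the hostname from /etc/hostname is small enough to sketch. This assumes the usual format described in hostname(5): a single name, possibly surrounded by blank or "#" comment lines. hostnameFromFile and setHostname are hypothetical names, and this is one way to do it, not what go-pid1 does.

```go
//go:build linux

package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strings"
	"syscall"
)

// hostnameFromFile returns the first non-empty, non-comment line of an
// /etc/hostname-style file, trimmed of surrounding whitespace.
func hostnameFromFile(contents string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		return line, nil
	}
	return "", errors.New("no hostname found")
}

// setHostname reads path and applies the name with sethostname(2);
// this needs CAP_SYS_ADMIN, which a root PID 1 has.
func setHostname(path string) error {
	raw, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	name, err := hostnameFromFile(string(raw))
	if err != nil {
		return err
	}
	return syscall.Sethostname([]byte(name))
}

func main() {
	// Dry run: parse only; an init would call setHostname("/etc/hostname").
	raw, err := os.ReadFile("/etc/hostname")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	name, err := hostnameFromFile(string(raw))
	fmt.Println(name, err)
}
```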

There are also some systemd features we're not going to support, like:

Confining all processes in cgroups
Firewalling based on cgroups (derived from the previous point)
Protecting folders from those processes (read-only home directories or private tmp).

Next article will address "How to follow slippery processes". Stay tuned for more.

DEV Community
