Shubham Nainwal


Build Your Own Container Runtime in Go: From Zero to a Running Isolated Process

I built gocount as a way to actually understand what Docker does under the hood. By the end of this post you'll have a working container runtime that boots an Alpine Linux shell in its own filesystem, PID tree, hostname, and network, with enforced memory and CPU limits, using nothing but Go and Linux kernel features.

Why Build a Container Runtime?

Containers feel like magic until you look at what's actually happening. When docker run executes, the kernel doesn't spin up a virtual machine. It just creates a process with a restricted view of the system, shaped by a handful of kernel primitives, most of which have been in mainline Linux for well over a decade.

Every production container runtime (Docker, Podman, containerd) is ultimately a thin orchestration layer on top of those primitives. Building one yourself is the fastest way to understand:

  • What a container actually is (spoiler: it's a process)
  • How pivot_root replaces the filesystem you see
  • How cgroups enforce the memory limit that OOM-kills your app
  • How the veth pair wires the container to the outside network

Prerequisites

| Requirement | Why |
| --- | --- |
| Linux (kernel 5.10+ recommended) | All the kernel features used here are Linux-specific |
| Go 1.23+ | Module system + syscall / golang.org/x/sys |
| iproute2 (ip command) | Creating and managing network interfaces |
| iptables | NAT masquerading for container internet access |
| Root access | Namespaces, cgroup creation, and pivot_root require it |
| nsenter | Moving a network interface into a namespace |

Check your kernel supports cgroup v2 (required for memory + CPU limits):

cat /sys/fs/cgroup/cgroup.controllers
# Should include: cpuset cpu io memory hugetlb pids rdma

Project Layout

gocount/
├── main.go
├── go.mod
├── Makefile
├── cmd/
│   ├── root.go       # cobra root command
│   ├── run.go        # run & start subcommands + child setup
│   ├── ps.go         # list containers
│   ├── stop.go       # stop & rm commands
│   └── inspect.go    # inspect a container
└── internal/
    ├── container/
    │   ├── container.go   # Container struct, in-memory map, JSON persistence
    │   ├── mount.go       # pivot_root, /proc /sys /dev mounts, DNS
    │   └── utils.go       # EnsureContainerDir
    ├── cgroups/
    │   └── cgroups.go     # cgroup v2 create / limits / delete
    ├── rootfs/
    │   └── manager.go     # download & extract Alpine Linux rootfs
    └── network/
        └── network.go     # veth pair, IP forwarding, NAT, in-container setup

Initialise it:

mkdir gocount && cd gocount
go mod init gocount
go get github.com/spf13/cobra@v1.10.1
go get golang.org/x/sys@v0.29.0

Concept 1: Linux Namespaces

A namespace wraps a global resource and gives a process the illusion that it has its own isolated instance of that resource. gocount uses four:

| Flag | What it isolates |
| --- | --- |
| CLONE_NEWPID | PID space: the container's first process is PID 1 |
| CLONE_NEWUTS | Hostname and domain name |
| CLONE_NEWNS | Mount tree: filesystem changes are invisible to the host |
| CLONE_NEWNET | Network stack: the container gets its own lo, no host interfaces |

When you combine all four, the process is isolated enough to look and feel like a separate machine, even though it's sharing the same kernel as the host.

Concept 2: cgroup v2

cgroup v2 is the kernel's resource accounting and limiting subsystem. It's a virtual filesystem mounted at /sys/fs/cgroup. You control a process by:

  1. Creating a subdirectory: mkdir /sys/fs/cgroup/gocount/<id>
  2. Writing the limit: echo 104857600 > /sys/fs/cgroup/gocount/<id>/memory.max
  3. Writing the PID: echo <pid> > /sys/fs/cgroup/gocount/<id>/cgroup.procs

Once the PID is in the cgroup, the kernel enforces the limit. Exceed it and the OOM killer fires.

Concept 3: pivot_root

chroot changes where a process looks for /, but pivot_root is stronger. It actually swaps the root mount point and lets you unmount the old one entirely, so the process can't escape back to the host filesystem.

The steps:

  1. Bind-mount the new rootfs onto itself (required by the kernel)
  2. Create a /.pivot_root directory inside the new rootfs
  3. Call pivot_root(newroot, newroot/.pivot_root)
  4. chdir("/") to land inside the new root
  5. Unmount /.pivot_root with MNT_DETACH
  6. Remove /.pivot_root

After step 6, the host filesystem is completely gone from the process's perspective.

Concept 4: veth pairs

A veth pair is a virtual Ethernet cable with two ends. When you move one end into the container's network namespace, you get a private point-to-point link between host and container. gocount then:

  • Assigns 10.0.0.1/24 to the host end
  • Assigns 10.0.0.2/24 to the container end
  • Adds a default route through 10.0.0.1
  • Configures iptables NAT (masquerade) so the container can reach the internet

Step 1: Entry Point and CLI


main.go does nothing but hand off to the cmd package:

// main.go
package main

import "gocount/cmd"

func main() {
    cmd.Execute()
}

cmd/root.go registers the root cobra command:

// cmd/root.go
package cmd

import (
    "fmt"
    "os"
    "github.com/spf13/cobra"
)

var rootCmd = &cobra.Command{
    Use:   "gocount",
    Short: "gocount is a minimal container runtime",
    Long:  `Run Linux processes in isolated namespaces, like a tiny Docker.`,
}

func Execute() {
    if err := rootCmd.Execute(); err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
}

Step 2: The Container Struct and Persistence

Every running container is represented by this struct and stored in two places: an in-memory map for fast lookup, and a JSON file under /tmp/gocount/<id>.json so it survives process restarts.

// internal/container/container.go
package container

import (
    "encoding/json"
    "fmt"
    "math/rand"
    "os"
    "strings"
)

type Container struct {
    ID      string
    Pid     int
    Command []string
    Status  string
    RootFs  string
    Cgroup  string
}

var Containers = map[string]*Container{}

func GenerateID() string {
    letters := "abcdefghijklmnopqrstuvwxyz0123456789"
    id := ""
    for i := 0; i < 8; i++ {
        id += string(letters[rand.Intn(len(letters))])
    }
    return id
}

func SaveContainer(c *Container) error {
    data, _ := json.Marshal(c)
    path := fmt.Sprintf("/tmp/gocount/%s.json", c.ID)
    return os.WriteFile(path, data, 0644)
}

func LoadContainers() ([]*Container, error) {
    var containers []*Container
    files, _ := os.ReadDir("/tmp/gocount")
    for _, f := range files {
        if f.IsDir() || !strings.HasSuffix(f.Name(), ".json") {
            continue
        }
        data, err := os.ReadFile("/tmp/gocount/" + f.Name())
        if err != nil {
            fmt.Fprintf(os.Stderr, "Warning: could not read %s: %v\n", f.Name(), err)
            continue
        }
        var c Container
        if err := json.Unmarshal(data, &c); err != nil {
            fmt.Fprintf(os.Stderr, "Warning: could not parse %s: %v\n", f.Name(), err)
            continue
        }
        containers = append(containers, &c)
    }
    return containers, nil
}

Why two stores? The in-memory map is O(1) with no I/O overhead for commands that run in the same process lifetime as run. The JSON files persist state across invocations, so ps, stop, and inspect work even after the parent process exits.

Step 3: Downloading the Rootfs

A container needs a root filesystem. gocount downloads the Alpine Linux minirootfs on first run. It's only ~3 MB compressed and ships a BusyBox userland (/bin/sh, ip, ping) plus the apk package manager, which can install anything else, python3 included.

// internal/rootfs/manager.go
const DefaultRootfsURL = "https://dl-cdn.alpinelinux.org/alpine/v3.19/releases/x86_64/alpine-minirootfs-3.19.1-x86_64.tar.gz"

func EnsureRootfs(rootfsDir string) error {
    if isValidRootfs(rootfsDir) {
        return nil
    }
    fmt.Println("Rootfs not found. Downloading Alpine Linux rootfs...")
    return DownloadAndExtractRootfs(DefaultRootfsURL, rootfsDir)
}

isValidRootfs checks that bin/, lib/, etc/, usr/, and bin/sh all exist. If any are missing the tarball is re-downloaded.

Security note: zip-slip protection. When extracting the tarball every header path is validated to stay inside the destination directory:

target := filepath.Join(destPath, header.Name)
if !strings.HasPrefix(
    filepath.Clean(target)+string(os.PathSeparator),
    filepath.Clean(destPath)+string(os.PathSeparator),
) {
    return fmt.Errorf("illegal path in tar archive: %s", header.Name)
}

A crafted tar with ../../etc/passwd entries would otherwise silently overwrite host files.

The rootfs is stored at /tmp/gocount/<id>/rootfs and reused across runs sharing the same ID. On the very first gocount run you'll see:

Rootfs not found. Downloading Alpine Linux rootfs...
Downloading from https://dl-cdn.alpinelinux.org/alpine/...
Extracting rootfs...
Rootfs setup complete!

Runs that reuse an already-extracted rootfs are instant because isValidRootfs passes immediately.

Step 4: cgroup v2 Resource Limits

The cgroup code lives entirely in internal/cgroups/cgroups.go. The interface is simple:

// Create a cgroup for a container
cgPath, err := cgroups.Create(id)

// Apply limits (empty string = skip)
cgroups.SetMemoryLimit(cgPath, "100M")      // supports M and G suffixes
cgroups.SetCPUQuota(cgPath, "50000 100000") // quota/period in microseconds

// Add a process to the cgroup
cgroups.AddProc(cgPath, pid)

How memory limits are applied:

func SetMemoryLimit(cgPath, limit string) error {
    if limit == "" {
        return nil
    }
    val := limit
    if strings.HasSuffix(limit, "M") {
        mb, _ := strconv.ParseInt(strings.TrimSuffix(limit, "M"), 10, 64)
        val = strconv.FormatInt(mb*1024*1024, 10)
    } else if strings.HasSuffix(limit, "G") {
        gb, _ := strconv.ParseInt(strings.TrimSuffix(limit, "G"), 10, 64)
        val = strconv.FormatInt(gb*1024*1024*1024, 10)
    }
    writeFile(filepath.Join(cgPath, "memory.max"), val)
    writeFile(filepath.Join(cgPath, "memory.swap.max"), "0") // disable swap
    writeFile(filepath.Join(cgPath, "memory.oom.group"), "1") // kill whole cgroup on OOM
    return nil
}

CPU quota uses cgroup v2's cpu.max format: <quota_microseconds> <period_microseconds>. For example "50000 100000" means 50 ms out of every 100 ms period, which works out to 50% of one CPU core.

Important: The cgroup is created before the child process starts, so limits are in place the moment the container process writes its PID to cgroup.procs.

Step 5: The run Command (Parent Side)

This is the heart of gocount. The run command works by re-executing itself, a classic self-re-exec pattern used by runc and lxc.

gocount run /bin/sh
       │
       ├─ parent: sets up cgroup, starts child with namespaces
       │
       └─ child (GOCOUNT_CHILD=1): pivot_root, mount /proc /sys /dev,
                                   wait for veth, configure eth0, exec /bin/sh

Why self-re-exec? Go's runtime starts multiple threads before main(). clone(CLONE_NEWPID) in a multi-threaded process can leave threads in inconsistent namespaces. Re-executing /proc/self/exe gives us a fresh, single-threaded process that enters the namespace cleanly from the very start.

// cmd/run.go (parent side, simplified)
id := container.GenerateID()
rootdir := "/tmp/gocount/" + id + "/rootfs"

rootfs.EnsureRootfs(rootdir)

cgPath, _ := cgroups.Create(id)
cgroups.SetMemoryLimit(cgPath, flagMemory)
cgroups.SetCPUQuota(cgPath, flagCPU)

command := exec.Command("/proc/self/exe", append([]string{"run"}, args...)...)
command.Env = append(os.Environ(),
    "GOCOUNT_CHILD=1",
    "GOCOUNT_CONTAINER_ID="+id,
    "GOCOUNT_ROOTFS="+rootdir,
)
command.SysProcAttr = &syscall.SysProcAttr{
    Cloneflags: syscall.CLONE_NEWUTS |
                syscall.CLONE_NEWPID |
                syscall.CLONE_NEWNS  |
                syscall.CLONE_NEWNET,
}

command.Start()
network.SetupVethPair(id, command.Process.Pid) // host-side networking

c := &container.Container{
    ID: id, Pid: command.Process.Pid,
    Command: args, Status: "running",
    RootFs: rootdir, Cgroup: cgPath,
}
container.Containers[id] = c
container.SaveContainer(c)

command.Wait()

Step 6: The run Command (Child Side)

When the child detects GOCOUNT_CHILD=1 it calls childSetup:

func childSetup(args []string) {
    rootfsPath  := os.Getenv("GOCOUNT_ROOTFS")
    containerID := os.Getenv("GOCOUNT_CONTAINER_ID")

    // Join the pre-created cgroup
    cgPath := filepath.Join("/sys/fs/cgroup", "gocount", containerID)
    cgroups.AddProc(cgPath, os.Getpid())

    // Switch to the isolated filesystem
    container.SetupMount(rootfsPath)

    // Set container hostname
    syscall.Sethostname([]byte("gocount"))

    // Wait up to 5s for the parent to wire the veth pair
    for i := 0; i < 50; i++ {
        if exec.Command("ip", "link", "show", "eth0").Run() == nil {
            break
        }
        time.Sleep(100 * time.Millisecond)
    }

    // Configure eth0 and default route
    network.SetupNetworkInsideContainer()

    // Replace this process with the user's command — it becomes PID 1
    syscall.Exec(args[0], args, os.Environ())
}

The syscall.Exec call is critical. It replaces the Go runtime with the container's process so that the user's command is PID 1 inside the container.

Step 7: Filesystem Isolation with pivot_root

SetupMount is called inside the child, after it enters the new mount namespace:

// internal/container/mount.go
func SetupMount(rootfs string) error {
    rootfs, _ = filepath.Abs(rootfs)

    // Make the current mount tree private so host cannot see our changes
    syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, "")

    // Bind mount rootfs onto itself (kernel requirement for pivot_root)
    syscall.Mount(rootfs, rootfs, "", syscall.MS_BIND|syscall.MS_REC, "")

    // Create a directory to receive the old root
    putold := filepath.Join(rootfs, ".pivot_root")
    os.MkdirAll(putold, 0700)

    // Swap the root mount
    syscall.PivotRoot(rootfs, putold)
    os.Chdir("/")

    // Detach and remove the old root
    syscall.Unmount("/.pivot_root", syscall.MNT_DETACH)
    os.RemoveAll("/.pivot_root")

    // Mount essential virtual filesystems
    syscall.Mount("proc",  "/proc", "proc",  0, "")
    syscall.Mount("sysfs", "/sys",  "sysfs", 0, "")
    syscall.Mount("tmpfs", "/dev",  "tmpfs", syscall.MS_NOSUID|syscall.MS_STRICTATIME, "mode=755")

    createDeviceNodes() // /dev/null, /dev/zero, /dev/urandom, etc.
    setupDNS()          // write 8.8.8.8 to /etc/resolv.conf
    return nil
}

After PivotRoot, / is the Alpine rootfs. The host's /home, /etc, /proc — none of it is visible.

Step 8: Networking with veth Pairs

// internal/network/network.go (host side)
func SetupVethPair(containerID string, pid int) error {
    hostIf      := fmt.Sprintf("veth-%s", containerID[:8])  // e.g. veth-a1b2c3d4
    containerIf := fmt.Sprintf("vethc-%s", containerID[:7]) // temporary name

    // Create the pair
    exec.Command("ip", "link", "add", hostIf, "type", "veth", "peer", "name", containerIf).Run()

    // Move one end into the container's network namespace
    exec.Command("ip", "link", "set", containerIf, "netns", fmt.Sprintf("%d", pid)).Run()

    // Rename to eth0 inside the container namespace
    exec.Command("nsenter", "-t", fmt.Sprintf("%d", pid), "-n",
        "ip", "link", "set", containerIf, "name", "eth0").Run()

    // Bring up and address the host end
    exec.Command("ip", "link", "set", hostIf, "up").Run()
    exec.Command("ip", "addr", "add", "10.0.0.1/24", "dev", hostIf).Run()

    EnableIPForwarding()
    SetupNAT() // iptables MASQUERADE for 10.0.0.0/24
    return nil
}

Inside the container, SetupNetworkInsideContainer mirrors this:

exec.Command("ip", "link", "set", "lo",   "up").Run()
exec.Command("ip", "link", "set", "eth0", "up").Run()
exec.Command("ip", "addr", "add", "10.0.0.2/24", "dev", "eth0").Run()
exec.Command("ip", "route", "add", "default", "via", "10.0.0.1").Run()

The container's DNS is 8.8.8.8 (written by setupDNS into /etc/resolv.conf inside the new rootfs).

Step 9: Container Lifecycle Commands

ps — list containers

Reads every *.json file from /tmp/gocount/ and prints a table:

CONTAINER ID    PID     STATUS    COMMAND
a1b2c3d4        12345   running   [/bin/sh]

stop — send SIGKILL

syscall.Kill(c.Pid, syscall.SIGKILL)
c.Status = "stopped"
container.SaveContainer(c)

rm — stop and remove metadata

Kills the process (if still running), removes the JSON file, and deletes it from the in-memory map.

inspect — detailed view

Shows PID, status, cgroup resource limits, memory usage, CPU time, and every namespace link:

Container Information:
  ID:        ar9vb2mm
  Status:    running
  PID:       57699
  Command:   [/bin/sh]
  RootFS:    /tmp/gocount/ar9vb2mm/rootfs
  Cgroup:    /sys/fs/cgroup/gocount/ar9vb2mm

Process Status:
  Running:   Yes
  State:     S (sleeping)

Resource Limits:
  Memory Limit:    unlimited
  Memory Usage:    319488 bytes (312.00 KB)
  Memory Peak:     2838528 bytes (2.71 MB)
  CPU Quota:       max/100000 (0.0%)
  CPU Time:        36.04ms

Memory Events:

Processes in Cgroup: 1
  PID: 57699

Namespaces:
  cgroup: cgroup:[4026531835]
  ipc: ipc:[4026531839]
  mnt: mnt:[4026533179]
  net: net:[4026533458]
  pid: pid:[4026533456]
  pid_for_children: pid:[4026533456]
  time: time:[4026531834]
  time_for_children: time:[4026531834]
  user: user:[4026531837]
  uts: uts:[4026533455]

isProcessRunning sends syscall.Signal(0), a no-op signal that returns an error only if the process doesn't exist:

func isProcessRunning(pid int) bool {
    process, err := os.FindProcess(pid)
    if err != nil {
        return false
    }
    return process.Signal(syscall.Signal(0)) == nil
}

Step 10: Build and Run

# Build
make build
# Binary lands at bin/gocount

# Run an interactive Alpine shell
sudo ./bin/gocount run /bin/sh

# Run with resource limits
sudo ./bin/gocount run --memory 100M --cpu "50000 100000" /bin/sh

# In another terminal — list containers
sudo ./bin/gocount ps

# Inspect
sudo ./bin/gocount inspect <id>

# Stop
sudo ./bin/gocount stop <id>

# Remove
sudo ./bin/gocount rm <id>

Testing the memory limit

test.py allocates 1 MB per iteration. With a 50 MB limit you can watch it get OOM-killed:

make test-memory
# Copies test.py into the rootfs, then runs:
# sudo gocount run --memory 50M /usr/bin/python3 /test.py

Output:

Starting memory consumption...
Allocated: 1 MB
Allocated: 2 MB
...
Allocated: 48 MB
Allocated: 49 MB
Killed

The kernel's OOM killer fires the moment the cgroup's memory.max threshold is crossed and terminates the entire cgroup.

How It All Fits Together: End-to-End Walk-through

$ sudo ./bin/gocount run --memory 50M /bin/sh

1. GenerateID()               → "a1b2c3d4"
2. EnsureRootfs(...)          → download Alpine if missing
3. cgroups.Create("a1b2c3d4") → mkdir /sys/fs/cgroup/gocount/a1b2c3d4
4. SetMemoryLimit(..., "50M") → write 52428800 to memory.max
5. exec.Command("/proc/self/exe", "run", "/bin/sh")
   + GOCOUNT_CHILD=1
   + CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET
6. command.Start()            → child PID = 12345
7. SetupVethPair("a1b2c3d4", 12345)
   → ip link add veth-a1b2c3d4 type veth peer name vethc-a1b2c3d
   → ip link set vethc-a1b2c3d netns 12345
   → nsenter -t 12345 -n ip link set vethc-a1b2c3d name eth0
   → ip addr add 10.0.0.1/24 dev veth-a1b2c3d4
   → iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE
8. SaveContainer(...)         → /tmp/gocount/a1b2c3d4.json

--- inside child (GOCOUNT_CHILD=1) ---
9.  AddProc(cgPath, getpid()) → write PID to cgroup.procs
10. SetupMount(rootfsPath)    → pivot_root + mount /proc /sys /dev
11. Sethostname("gocount")
12. poll until eth0 appears   → up to 5 s
13. SetupNetworkInsideContainer()
    → ip link set lo up
    → ip link set eth0 up
    → ip addr add 10.0.0.2/24 dev eth0
    → ip route add default via 10.0.0.1
14. syscall.Exec("/bin/sh", ...) → you are now in the container

Key Bugs Fixed Along the Way

Building a container runtime means touching the kernel directly. A few subtle bugs appeared during development:

| Bug | Impact | Fix |
| --- | --- | --- |
| Duplicate init() in inspect.go | Compile error | Remove the second registration |
| process.Signal(os.Signal(nil)) | Panic: nil interface type assertion | Use syscall.Signal(0) |
| removeCmd missing nil check | Panic when container not found | Guard c == nil before access |
| AddContainer called after Containers[id] = c | Overwrites Cgroup field with zero value | Remove the redundant AddContainer call |
| Zip-slip in extractTar | Malicious tar writes outside rootfs | strings.HasPrefix boundary check |
| startCmd missing CLONE_NEWNET | Container shares host network stack | Add CLONE_NEWNET to clone flags |
| json.Unmarshal errors ignored | Corrupted JSON inserts zero-value container | Explicit error check + continue |
| rand.Seed(time.Now().UnixNano()) | Deprecated in Go 1.20+ | Removed: global source is auto-seeded |

What's Missing (and How to Add It)

gocount is deliberately minimal. Here's what a production runtime adds on top:

| Feature | What to do |
| --- | --- |
| User namespace (CLONE_NEWUSER) | Map container root to an unprivileged host UID, so sudo is no longer needed |
| IPC namespace (CLONE_NEWIPC) | Isolate System V semaphores and POSIX message queues |
| Rootfs layers / overlay | Use overlayfs so containers share a read-only base and get a private writable layer |
| Container images | Pull from an OCI registry (skopeo / go-containerregistry) |
| seccomp filter | Block dangerous syscalls (ptrace, mount, reboot) using libseccomp |
| Capabilities drop | Use AmbientCaps + CapDrop in SysProcAttr to drop CAP_NET_ADMIN after setup |
| Port forwarding | Add iptables DNAT rules: host port to container IP:port |
| Persistent storage | Bind-mount host directories into the container before pivot_root |
| Multi-container networking | Replace individual veth pairs with a Linux bridge (like Docker's docker0) |

Full Source

The complete source for gocount is structured exactly as shown above.

gocount is ~600 lines of Go, and it's a real container runtime. The kernel has been doing all of this every time you type docker run. Now you know exactly how.
