Backend By Dmytro

Posted on Feb 27

Docker Containers Aren't Magic Boxes: Seeing Linux Namespaces in Action

#devops #docker #linux #tutorial

You restart a container with one extra flag—--pid=host—and suddenly top inside it shows every process on the host. Nothing else changed. That's the moment the "container as isolated VM" mental model breaks down, and the real one has to replace it.

Containers aren't virtual machines. They don't have a separate kernel. They're Linux processes whose view of the system (process table, filesystem, users, network interfaces) is scoped by kernel namespaces. Change the namespace configuration, and what the process can see changes. As far as visibility goes, that's the whole story.

The Runtime Chain (and Where Namespaces Attach)

When you run docker run, you're triggering a chain of processes:

dockerd → containerd → containerd-shim → runc → your application

The critical piece is runc. It's short-lived: it sets up the namespaces, starts the container process (say, nginx), and then exits. The containerd-shim process sticks around to supervise that process and hold onto its file descriptors. After runc finishes, the container is just nginx running under a set of kernel namespace constraints.
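To see that supervision relationship from the host, you can walk a process's parent chain without ps. A minimal sketch, assuming a Linux host with /proc mounted; `parent_chain` is a hypothetical helper name, not a Docker command:

```shell
# Hypothetical helper: print a process and its ancestors, one "PID<TAB>name" line each,
# reading only /proc/<pid>/status (no ps or pstree required).
parent_chain() {
  local pid=$1 name
  # PPid 0 means we reached PID 1, or a parent outside this PID namespace.
  while [ "$pid" -gt 0 ] 2>/dev/null; do
    name=$(awk -F'\t' '$1 == "Name:" { print $2 }' "/proc/$pid/status" 2>/dev/null)
    [ -n "$name" ] || break   # process exited or its /proc entry is unreadable
    printf '%s\t%s\n' "$pid" "$name"
    pid=$(awk -F'\t' '$1 == "PPid:" { print $2 }' "/proc/$pid/status" 2>/dev/null)
  done
}
```

On the host, `parent_chain "$(docker inspect -f '{{.State.Pid}}' my-nginx)"` should list nginx first, with a containerd-shim process as its parent (the kernel truncates process names to 15 characters, so a longer shim binary name may appear cut short). runc itself won't show up, because it has already exited.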

The namespaces runc configures (PID, mount, network, and optionally user, among others) are attached to the application process. They're not a property of Docker; they're a property of the process.

You can inspect them directly:

PID=$(docker inspect -f '{{.State.Pid}}' my-nginx)
sudo ls -l /proc/$PID/ns

That /proc/<pid>/ns listing is your ground truth. Everything else follows from it.
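Each entry in that listing is a link whose target looks something like pid:[4026531836]. The bracketed number identifies the namespace, and two processes on the same host share a namespace when those numbers match. A small sketch that extracts them, assuming a POSIX-style shell with `local` support and read access to /proc/<pid>/ns (root for another user's processes); `ns_ids` is a hypothetical name:

```shell
# Hypothetical helper: print "<type> <namespace-id>" for each namespace of a process.
# The ID is the number inside the brackets of the /proc/<pid>/ns/<type> link target.
ns_ids() {
  local pid=$1 ns link
  for ns in cgroup ipc mnt net pid user uts; do
    link=$(readlink "/proc/$pid/ns/$ns" 2>/dev/null) || continue  # skip types this kernel lacks
    link=${link#*\[}                       # drop the "<type>:[" prefix
    printf '%s %s\n' "$ns" "${link%\]}"    # drop the trailing "]"
  done
}
```

From a root shell on the host, `diff <(ns_ids 1) <(ns_ids "$PID")` (bash process substitution) lists the namespaces the container does not share with the host's init. With a plain docker run, expect pid, mnt, net, ipc and uts to differ while user matches, since user namespaces are off by default.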

PID Namespace: Process Visibility Is a Configuration Choice

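Run nginx normally, exec in, and run top (the nginx image needs procps installed first). You'll see only the nginx master and worker processes, plus the shell and top you just started, and nothing from the host. That's the PID namespace at work: the container has its own view of the process table, in which the nginx master is PID 1.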

Now restart with --pid=host:

docker run --rm --name=my-nginx -p 8080:80 --pid=host -d nginx:latest
docker exec -ti my-nginx bash
apt-get update && apt-get install -y procps
top

top now shows the host's full process list. The container joined the host PID namespace, so there's no PID isolation left.

The debugging implication: if a tool inside a container "can't see" a process, or if it can see more than you expected, check whether --pid=host is set. Don't assume the default.
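The demo installs procps because top isn't in the nginx image by default. If you'd rather not modify the container, the same process list can be read straight from /proc, which is where ps and top get it anyway. A minimal sketch; `list_procs` is a hypothetical helper and assumes bash, which the nginx image provides:

```shell
# Hypothetical helper: list "PID<TAB>command" for every process visible in this PID namespace.
# Reads /proc directly, so it works in images without ps or top.
list_procs() {
  local dir comm
  for dir in /proc/[0-9]*; do
    # A process can exit between the glob and the read; skip it if so.
    comm=$(cat "$dir/comm" 2>/dev/null) || continue
    printf '%s\t%s\n' "${dir#/proc/}" "$comm"
  done
}
```

Pasted into the `docker exec -ti my-nginx bash` shell of a default container, it should print only the nginx processes and your shell; with --pid=host, it prints every process on the host.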

Mount Namespace: Same Machine, Different Filesystem

Start a default nginx container and compare OS identity inside vs. outside:

# Inside the container
cat /etc/os-release   # → Debian

# On the host
cat /etc/os-release   # → Fedora (or whatever the host runs)

Same machine, different filesystem roots. The container's /etc/nginx/nginx.conf exists in the container's mount view but doesn't show up at /etc/nginx on the host: it comes from the image's layers in Docker's storage directory, which Docker assembles into the container's root filesystem and mounts inside the container's own mount namespace.

The practical lesson here is about file path reasoning: paths that exist in the container may not exist at the same location on the host, and vice versa. If you're troubleshooting config files or bind mounts, you need to reason about which mount table you're looking at.
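From the host, one way to resolve a path in a specific process's mount view is /proc/<pid>/root. Per proc(5), it provides the same view of the filesystem, including namespaces and per-process mounts, as the process itself. A sketch, assuming a root shell on the host; `cat_as_seen_by` is a hypothetical helper:

```shell
# Hypothetical helper: print a file as a given process sees it.
# /proc/<pid>/root exposes that process's root and mount namespace,
# so the path is resolved in the container's mount table, not the host's.
cat_as_seen_by() {
  local pid=$1 path=$2
  cat "/proc/$pid/root$path"
}
```

With PID set as in the earlier docker inspect command, `cat_as_seen_by "$PID" /etc/os-release` should print the container's Debian release even though the host runs something else, and `cat_as_seen_by "$PID" /etc/nginx/nginx.conf` finds the config that is absent from the host's /etc/nginx.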

User Namespace: Whether Container Root Is Host Root

By default, Docker does not use user namespaces. The consequence is direct: root in the container is root on the host for any host-mounted paths.

mkdir -p /tmp/userns-demo
docker run --rm --name=my-nginx -p 8080:80 -v /tmp/userns-demo:/demo -d nginx:latest
docker exec -ti my-nginx bash
echo "test" > /demo/container_file.txt
chmod 600 /demo/container_file.txt

On the host:

ls -la /tmp/userns-demo
# → owned by root, mode 600

A non-root user on the host can't touch that file. Root in the container wrote it as host root.

Enabling `userns-remap`

Add this to /etc/docker/daemon.json:

{
    "userns-remap": "default"
}

Restart the Docker daemon (for example, sudo systemctl restart docker). Docker creates a dockremap user, and container processes now run under UIDs from its subordinate range in /etc/subuid, typically 100000–165535 (65,536 IDs). Container UID 0 maps to host UID 100000, UID 1 maps to 100001, and so on.

You can verify the mapping by chown-ing the bind-mounted directory on the host and watching how the container perceives it:

sudo chown -R 100000:100000 /tmp/userns-demo
# Container sees: owned by root (UID 0)

sudo chown -R 101003:101003 /tmp/userns-demo
# Container sees: owned by UID 1003

The offset is consistent and predictable. What changes is whether a container process that runs as "root" actually has host-root-level access to host-mounted paths—and with userns-remap, it doesn't.
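The kernel exposes the same mapping in /proc/<pid>/uid_map, one range per line: first UID inside the namespace, first UID outside it, and range length. With userns-remap on, a container process's map typically reads 0 100000 65536. A sketch of the translation, assuming awk is available; `map_uid` is a hypothetical helper that reads any file in uid_map format:

```shell
# Hypothetical helper: translate a UID from inside a user namespace to the host UID,
# using a file in /proc/<pid>/uid_map format: "<inside-start> <outside-start> <count>" per line.
# Exits non-zero if the UID is in no mapped range (such IDs typically show up as 65534, "nobody").
map_uid() {
  local map_file=$1 uid=$2
  awk -v id="$uid" '
    id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
    END { exit !found }
  ' "$map_file"
}
```

On the host, `map_uid /proc/$PID/uid_map 1003` should print 101003 (100000 + 1003) under the default remap range. Without userns-remap, the map is the identity (0 0 4294967295), so container root comes back as host UID 0.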

Network Namespace: Interfaces, Isolation, and `--network=host`

By default, the container gets its own network namespace with a minimal interface set, typically just lo and eth0. The host side has a docker0 bridge; each container gets a veth pair, with one end in the host namespace attached to the bridge and the other end in the container's namespace, where it appears as eth0.

Traffic from the container reaches the internet through that bridge, with iptables handling NAT on the way out.

With --network=host, all of that is bypassed. The container shares the host's network namespace entirely:

docker run --rm --name=my-nginx --network=host -d nginx:latest
# ip addr inside the container (after apt-get update && apt-get install -y iproute2) matches ip addr on the host

Same output, same interfaces. No veth, no bridge, no NAT. The process is just using the host network stack directly.
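The nginx image may not ship the ip tool, so comparing interfaces from inside the container can take an extra install. Alternatively, /proc/<pid>/net/dev is rendered for the network namespace of <pid>, so the host can list a container's interfaces directly. A sketch, assuming a root shell on the host; `ifaces_of` is a hypothetical helper:

```shell
# Hypothetical helper: list interface names in a process's network namespace.
# /proc/<pid>/net/dev reflects the namespace of <pid>, so no ip binary is needed in the container.
ifaces_of() {
  # Skip the two header lines; each data line is "<padded name>:<counters...>".
  awk -F: 'NR > 2 { gsub(/ /, "", $1); print $1 }' "/proc/$1/net/dev"
}
```

For a default container, `ifaces_of "$PID"` typically prints lo and eth0. For a --network=host container, it prints the same list as `ifaces_of 1`, including docker0 and any host-side veth interfaces.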

Practical Checklist

When something feels "leaky" or unexpectedly isolated, these are the four questions worth asking:

PID visibility off? Check whether --pid=host is set; compare /proc/<pid>/ns/pid for the container process with that of a host process.
Wrong filesystem path? Verify which mount namespace you're in; don't assume host and container paths agree.
Bind mount permissions wrong? Check whether userns-remap is enabled and what UID mapping applies to the path.
Network interface missing or unexpected? Confirm whether the container uses an isolated network namespace or --network=host.

None of these require special tooling—just /proc, ip addr, and ls -la.
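The namespace side of that checklist can be bundled into one pass: compare the container process with a host process, namespace by namespace, then print its UID map. A sketch, assuming a root shell on the host and using PID 1 (the host's init) as the default reference; `ns_checklist` is a hypothetical helper:

```shell
# Hypothetical helper: for a target PID, report whether each namespace on the checklist
# is shared with a reference process (default: PID 1), then show the first UID map range.
ns_checklist() {
  local pid=$1 ref=${2:-1} ns mine theirs
  for ns in pid mnt user net; do
    mine=$(readlink "/proc/$pid/ns/$ns") || return 1
    theirs=$(readlink "/proc/$ref/ns/$ns") || return 1
    if [ "$mine" = "$theirs" ]; then
      printf '%-5s shared with %s\n' "$ns" "$ref"
    else
      printf '%-5s isolated\n' "$ns"
    fi
  done
  # An identity map ("0 0 4294967295") means container root is host root.
  printf 'uid_map: %s\n' "$(awk 'NR == 1 { print $1, $2, $3 }' "/proc/$pid/uid_map")"
}
```

With a default docker run, `ns_checklist "$PID"` should report pid, mnt and net as isolated and user as shared, with an identity UID map. Adding --pid=host or --network=host flips the matching line to shared, and enabling userns-remap isolates user with a map starting at 0 100000.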

Wrapping Up

A container is a Linux process. The isolation you observe isn't inherent to Docker—it's a set of namespace configurations that runc applies when the container starts. Those configurations are visible, inspectable, and explicitly changeable.

When isolation breaks in unexpected ways, it's almost always because a namespace is being shared (--pid=host, --network=host) or because UID mapping isn't set up the way you assumed. Check the namespaces, then reason from there.

Watch the full video walkthrough (diagrams + demo)

DEV Community
