Tanay

Posted on Jun 2

Linux Namespaces

#docker #containers #linux #cloud

Prerequisite

Linux primitives

A namespace gives a process its own isolated view of a kernel resource. Without namespaces, every process on the system sees the same process list, network interfaces, hostname, mount points, and more. Namespaces allow Linux to create the illusion that a process is running on its own machine.

This is one of the main reasons containers feel like lightweight virtual machines even though they all share the same kernel.

The Big Picture

By default, Linux resources are global.

A process can see:

All system processes
All network interfaces
The host hostname
The host filesystem mounts
Shared IPC resources

Namespaces wrap these resources and give a process a private view of them.

For example, a container process may see:

Only container processes
Its own network interfaces
Its own hostname
Its own root filesystem

Even though everything is running on the same kernel.

Linux currently provides eight namespace types:

Namespace	Purpose
PID	Process ID isolation
NET	Network stack isolation
MNT	Filesystem mount isolation
UTS	Hostname and domain name isolation
IPC	System V IPC and POSIX message queue isolation
USER	UID/GID mapping isolation
CGROUP	cgroup visibility isolation
TIME	Boot and monotonic clock isolation (Linux 5.6+)

Each namespace isolates a different kernel resource.
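To check which namespace types a given kernel actually exposes, one option is to list the entries the kernel publishes for the current process. This is a minimal sketch, assuming /proc is mounted; list_ns_types is a hypothetical helper name.

```shell
# List the namespace entries the kernel exposes for the current process.
# Each file name corresponds to a namespace type (net, pid, mnt, uts, ...).
list_ns_types() {
    for entry in /proc/self/ns/*; do
        basename "$entry"
    done
}

list_ns_types
```

On recent kernels the listing also contains pid_for_children and time_for_children. These are not additional namespace types; they name the PID and time namespaces that future child processes will be placed in.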

The Three Syscalls Behind Everything

Almost every namespace operation ultimately comes down to these three syscalls:

clone()
unshare()
setns()

Understanding these three explains how Docker, containerd, CRI-O, CNI plugins, kubectl exec, and nsenter actually work.

clone()

clone() creates a new process.

When namespace flags such as CLONE_NEWNET or CLONE_NEWPID are supplied, the child process is created inside brand-new namespaces.

Conceptually:

Parent Process
       |
clone(CLONE_NEWNET)
       |
Child Process

The child now has its own network namespace.

Container runtimes use this mechanism when starting containers. They call clone() with multiple namespace flags and then execute the container's entrypoint.

The new process wakes up in an isolated environment with its own process IDs, network stack, mount points, hostname, and more.

This is essentially how docker run begins.
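The shell cannot call clone() directly, but the effect can be approximated with the unshare tool and its --fork option. Note the substitution: the tool calls unshare() and then fork() rather than clone() with namespace flags, although the child ends up in the same position, as the first process of a new PID namespace. A rough sketch, assuming root privileges (pid_demo is a hypothetical helper name, and the function reports a skip when the namespace cannot be created):

```shell
pid_demo() {
    # --fork makes unshare start the command as a child process, so that
    # child becomes the first process of the new PID namespace.
    if inner=$(unshare --fork --pid sh -c 'echo $$' 2>/dev/null); then
        echo "PID seen inside the new namespace: $inner"
    else
        echo "skipped: creating a PID namespace needs root (CAP_SYS_ADMIN)"
    fi
    echo "PID of this shell on the host: $$"
}

pid_demo
```

When the namespace is created successfully, the inner shell should report itself as PID 1, which is the same view a container's entrypoint has of itself.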

unshare()

Unlike clone(), unshare() does not create a child process.

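Instead, it moves the calling process out of one or more of its current namespaces and into newly created ones. PID namespaces are the exception: after unshare(CLONE_NEWPID), the caller stays where it is and only its future children land in the new namespace.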

For example:

sudo unshare --uts bash

This starts a shell in a new UTS namespace.

Inside that shell:

hostname container-test

Changing the hostname only affects that namespace.

The host machine remains unchanged.

Think of unshare() as:

Keep the same process
Create new namespaces
Move process into them

It is extremely useful for learning, testing, and debugging namespace behavior.
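The hostname experiment above can also be run non-interactively to confirm that the host is unaffected. This is a sketch under the assumption that it runs as root and that the hostname tool is installed; uts_demo is a hypothetical helper name, and uname -n is used only to read the hostname back.

```shell
uts_demo() {
    before=$(uname -n)

    # The inner shell renames the machine inside a new UTS namespace and
    # prints the name it sees there.
    if inside=$(unshare --uts sh -c 'hostname container-test && uname -n' 2>/dev/null); then
        echo "hostname inside the namespace: $inside"
    else
        echo "skipped: needs root (CAP_SYS_ADMIN) and the hostname tool"
    fi

    after=$(uname -n)
    if [ "$before" = "$after" ]; then
        echo "host hostname unchanged: $after"
    fi
}

uts_demo
```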

setns()

This is probably the most important syscall for container networking and Kubernetes internals.

setns() allows a process to join an existing namespace.

Instead of creating a new namespace, it enters one that already exists.

Conceptually:

Current Process
        |
      setns()
        |
Existing Namespace

This is how many container tools work.

Examples:

kubectl exec
nsenter
CNI plugins

When you run:

kubectl exec -it pod-name -- bash

Kubernetes does not magically drop your shell into the container.

Instead, the runtime locates the container namespaces and uses setns() to join them before launching bash.

The shell now sees exactly what the container sees.

CNI plugins use the same mechanism.

They enter the container's network namespace using setns(), configure networking, assign IP addresses, and create routes.

nsenter is essentially a convenient wrapper around setns().

For example:

sudo nsenter -t 1234 --net --pid --mount -- bash

This joins the network, PID, and mount namespaces of process 1234 and launches a shell inside them.
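To see the join step end to end without a real container, one can start a throwaway process in its own UTS namespace and then enter that namespace by PID. This is only a sketch: it assumes root privileges, join_demo is a hypothetical helper name, and the hostname joined-demo is made up for the example.

```shell
join_demo() {
    # Start a long-lived process in its own UTS namespace with a hostname
    # that is easy to recognize.
    unshare --uts sh -c 'hostname joined-demo && exec sleep 30' >/dev/null 2>&1 &
    target=$!
    sleep 1   # give the background process time to set the hostname

    # nsenter opens /proc/<pid>/ns/uts and calls setns() before running uname.
    if seen=$(nsenter -t "$target" --uts uname -n 2>/dev/null); then
        echo "hostname seen after joining PID $target: $seen"
    else
        echo "skipped: needs root (CAP_SYS_ADMIN) and the hostname tool"
    fi
    echo "hostname seen on the host: $(uname -n)"

    # Clean up the throwaway process.
    kill "$target" 2>/dev/null || true
    wait "$target" 2>/dev/null || true
}

join_demo
```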

Namespaces Are Files

One of the most surprising aspects of Linux namespaces is that they are exposed through the filesystem.

Every process has namespace references located under:

/proc/<pid>/ns/

For example:

ls -la /proc/$$/ns/

You may see something similar to:

net -> net:[4026531992]
pid -> pid:[4026531836]
mnt -> mnt:[4026531840]

These are not normal files.

They are references to namespace objects maintained by the kernel.

The number inside the brackets is the namespace's inode number.

If two processes show the same inode number for a given namespace type, they are in the same namespace of that type.

This is exactly how tools determine whether two processes share a network namespace or a PID namespace.
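The comparison can be sketched in a few lines of shell. same_ns is a hypothetical helper, not an existing tool; it compares the link targets (which embed the inode number) for two PIDs and a namespace type.

```shell
# same_ns PID_A PID_B TYPE
# Succeeds when both processes are in the same namespace of the given type.
# Returns 2 if either namespace link cannot be read.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

sleep 5 &            # a child process started without any namespace flags
child=$!
if same_ns "$$" "$child" net; then
    echo "shell and child share a network namespace"
fi
kill "$child"
```

Reading another user's /proc/<pid>/ns/ entries may be denied, which is why the helper distinguishes "could not read" from "different namespace".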

Keeping a Namespace Alive

Normally, when the last process inside a namespace exits, the namespace disappears.

However, a namespace remains alive as long as something still holds a reference to it.

This leads to a useful trick.

You can bind mount a namespace file to keep it alive even when no processes exist inside it.

Example:

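sudo mkdir -p /run/netns
sudo touch /run/netns/demo
sudo mount --bind /proc/$$/ns/net /run/netns/demo

When this is run from a shell that is already inside its own network namespace (for example, one started with sudo unshare --net bash), that namespace continues to exist even after the shell exits.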

This is essentially how ip netns add works.

It creates a namespace and bind-mounts a reference to it under:

/run/netns/

The namespace survives because the bind mount keeps a reference to the kernel namespace object.

Container runtimes use the same idea.

They create the network namespace, store a reference to it, and pass the path to the CNI plugin.

The plugin then:

Opens the namespace file
Calls setns()
Configures networking inside that namespace

This is how container networking is wired up behind the scenes.
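The steps above can be imitated by hand. The sketch below is not how any particular runtime or CNI plugin is implemented; it assumes root privileges and iproute2, uses a made-up namespace name, and stands in for the plugin with nsenter, which opens the given namespace file and calls setns() before running the command.

```shell
netns_demo() {
    name="demo-$$"   # hypothetical namespace name, unique per shell

    # Create a named network namespace; ip stores a reference under /run/netns/.
    if ! ip netns add "$name" 2>/dev/null; then
        echo "skipped: needs root and iproute2"
        return 0
    fi

    # This path is the kind of handle a runtime would pass to a plugin.
    path="/run/netns/$name"

    # Join the namespace through the file and configure networking inside it.
    nsenter --net="$path" ip link set lo up
    nsenter --net="$path" ip -o link show lo

    # Removing the reference lets the kernel free the namespace.
    ip netns delete "$name"
}

netns_demo
```

No process lives inside the namespace at any point here; the bind-mounted file alone keeps it alive between the two nsenter calls.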

Why This Matters

Namespaces are the primary isolation mechanism used by containers.

When people say:

Containers are isolated from the host

what they are actually saying is:

PID namespaces isolate processes
Network namespaces isolate networking
Mount namespaces isolate filesystems
UTS namespaces isolate hostnames
User namespaces isolate privileges

Containers are not magic.

Most of the isolation comes from namespaces.

Everything from Docker to Kubernetes is built on these kernel primitives.
