DEV Community

Tanay

Linux Namespaces

Prerequisite

A namespace gives a process its own isolated view of a kernel resource. Without namespaces, every process on the system sees the same process list, network interfaces, hostname, mount points, and more. Namespaces allow Linux to create the illusion that a process is running on its own machine.

This is one of the main reasons containers feel like lightweight virtual machines even though they all share the same kernel.

The Big Picture

By default, Linux resources are global.

A process can see:

  • All system processes
  • All network interfaces
  • The host hostname
  • The host filesystem mounts
  • Shared IPC resources

Namespaces wrap these resources and give a process a private view of them.

For example, a container process may see:

  • Only container processes
  • Its own network interfaces
  • Its own hostname
  • Its own root filesystem

All of this happens even though everything is running on the same kernel.

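Linux currently provides eight namespace types. The seven listed below are the ones containers rely on most; the eighth, the time namespace (added in Linux 5.6), isolates the boot and monotonic clocks.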

Namespace   Purpose
PID         Process isolation
NET         Network isolation
MNT         Filesystem mount isolation
UTS         Hostname isolation
IPC         Message queue and shared memory isolation
USER        UID/GID mapping isolation
CGROUP      cgroup visibility isolation

Each namespace isolates a different kernel resource.

The Three Syscalls Behind Everything

Almost every namespace operation ultimately comes down to these three syscalls:

  • clone()
  • unshare()
  • setns()

Understanding these three explains how Docker, containerd, CRI-O, CNI plugins, kubectl exec, and nsenter actually work.

clone()

clone() creates a new process.

When namespace flags such as CLONE_NEWNET or CLONE_NEWPID are supplied, the child process is created inside brand-new namespaces.

Conceptually:

Parent Process
       |
clone(CLONE_NEWNET)
       |
Child Process

The child now has its own network namespace.

Container runtimes use this mechanism when starting containers. They call clone() with multiple namespace flags and then execute the container's entrypoint.

The new process wakes up in an isolated environment with its own process IDs, network stack, mount points, hostname, and more.

This is essentially how docker run begins.
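
clone() has no direct shell equivalent, but the effect of CLONE_NEWPID can be approximated with the util-linux unshare tool and its --fork flag, which creates the namespace and then forks so that the command starts as the first process inside it. The sketch below is only an illustration, not what a container runtime literally executes. It assumes unprivileged user namespaces are enabled on the host (otherwise drop --user --map-root-user and run it as root), and pid_in_new_ns is a made-up helper name.

```shell
# pid_in_new_ns (hypothetical helper): print the PID a shell sees for itself
# when it is started as the first process of a fresh PID namespace.
# Assumption: unprivileged user namespaces are enabled on this host.
pid_in_new_ns() {
    # --user --map-root-user : new user namespace, caller mapped to root in it
    # --pid --fork           : new PID namespace, then fork so the command
    #                          runs as a child inside that namespace
    unshare --user --map-root-user --pid --fork sh -c 'echo "$$"'
}
```

On a host where this is permitted, pid_in_new_ns prints 1: the shell is the first process in its new PID namespace, even though the host sees it under an ordinary, much larger PID.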

unshare()

Unlike clone(), unshare() does not create a child process.

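Instead, it moves the current process out of one or more of its existing namespaces and into newly created ones. PID namespaces are a notable exception: the caller stays where it is, and only the children it creates afterwards start in the new PID namespace.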

For example:

sudo unshare --uts bash

This starts a shell in a new UTS namespace.

Inside that shell:

hostname container-test

Changing the hostname only affects that namespace.

The host machine remains unchanged.

Think of unshare() as:

Keep the same process
Create new namespaces
Move process into them

It is extremely useful for learning, testing, and debugging namespace behavior.
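
To see the isolation end to end without touching the real hostname, the two commands above can be wrapped into a single call. This is a minimal sketch, assuming unprivileged user namespaces are enabled (if not, drop --user --map-root-user and run it as root, as in the sudo example above). The helper name hostname_in_new_uts is made up for illustration.

```shell
# hostname_in_new_uts NAME (hypothetical helper): set NAME as the hostname
# inside a new UTS namespace and print what a process in there sees.
# The hostname of the calling shell's namespace is never modified.
hostname_in_new_uts() {
    # The trailing 'sh "$1"' passes NAME to the inner script as its own $1.
    unshare --user --map-root-user --uts \
        sh -c 'hostname "$1" && hostname' sh "$1"
}
```

Calling hostname_in_new_uts container-test prints container-test, while running hostname in the original shell afterwards still reports the host's name.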

setns()

This is probably the most important syscall for container networking and Kubernetes internals.

setns() allows a process to join an existing namespace.

Instead of creating a new namespace, it enters one that already exists.

Conceptually:

Current Process
        |
      setns()
        |
Existing Namespace

This is how many container tools work.

Examples:

  • kubectl exec
  • nsenter
  • CNI plugins

When you run:

kubectl exec -it pod-name -- bash

Kubernetes does not magically start your shell inside the container.

Instead, the runtime locates the container's namespaces and uses setns() to join them before launching bash.

The shell now sees exactly what the container sees.

CNI plugins use the same mechanism.

They enter the container's network namespace using setns(), configure networking, assign IP addresses, and create routes.

nsenter is essentially a convenient wrapper around setns().

For example:

sudo nsenter -t 1234 --net --pid --mount -- bash

This joins the network, PID, and mount namespaces of process 1234 and launches a shell inside them.

Namespaces Are Files

One of the most surprising aspects of Linux namespaces is that they are exposed through the filesystem.

Every process has namespace references located under:

/proc/<pid>/ns/

For example:

ls -la /proc/$$/ns/

You may see something similar to:

net -> net:[4026531992]
pid -> pid:[4026531836]
mnt -> mnt:[4026531840]

These are not normal files.

They are references to namespace objects maintained by the kernel.

The number inside the brackets is the namespace's inode number, which uniquely identifies that namespace on the running system.

If two processes have the same namespace inode, they are in the same namespace.

This is exactly how tools determine whether two processes share a network namespace or a PID namespace.
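
The comparison is easy to reproduce by hand. Here is a minimal sketch that reads the two symlinks and compares their targets; same_ns is a hypothetical helper name, and reading another user's /proc/<pid>/ns/ entries generally requires root.

```shell
# same_ns PID_A PID_B TYPE (hypothetical helper)
# Exit status 0 if both processes are in the same namespace of the given
# TYPE (net, pid, mnt, uts, ipc, user, cgroup), 1 if they differ, and
# 2 if either link cannot be read.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    # Each target looks like "net:[4026531992]"; equal strings mean the
    # same inode and therefore the same namespace.
    [ "$a" = "$b" ]
}
```

For example, `same_ns $$ "$child_pid" net && echo shared` reports whether the current shell and a process it started (child_pid is a placeholder here) share a network namespace.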

Keeping a Namespace Alive

Normally, when the last process inside a namespace exits, the namespace disappears.

However, a namespace remains alive as long as something still holds a reference to it.

This leads to a useful trick.

You can bind mount a namespace file to keep it alive even when no processes exist inside it.

Example:

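# /run/netns may not exist yet on a fresh system
sudo mkdir -p /run/netns
# create an empty file to act as the mount point
sudo touch /run/netns/demo
# bind mount the shell's network namespace onto it
sudo mount --bind /proc/$$/ns/net /run/netns/demo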

Now the network namespace continues to exist even after the current shell exits.

This is essentially how ip netns add works.

It creates a namespace and stores a reference under:

/run/netns/

The namespace survives because the bind mount keeps a reference to the kernel namespace object.

Container runtimes use the same idea.

They create the network namespace, store a reference to it, and pass the path to the CNI plugin.

The plugin then:

  1. Opens the namespace file
  2. Calls setns()
  3. Configures networking inside that namespace

This is how container networking is wired up behind the scenes.
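
The three steps above can be imitated from a root shell with nsenter, which accepts a namespace file path directly. This is a rough sketch, not how any particular CNI plugin is implemented: a real plugin calls setns() itself and then creates interfaces, addresses, and routes. The helper name configure_netns is made up, and the sketch only brings up the loopback interface.

```shell
# configure_netns NS_PATH (hypothetical helper): a miniature version of what
# a CNI-style plugin does. For each command, nsenter opens NS_PATH, calls
# setns() on it, and then runs the command inside that network namespace.
# Assumptions: run as root, and iproute2 (the ip command) is installed.
configure_netns() {
    # Bring up loopback inside the namespace.
    nsenter --net="$1" ip link set lo up &&
    # Show the resulting interfaces and addresses from the inside.
    nsenter --net="$1" ip -brief addr show
}
```

With the bind mount from the earlier example in place, `configure_netns /run/netns/demo` run as root would configure and list the interfaces of that pinned namespace.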

Why This Matters

Namespaces are the primary isolation mechanism used by containers.

When people say:

Containers are isolated from the host

what they are actually saying is:

  • PID namespaces isolate processes
  • Network namespaces isolate networking
  • Mount namespaces isolate filesystems
  • UTS namespaces isolate hostnames
  • User namespaces isolate privileges

Containers are not magic.

Most of the isolation comes from namespaces.

Everything from Docker to Kubernetes is built on these kernel primitives.
