Prerequisite
A namespace gives a process its own isolated view of a kernel resource. Without namespaces, every process on the system sees the same process list, network interfaces, hostname, mount points, and more. Namespaces allow Linux to create the illusion that a process is running on its own machine.
This is one of the main reasons containers feel like lightweight virtual machines even though they all share the same kernel.
The Big Picture
By default, Linux resources are global.
A process can see:
- All system processes
- All network interfaces
- The host hostname
- The host filesystem mounts
- Shared IPC resources
Namespaces wrap these resources and give a process a private view of them.
For example, a container process may see:
- Only container processes
- Its own network interfaces
- Its own hostname
- Its own root filesystem
All of this happens even though every process runs on the same kernel.
Linux currently provides eight namespace types (the time namespace was added in Linux 5.6):
| Namespace | Purpose |
|---|---|
| PID | Process isolation |
| NET | Network isolation |
| MNT | Filesystem mount isolation |
| UTS | Hostname isolation |
| IPC | Message queue and shared memory isolation |
| USER | UID/GID mapping isolation |
| CGROUP | cgroup visibility isolation |
| TIME | Boot and monotonic clock isolation |
Each namespace isolates a different kernel resource.
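Which namespace types are actually available depends on the kernel version and how it was built. As a quick check, the sketch below lists the types the running kernel exposes for a process. It assumes a Linux system with /proc mounted, and `namespace_types` is a hypothetical helper name used only for illustration.

```python
import os


def namespace_types(pid="self"):
    """Return the namespace types listed under /proc/<pid>/ns."""
    entries = os.listdir(f"/proc/{pid}/ns")
    # Entries such as pid_for_children describe the namespace that future
    # children will be placed in; they are not separate namespace types.
    return sorted(e for e in entries if not e.endswith("_for_children"))
```

Calling `namespace_types()` on a recent kernel typically returns names such as `net`, `pid`, `mnt`, and `uts`; the exact list varies between systems.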
The Three Syscalls Behind Everything
Almost every namespace operation ultimately comes down to these three syscalls:
- clone()
- unshare()
- setns()
Understanding these three explains how Docker, containerd, CRI-O, CNI plugins, kubectl exec, and nsenter actually work.
clone()
clone() creates a new process.
When namespace flags such as CLONE_NEWNET or CLONE_NEWPID are supplied, the child process is created inside brand-new namespaces.
Conceptually:
Parent Process
|
clone(CLONE_NEWNET)
|
Child Process
The child now has its own network namespace.
Container runtimes use this mechanism when starting containers. They call clone() with multiple namespace flags and then execute the container's entrypoint.
The new process wakes up in an isolated environment with its own process IDs, network stack, mount points, hostname, and more.
This is essentially how docker run begins.
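The namespace flags are plain bitmask constants that are OR-ed together into a single argument. The sketch below shows only that step. The values are copied from `<linux/sched.h>`; the `clone_flags` helper and the particular set of namespaces requested are assumptions for illustration, not what any specific runtime does.

```python
# Flag values copied from <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def clone_flags(*kinds):
    """Combine namespace names into one bitmask for clone() or unshare()."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


if __name__ == "__main__":
    # A hypothetical "typical container" selection of namespaces.
    print(hex(clone_flags("net", "pid", "mnt", "uts", "ipc")))  # 0x6c020000
```

Python 3.12 and later also expose these constants directly as `os.CLONE_NEWNET`, `os.CLONE_NEWPID`, and so on.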
unshare()
Unlike clone(), unshare() does not create a child process.
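Instead, it creates one or more new namespaces and moves the calling process into them, leaving the old ones behind. One notable exception is the PID namespace: the caller stays where it is, and only its future children are placed in the new PID namespace.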
For example:
sudo unshare --uts bash
This starts a shell in a new UTS namespace.
Inside that shell:
hostname container-test
Changing the hostname only affects that namespace.
The host machine remains unchanged.
Think of unshare() as:
Keep the same process
Create new namespaces
Move process into them
It is extremely useful for learning, testing, and debugging namespace behavior.
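The same experiment can be sketched at the syscall level. The example below calls unshare() through ctypes inside a forked child, so the calling process keeps its own namespaces. It assumes Linux and enough privilege to create a UTS namespace (typically root); without that it simply reports failure. The function name `rename_in_new_uts` is hypothetical.

```python
import ctypes
import os
import socket

CLONE_NEWUTS = 0x04000000  # from <linux/sched.h>


def rename_in_new_uts(name):
    """Try to set a hostname inside a fresh UTS namespace.

    The work happens in a forked child so the caller is unaffected.
    Returns True if the child could unshare and rename, otherwise False.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            # unshare() returns 0 on success and -1 on failure.
            # The hostname is only touched if the new namespace exists.
            if libc.unshare(CLONE_NEWUTS) == 0:
                socket.sethostname(name)
                status = 0 if socket.gethostname() == name else 1
        except OSError:
            status = 1
        finally:
            os._exit(status)  # never return into the caller's code path
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

Whether or not the call succeeds, `socket.gethostname()` in the parent returns the same value before and after, which is the isolation described above.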
setns()
This is probably the most important syscall for container networking and Kubernetes internals.
setns() allows a process to join an existing namespace.
Instead of creating a new namespace, it enters one that already exists.
Conceptually:
Current Process
|
setns()
|
Existing Namespace
This is how many container tools work.
Examples:
- kubectl exec
- nsenter
- CNI plugins
When you run:
kubectl exec -it pod-name -- bash
Kubernetes does not magically place your shell inside the container.
Instead, the runtime locates the container's namespaces and uses setns() to join them before launching bash.
The shell now sees exactly what the container sees.
CNI plugins use the same mechanism.
They enter the container's network namespace using setns(), configure networking, assign IP addresses, and create routes.
nsenter is essentially a convenient wrapper around setns().
For example:
sudo nsenter -t 1234 --net --pid --mount -- bash
This joins the network, PID, and mount namespaces of process 1234 and launches a shell inside them.
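A rough sketch of what such a tool does internally: open the target's namespace files, then call setns() on each descriptor. This assumes Linux and CAP_SYS_ADMIN (usually root). `ns_path` and `join_namespaces` are hypothetical helper names, and the real nsenter handles many details this sketch ignores.

```python
import ctypes
import os


def ns_path(pid, kind):
    """Path of the namespace handle for one process and namespace type."""
    return f"/proc/{pid}/ns/{kind}"


def join_namespaces(pid, kinds):
    """Join the listed namespaces of process `pid` using setns()."""
    libc = ctypes.CDLL(None, use_errno=True)
    fds = []
    try:
        # Open every handle first: once the mount namespace changes,
        # /proc may no longer resolve to the same files.
        for kind in kinds:
            fds.append(os.open(ns_path(pid, kind), os.O_RDONLY))
        for fd in fds:
            # nstype 0 accepts whatever namespace type the fd refers to.
            if libc.setns(fd, 0) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
    finally:
        for fd in fds:
            os.close(fd)
```

A caller would run something like `join_namespaces(1234, ["net", "mnt"])` and then `os.execvp("bash", ["bash"])`. Joining a PID namespace this way only affects processes created afterwards, so a fork is needed before the shell is launched for it to appear inside that PID namespace.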
Namespaces Are Files
One of the most surprising aspects of Linux namespaces is that they are exposed through the filesystem.
Every process has namespace references located under:
/proc/<pid>/ns/
For example:
ls -la /proc/$$/ns/
You may see something similar to:
net -> net:[4026531992]
pid -> pid:[4026531836]
mnt -> mnt:[4026531840]
These are not normal files.
They are references to namespace objects maintained by the kernel.
The number inside the brackets is the namespace's inode number.
If two processes have the same namespace inode, they are in the same namespace.
This is exactly how tools determine whether two processes share a network namespace or a PID namespace.
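The comparison can be sketched in a few lines by reading the symlink targets and comparing them. This assumes Linux with /proc mounted and permission to inspect the processes involved (reading another user's entries usually requires root). The helper names are hypothetical.

```python
import os
import re

# Symlink targets look like "net:[4026531992]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace symlink target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def ns_id(pid, kind):
    """Return (type, inode number) for one namespace of one process."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/{kind}"))


def same_namespace(pid_a, pid_b, kind):
    """True if both processes are in the same namespace of this type."""
    return ns_id(pid_a, kind) == ns_id(pid_b, kind)
```

For example, `same_namespace(os.getpid(), 1, "net")` checks whether the current process shares a network namespace with PID 1.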
Keeping a Namespace Alive
Normally, when the last process inside a namespace exits, the namespace disappears.
However, a namespace remains alive as long as something still holds a reference to it.
This leads to a useful trick.
You can bind mount a namespace file to keep it alive even when no processes exist inside it.
Example:
sudo mkdir -p /run/netns
sudo touch /run/netns/demo
sudo mount --bind /proc/$$/ns/net /run/netns/demo
Now the network namespace continues to exist even after the current shell exits.
This is essentially how ip netns add works.
It creates a namespace and stores a reference under:
/run/netns/
The namespace survives because the bind mount keeps a reference to the kernel namespace object.
Container runtimes use the same idea.
They create the network namespace, store a reference to it, and pass the path to the CNI plugin.
The plugin then:
- Opens the namespace file
- Calls setns()
- Configures networking inside that namespace
This is how container networking is wired up behind the scenes.
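The open, setns(), configure sequence can be sketched as a context manager that also switches back afterwards. This is an illustration under assumptions, not how any particular CNI plugin is written: it assumes Linux, CAP_SYS_ADMIN, and a single-threaded process, and `in_netns` is a hypothetical name.

```python
import contextlib
import ctypes
import os

CLONE_NEWNET = 0x40000000  # from <linux/sched.h>


def _setns(fd, nstype):
    """Call setns() and raise OSError if the kernel rejects it."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.setns(fd, nstype) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


@contextlib.contextmanager
def in_netns(path):
    """Run the with-block inside the network namespace referenced by path."""
    target = os.open(path, os.O_RDONLY)  # 1. open the namespace file
    try:
        # Remember the current namespace so it can be restored later.
        original = os.open("/proc/self/ns/net", os.O_RDONLY)
        try:
            # 2. join it; CLONE_NEWNET makes the kernel reject any fd
            #    that is not a network namespace.
            _setns(target, CLONE_NEWNET)
            try:
                yield  # 3. the caller configures networking here
            finally:
                _setns(original, CLONE_NEWNET)  # switch back
        finally:
            os.close(original)
    finally:
        os.close(target)
```

Inside `with in_netns("/run/netns/demo"):` a plugin-like script could, for instance, run `subprocess.run(["ip", "link", "set", "lo", "up"], check=True)`; child processes started inside the block inherit the joined namespace.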
Why This Matters
Namespaces are the primary isolation mechanism used by containers.
When people say:
Containers are isolated from the host
what they are actually saying is:
- PID namespaces isolate processes
- Network namespaces isolate networking
- Mount namespaces isolate filesystems
- UTS namespaces isolate hostnames
- User namespaces isolate privileges
Containers are not magic.
Most of the isolation comes from namespaces.
Everything from Docker to Kubernetes is built on these kernel primitives.