Understanding PID Namespaces: The Small Linux Feature Behind Container Process Isolation
When people first learn containers, they usually hear this sentence:
“A container is just a process.”
That sentence is true, but incomplete.
A better version is:
“A container is a regular Linux process running with a different view of the system.”
One of the most important parts of that different view is the PID namespace.
A PID namespace controls which processes a process can see and what their process IDs look like from inside that environment. It is one of the Linux kernel features that makes containers feel isolated, even though everything is still running on the same host kernel.
Docker, containerd, runc, Kubernetes, and even small learning projects like a tiny Docker-like runtime all rely on this idea.
What problem does a PID namespace solve?
On a normal Linux machine, every process has a PID:
ps aux
You may see things like:
PID 1 systemd
PID 842 sshd
PID 1201 nginx
PID 2300 node
Without PID isolation, a process inside a container could see host processes. That would be noisy, confusing, and dangerous.
With a PID namespace, the container gets its own process ID view.
Inside the container:
PID 1 app
PID 7 worker
PID 12 shell
On the host, those same processes still have real host PIDs:
PID 34520 app
PID 34541 worker
PID 34610 shell
So the same process can have two identities:
- one PID inside the container
- another PID on the host
This is not magic. It is namespace-based translation done by the Linux kernel.
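You can see both identities at once from the host. On Linux 4.1 and later, /proc/<pid>/status has an NSpid line that lists the process's PID in each nested PID namespace, outermost first. Here is a minimal sketch that parses that line. The sample text and the function name parseNSpid are my own, built from the example PIDs above, not taken from a real system.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseNSpid extracts the NSpid values from the text of /proc/<pid>/status.
// The kernel lists one PID per nested PID namespace, outermost first.
func parseNSpid(status string) ([]int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		var pids []int
		for _, f := range strings.Fields(strings.TrimPrefix(line, "NSpid:")) {
			n, err := strconv.Atoi(f)
			if err != nil {
				return nil, fmt.Errorf("bad NSpid field %q: %w", f, err)
			}
			pids = append(pids, n)
		}
		if len(pids) == 0 {
			return nil, fmt.Errorf("empty NSpid line")
		}
		return pids, nil
	}
	return nil, fmt.Errorf("no NSpid line found")
}

func main() {
	// Hypothetical excerpt, as the host would see the "app" process above.
	sample := "Name:\tapp\nPid:\t34520\nNSpid:\t34520\t1\n"
	pids, err := parseNSpid(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("host PID %d, container PID %d\n", pids[0], pids[len(pids)-1])
}
```

Read from inside the container, the same line would show only the inner PID, because the container's procfs cannot see the outer namespace.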
PID 1 is not just “the first process”
A very common beginner mistake is thinking PID 1 is only a number.
It is not.
Inside a PID namespace, the first process becomes PID 1, and PID 1 has special responsibilities.
In a normal Linux system, PID 1 is usually systemd or another init system. In a container, PID 1 might be your application:
docker run my-api
If your app becomes PID 1 directly, it now behaves like the init process of that namespace.
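That matters because PID 1 is responsible for adopting orphaned child processes and reaping them when they exit. The Linux man pages describe the first process in a new PID namespace as the namespace init process, and orphaned children in that namespace are reparented to it. The kernel also treats its signals specially: a signal sent from inside the namespace reaches PID 1 only if PID 1 has installed a handler for it, so an app that relies on the default SIGTERM action will not stop when asked.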
This is why senior engineers often care about tiny init processes like:
tini
dumb-init
Without a proper init process, a long-running container whose application spawns child processes and never waits on the orphans can slowly accumulate zombie processes.
A container may look healthy from the outside, but inside it can be leaking process table entries because PID 1 is not doing its job.
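To make "doing its job" concrete, here is a rough sketch of the reaping step an init process performs. It is not how tini or dumb-init are implemented. It only shows the core idea: call wait4 with WNOHANG in a loop until no exited children are left. The function name reapChildren is my own.

```go
package main

import (
	"fmt"
	"syscall"
)

// reapChildren collects every child that has already exited, without blocking.
// It returns how many children were reaped.
func reapChildren() int {
	reaped := 0
	for {
		var status syscall.WaitStatus
		// pid -1 means "any child"; WNOHANG returns immediately if none has exited.
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal, try again
		}
		if err != nil || pid <= 0 {
			// ECHILD (no children at all) or 0 (children exist, none exited yet).
			return reaped
		}
		reaped++
	}
}

func main() {
	fmt.Println("reaped:", reapChildren())
}
```

A real init would call this every time it receives SIGCHLD. One caveat for Go programs: a blanket wait on "any child" can also collect children started through os/exec, which then breaks cmd.Wait, so this loop belongs in a dedicated init rather than inside an ordinary application.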
The senior-level lesson: containers are isolation, not virtualization
A VM gets its own kernel.
A container does not.
A container shares the host kernel, but gets isolated views using kernel features like:
- PID namespaces
- mount namespaces
- network namespaces
- UTS namespaces
- IPC namespaces
- user namespaces
- cgroups
The PID namespace only isolates process visibility and PID numbering. It does not magically secure everything.
That is a critical mental model.
A PID namespace can stop a container from seeing host processes, but it does not protect you from:
- dangerous Linux capabilities
- privileged containers
- host filesystem mounts
- exposed Docker socket
- weak seccomp, AppArmor, or SELinux profiles
- kernel vulnerabilities
- bad Kubernetes security context settings
This is why container security is usually about layers, not one feature.
How Docker uses PID namespaces
By default, Docker gives containers their own PID namespace.
Docker exposes this through the --pid option. The default mode isolates processes, while --pid=host makes the container use the host PID namespace.
Example:
docker run --rm -it ubuntu ps aux
Inside the container, you may see only a few processes.
But with host PID mode:
docker run --rm -it --pid=host ubuntu ps aux
The container can see host processes.
That flag is useful for debugging, monitoring, and observability tools, but it should be treated carefully. In production, --pid=host removes an important isolation boundary.
What is the “hash” inside /proc/<pid>/ns/pid?
When you inspect namespaces, you may see something like this:
readlink /proc/$$/ns/pid
Output:
pid:[4026531836]
People sometimes casually call this a “namespace hash”, but it is not a cryptographic hash.
It is the inode number of the kernel's namespace object, exposed through procfs. Namespace references are shown as special symbolic links, and the number identifies whether two processes are in the same namespace.
If two processes show the same namespace ID for pid, they share the same PID namespace.
Example:
readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid
If both return the same value, both processes are in the same PID namespace. On the host, reading /proc/1/ns/pid usually requires root.
This is very useful for debugging containers.
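The same comparison can be scripted. Below is a small sketch that parses the link target and compares two processes. The helper names parseNsLink and pidNamespaceID are mine, and the code assumes a Linux /proc.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink splits a link target such as "pid:[4026531836]"
// into the namespace type and its numeric ID.
func parseNsLink(target string) (string, uint64, error) {
	kind, rest, ok := strings.Cut(target, ":[")
	if !ok || !strings.HasSuffix(rest, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	id, err := strconv.ParseUint(strings.TrimSuffix(rest, "]"), 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("bad namespace id in %q: %w", target, err)
	}
	return kind, id, nil
}

// pidNamespaceID reads /proc/<pid>/ns/pid and returns the namespace ID.
func pidNamespaceID(pid int) (uint64, error) {
	target, err := os.Readlink(fmt.Sprintf("/proc/%d/ns/pid", pid))
	if err != nil {
		return 0, err
	}
	_, id, err := parseNsLink(target)
	return id, err
}

func main() {
	self, err := pidNamespaceID(os.Getpid())
	if err != nil {
		fmt.Println("cannot read own namespace:", err)
		return
	}
	pid1, err := pidNamespaceID(1)
	if err != nil {
		fmt.Println("cannot read PID 1 namespace (often needs root):", err)
		return
	}
	fmt.Println("same PID namespace as PID 1:", self == pid1)
}
```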
How to check PID namespace isolation
From inside a container:
ps aux
If you only see the container’s own processes, PID isolation is probably enabled.
Check the namespace ID:
readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid
From the host, inspect a container process:
docker inspect --format '{{.State.Pid}}' <container_id>
Then:
readlink /proc/<host_pid>/ns/pid
You can compare namespace IDs between host processes and container processes.
Another useful command:
lsns -t pid
This shows PID namespaces on the system.
For deeper debugging:
pstree -p
or:
ps -eo pid,ppid,cmd
The trick is to always remember that the host sees the full truth, while the container sees a translated view.
How PID namespace isolation can be weakened
This is where many real-world mistakes happen.
PID namespaces are not usually “bypassed” by magic. They are usually weakened by configuration choices.
Here are common examples.
1. Running with host PID namespace
--pid=host
This makes the container see host processes.
Sometimes this is used by monitoring tools, but it should not be the default for normal application containers.
2. Running privileged containers
--privileged
A privileged container gets all Linux capabilities and access to host devices, and it runs without the default seccomp and AppArmor confinement.
This is sometimes convenient during development, but it should be avoided for normal production workloads.
3. Mounting sensitive host paths
Examples:
-v /proc:/host/proc
-v /:/host
-v /var/run/docker.sock:/var/run/docker.sock
Mounting the Docker socket is especially dangerous because any process that can talk to it controls the Docker daemon, and can ask it to start a new privileged container with the host filesystem mounted.
4. Adding dangerous capabilities
Capabilities such as these should be reviewed carefully:
SYS_ADMIN
SYS_PTRACE
NET_ADMIN
DAC_READ_SEARCH
For PID and process security, SYS_PTRACE is especially sensitive because it lets a process attach to, inspect, and modify other processes it can see. Combined with --pid=host, that includes host processes.
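One way to check what a running process actually holds is to decode the CapEff hex mask from /proc/<pid>/status. Here is a sketch that looks only for the four capabilities listed above. The bit positions come from linux/capability.h. The function name sensitiveCapsIn and the sample masks are mine.

```go
package main

import (
	"fmt"
	"strconv"
)

// Bit positions from <linux/capability.h> for the capabilities named above.
var sensitiveCaps = []struct {
	name string
	bit  uint
}{
	{"DAC_READ_SEARCH", 2},
	{"NET_ADMIN", 12},
	{"SYS_PTRACE", 19},
	{"SYS_ADMIN", 21},
}

// sensitiveCapsIn decodes a hex mask, such as the CapEff field of
// /proc/<pid>/status, and returns the sensitive capabilities it contains.
func sensitiveCapsIn(hexMask string) ([]string, error) {
	mask, err := strconv.ParseUint(hexMask, 16, 64)
	if err != nil {
		return nil, fmt.Errorf("bad capability mask %q: %w", hexMask, err)
	}
	var found []string
	for _, c := range sensitiveCaps {
		if mask&(uint64(1)<<c.bit) != 0 {
			found = append(found, c.name)
		}
	}
	return found, nil
}

func main() {
	// Sample masks for illustration only.
	for _, m := range []string{"00000000a80425fb", "0000000000280000"} {
		caps, err := sensitiveCapsIn(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %v\n", m, caps)
	}
}
```

To try it on a real process, read the mask with grep CapEff /proc/<pid>/status and pass the hex value in.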
5. Weak Kubernetes security context
In Kubernetes, settings like these are important:
hostPID: true
privileged: true
allowPrivilegeEscalation: true
For normal workloads, these should usually be avoided.
Defensive checklist for real projects
When reviewing a containerized service, I usually ask these questions.
Runtime
docker inspect <container_id> | grep -i pid
Check whether the container is using host PID mode.
Capabilities
docker inspect <container_id> | grep -i cap
Prefer dropping unnecessary capabilities:
--cap-drop=ALL
Then add back only what is truly required.
Privileged mode
docker inspect <container_id> | grep -i privileged
For most application containers, this should be false.
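The three docker inspect checks above can be combined into one pass over the JSON that docker inspect prints. This is a sketch, not a complete audit tool. It reads only HostConfig.PidMode, HostConfig.Privileged, and HostConfig.CapAdd. The sample document in main is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inspectEntry mirrors only the `docker inspect` fields this check needs.
type inspectEntry struct {
	Name       string
	HostConfig struct {
		PidMode    string
		Privileged bool
		CapAdd     []string
	}
}

// auditInspect returns one finding per risky setting found in the JSON array
// printed by `docker inspect`.
func auditInspect(raw []byte) ([]string, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, fmt.Errorf("decode docker inspect output: %w", err)
	}
	var findings []string
	for _, e := range entries {
		if e.HostConfig.PidMode == "host" {
			findings = append(findings, e.Name+": host PID namespace")
		}
		if e.HostConfig.Privileged {
			findings = append(findings, e.Name+": privileged")
		}
		for _, c := range e.HostConfig.CapAdd {
			findings = append(findings, e.Name+": added capability "+c)
		}
	}
	return findings, nil
}

func main() {
	// Hypothetical, heavily trimmed `docker inspect` output.
	sample := []byte(`[{"Name":"/web","HostConfig":{"PidMode":"host","Privileged":false,"CapAdd":["SYS_PTRACE"]}}]`)
	findings, err := auditInspect(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range findings {
		fmt.Println(f)
	}
}
```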
Process tree
docker exec -it <container_id> ps aux
Look for zombie processes:
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'
If you see zombies, check whether PID 1 is properly reaping children.
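If ps is not installed in the image, the same information is available in /proc/<pid>/stat, where the third field is the process state and Z means zombie. Here is a sketch that scans /proc for zombies. The function name parseStat is mine, and the parser skips past the last closing parenthesis because command names can contain spaces and parentheses.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseStat extracts the state letter and parent PID from a /proc/<pid>/stat line.
// The layout is "pid (comm) state ppid ...", and comm may contain ')' itself.
func parseStat(line string) (byte, int, error) {
	end := strings.LastIndexByte(line, ')')
	if end < 0 {
		return 0, 0, fmt.Errorf("no command field in %q", line)
	}
	fields := strings.Fields(line[end+1:])
	if len(fields) < 2 {
		return 0, 0, fmt.Errorf("truncated stat line %q", line)
	}
	ppid, err := strconv.Atoi(fields[1])
	if err != nil {
		return 0, 0, fmt.Errorf("bad ppid in %q: %w", line, err)
	}
	return fields[0][0], ppid, nil
}

func main() {
	paths, _ := filepath.Glob("/proc/[0-9]*/stat")
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process exited between Glob and ReadFile
		}
		state, ppid, err := parseStat(string(data))
		if err == nil && state == 'Z' {
			fmt.Printf("zombie: %s (parent PID %d)\n", p, ppid)
		}
	}
}
```

The parent PID is the useful part. If the zombies all point at PID 1, then PID 1 is the process that is not reaping.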
Namespace comparison
readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid
Run these on the host and again inside the container, then compare the IDs. Different values mean the container has its own PID namespace.
Kubernetes
Check pod specs for:
hostPID: true
securityContext:
  privileged: true
  allowPrivilegeEscalation: true
These settings should be intentional, documented, and reviewed.
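A review like this is easy to automate against the JSON form of a pod, for example the output of kubectl get pod -o json. Here is a sketch that flags the three settings above. The type and function names are mine, and it only reads the fields shown. It flags allowPrivilegeEscalation only when it is explicitly true. Whether to also flag an unset value is a policy decision this sketch leaves out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type securityContext struct {
	Privileged               *bool `json:"privileged"`
	AllowPrivilegeEscalation *bool `json:"allowPrivilegeEscalation"`
}

type container struct {
	Name            string           `json:"name"`
	SecurityContext *securityContext `json:"securityContext"`
}

// pod mirrors only the pod fields this check needs.
type pod struct {
	Spec struct {
		HostPID    bool        `json:"hostPID"`
		Containers []container `json:"containers"`
	} `json:"spec"`
}

// auditPod returns one finding per risky setting in a pod's JSON.
func auditPod(raw []byte) ([]string, error) {
	var p pod
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, fmt.Errorf("decode pod: %w", err)
	}
	var findings []string
	if p.Spec.HostPID {
		findings = append(findings, "pod: hostPID is true")
	}
	for _, c := range p.Spec.Containers {
		sc := c.SecurityContext
		if sc == nil {
			continue
		}
		if sc.Privileged != nil && *sc.Privileged {
			findings = append(findings, c.Name+": privileged is true")
		}
		if sc.AllowPrivilegeEscalation != nil && *sc.AllowPrivilegeEscalation {
			findings = append(findings, c.Name+": allowPrivilegeEscalation is true")
		}
	}
	return findings, nil
}

func main() {
	// Hypothetical, heavily trimmed pod JSON.
	sample := []byte(`{"spec":{"hostPID":true,"containers":[{"name":"api","securityContext":{"privileged":true}}]}}`)
	findings, err := auditPod(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range findings {
		fmt.Println(f)
	}
}
```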
A practical example from building a tiny container runtime
When building a minimal Docker-like runtime, PID namespace support usually starts with something like:
SysProcAttr: &syscall.SysProcAttr{
    // Start the child in a new PID namespace, where it becomes PID 1.
    Cloneflags: syscall.CLONE_NEWPID,
}
But there is a subtle detail.
When you create a new PID namespace, the child process becomes PID 1 inside that namespace. The parent still lives in the old namespace.
That means your runtime has to think carefully about:
- who becomes PID 1
- whether PID 1 launches the user command directly
- whether you need a small init process
- how signals are forwarded
- how child processes are reaped
- what happens when PID 1 exits
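To make a few of those questions concrete, here is a minimal sketch of the parent side of such a runtime. It is Linux-only and needs root to actually start the child. It answers the first two questions in the simplest way: the user command itself becomes PID 1, and the parent forwards SIGINT and SIGTERM to it. The function names newIsolatedCmd and run are mine, and this is one possible shape, not how Docker or runc do it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// newIsolatedCmd prepares a command that will start as PID 1
// of a new PID namespace. It does not start the command.
func newIsolatedCmd(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWPID,
	}
	return cmd
}

// run starts the command, forwards SIGINT and SIGTERM to it,
// and returns when the command exits.
func run(cmd *exec.Cmd) error {
	if err := cmd.Start(); err != nil {
		return err
	}
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	defer signal.Stop(sigs)

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()
	for {
		select {
		case s := <-sigs:
			_ = cmd.Process.Signal(s) // best effort: the child may already be gone
		case err := <-done:
			return err
		}
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: run <command> [args...]")
		return
	}
	if err := run(newIsolatedCmd(os.Args[1], os.Args[2:]...)); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

Two things this sketch leaves out on purpose. The forwarded SIGTERM is ignored unless the command installs a handler for it, because the kernel protects a namespace's PID 1 from signals it does not handle. And ps inside the child still shows host processes until /proc is remounted in a new mount namespace. When the child exits, the kernel kills every other process left in that PID namespace.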
This is where the learning becomes real.
Creating a namespace is easy.
Managing a namespace correctly is the hard part.
Senior engineering lessons
1. Do not confuse isolation with security
PID namespaces provide process isolation, but they are only one part of the security model.
2. PID 1 behavior matters
If your application runs as PID 1, signal handling and zombie reaping become your problem.
3. Debugging containers requires two views
Always check both:
- inside the container
- from the host
The same process has different PIDs depending on where you look from.
4. Most “container escapes” start with bad configuration
In real systems, the issue is often not the PID namespace itself. The issue is combining weak settings:
- privileged mode
- host PID
- host mounts
- excessive capabilities
- exposed Docker socket
5. Use namespaces intentionally
For observability tools, hostPID or --pid=host may be required.
For normal application workloads, it is usually an unnecessary risk.
References
- Linux man-pages: PID namespaces
- Linux Kernel Documentation: Namespaces
- Docker documentation: docker run --pid
- OWASP Docker Security Cheat Sheet
Final thought
PID namespaces are one of those Linux features that look simple at first:
“The container gets its own process IDs.”
But after working with real systems, you realize the deeper lesson:
Process isolation is not only about hiding PIDs. It is about controlling visibility, lifecycle, signals, debugging, and failure boundaries.
That is why PID namespaces are not just a container feature.
They are a production engineering concept.
If you understand PID namespaces well, Docker feels less like magic and more like a thin layer over powerful Linux primitives.