amir

Posted on May 18

Understanding PID Namespaces: The Small Linux Feature Behind Container Process Isolation

#containers #docker #security #linux

When people first learn containers, they usually hear this sentence:

“A container is just a process.”

That sentence is true, but incomplete.

A better version is:

“A container is a regular Linux process running with a different view of the system.”

One of the most important parts of that different view is the PID namespace.

A PID namespace controls which processes a process can see and which process IDs those processes appear to have from inside that environment. It is one of the Linux kernel features that makes containers feel isolated, even though everything is still running on the same host kernel.

Docker, containerd, runc, Kubernetes, and even small learning projects like a tiny Docker-like runtime all rely on this idea.

What problem does a PID namespace solve?

On a normal Linux machine, every process has a PID:

ps aux

You may see things like:

PID 1      systemd
PID 842    sshd
PID 1201   nginx
PID 2300   node

Without PID isolation, a process inside a container could see every host process and, with enough privilege, signal or trace it. That would be noisy, confusing, and dangerous.

With a PID namespace, the container gets its own process ID view.

Inside the container:

PID 1      app
PID 7      worker
PID 12     shell

On the host, those same processes still have real host PIDs:

PID 34520  app
PID 34541  worker
PID 34610  shell

So the same process can have two identities:

one PID inside the container
another PID on the host

This is not magic. It is namespace-based translation done by the Linux kernel.
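On kernels 4.1 and later, you can see this translation from the host in the NSpid line of /proc/<host_pid>/status. It lists the PID of the process in each nested PID namespace, starting with the namespace of the procfs you are reading and ending with the innermost one. The sketch below is only an illustration: pid_views is a hypothetical helper name, and the sample input is made up to match the app process above.

```shell
# pid_views: read /proc/<pid>/status text on stdin and print the PID as seen
# from the outermost and the innermost PID namespace (hypothetical helper).
pid_views() {
    awk '/^NSpid:/ { print "outer=" $2 " inner=" $NF }'
}

# Sample input shaped like the status file of the "app" process above.
printf 'Name:\tapp\nNSpid:\t34520\t1\n' | pid_views
# prints: outer=34520 inner=1
```

On a real host you would feed it the actual file, for example pid_views < /proc/<host_pid>/status. A process that is not in a nested namespace has a single value on that line, so both fields come out the same.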

PID 1 is not just “the first process”

A very common beginner mistake is thinking PID 1 is only a number.

It is not.

Inside a PID namespace, the first process becomes PID 1, and PID 1 has special responsibilities.

In a normal Linux system, PID 1 is usually systemd or another init system. In a container, PID 1 might be your application:

docker run my-api

If your app becomes PID 1 directly, it now behaves like the init process of that namespace.

That matters because PID 1 must reap zombies: when a process in the namespace is orphaned, the kernel reparents it to PID 1, and its entry stays in the process table until PID 1 calls wait() on it. The pid_namespaces(7) man page describes the first process in a new PID namespace as the namespace's init process and documents this reparenting.

This is why senior engineers often care about tiny init processes like:

tini
dumb-init

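Without a proper init process, long-running containers whose application spawns child processes can slowly accumulate zombies. Docker can also add a minimal init for you: docker run --init starts one as PID 1 and runs your command under it.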

A container may look healthy from the outside, but inside it can be leaking process table entries because PID 1 is not doing its job.

The senior-level lesson: containers are isolation, not virtualization

A VM gets its own kernel.

A container does not.

A container shares the host kernel, but gets isolated views using kernel features like:

PID namespaces
mount namespaces
network namespaces
UTS namespaces
IPC namespaces
user namespaces
cgroups

The PID namespace only isolates process visibility and PID numbering. It does not magically secure everything.

That is a critical mental model.

A PID namespace can stop a container from seeing host processes, but it does not protect you from:

dangerous Linux capabilities
privileged containers
host filesystem mounts
exposed Docker socket
weak seccomp, AppArmor, or SELinux profiles
kernel vulnerabilities
bad Kubernetes security context settings

This is why container security is usually about layers, not one feature.

How Docker uses PID namespaces

By default, Docker gives containers their own PID namespace.

Docker exposes this through the --pid option. The default mode isolates processes, while --pid=host makes the container use the host PID namespace.

Example:

docker run --rm -it ubuntu ps aux

Inside the container, you will typically see a single process: ps itself, running as PID 1.

But with host PID mode:

docker run --rm -it --pid=host ubuntu ps aux

The container can see host processes.

That flag is useful for debugging, monitoring, and observability tools, but it should be treated carefully. In production, --pid=host removes an important isolation boundary.

What is the “hash” inside `/proc/<pid>/ns/pid`?

When you inspect namespaces, you may see something like this:

readlink /proc/$$/ns/pid

Output:

pid:[4026531836]

People sometimes casually call this a “namespace hash”, but it is not a cryptographic hash.

It is a kernel namespace identifier exposed through procfs. Each entry under /proc/<pid>/ns/ is a special symbolic link whose target names the namespace type followed by the inode number of the namespace, and that number tells you whether two processes are in the same namespace.

If two processes show the same namespace ID for pid, they share the same PID namespace.

Example:

readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid

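If both return the same value, both processes are in the same PID namespace. Reading the link for a process you do not own, such as PID 1 on the host, normally requires root, so run the command with sudo if it fails.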

This is very useful for debugging containers.
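If you do this comparison often, it can be wrapped in a small function. This is a minimal sketch, not a standard tool: same_pid_ns is a hypothetical name, and PROC_ROOT is an assumed override that defaults to /proc and exists only so the function can be tried against a fake directory tree.

```shell
# same_pid_ns PID_A PID_B: succeed if both processes share a PID namespace.
# Returns 2 if either namespace link cannot be read (missing PID, no permission).
# PROC_ROOT is a hypothetical override that defaults to /proc.
same_pid_ns() {
    a=$(readlink "${PROC_ROOT:-/proc}/$1/ns/pid") || return 2
    b=$(readlink "${PROC_ROOT:-/proc}/$2/ns/pid") || return 2
    [ "$a" = "$b" ]
}

# Example (not executed here):
#   if same_pid_ns 1 "$$"; then echo "same PID namespace"; fi
```

Returning a separate status for unreadable links matters here, because two failed reads would otherwise compare as equal and look like a shared namespace.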

How to check PID namespace isolation

From inside a container:

ps aux

If you only see the container’s own processes, PID isolation is probably enabled.

Check the namespace ID:

readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid

From the host, inspect a container process:

docker inspect --format '{{.State.Pid}}' <container_id>

Then:

readlink /proc/<host_pid>/ns/pid

You can compare namespace IDs between host processes and container processes.
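The two host-side steps can be chained into one check. The sketch below is an illustration under a few assumptions: the function names are hypothetical, it is meant to be run from a shell on the host (usually as root, since container processes often belong to another user), and PROC_ROOT is an assumed override that defaults to /proc.

```shell
# container_pid_ns CONTAINER: print the PID namespace link of a container's
# main process as seen from the host. PROC_ROOT is a hypothetical override.
container_pid_ns() {
    host_pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 2
    readlink "${PROC_ROOT:-/proc}/$host_pid/ns/pid"
}

# is_pid_isolated CONTAINER: succeed if that namespace differs from the one
# the calling shell runs in. Returns 2 if either link cannot be read.
is_pid_isolated() {
    theirs=$(container_pid_ns "$1") || return 2
    mine=$(readlink "${PROC_ROOT:-/proc}/self/ns/pid") || return 2
    [ "$theirs" != "$mine" ]
}
```

A container started with --pid=host should fail this check, because its main process reports the same namespace ID as the host shell. A stopped container reports a State.Pid of 0, so the lookup fails and the function returns 2 instead of a misleading answer.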

Another useful command:

lsns -t pid

This shows PID namespaces on the system.

For deeper debugging:

pstree -p

or:

ps -eo pid,ppid,cmd

The trick is to always remember that the host sees the full truth, while the container sees a translated view.

How PID namespace isolation can be weakened

This is where many real-world mistakes happen.

PID namespaces are not usually “bypassed” by magic. They are usually weakened by configuration choices.

Here are common examples.

1. Running with host PID namespace

--pid=host

This makes the container see host processes.

Sometimes this is used by monitoring tools, but it should not be the default for normal application containers.

2. Running privileged containers

--privileged

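A privileged container is granted all Linux capabilities and access to host devices, and Docker's default seccomp and AppArmor confinement is switched off for it. That removes most normal container restrictions.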

This is sometimes convenient during development, but it should be avoided for normal production workloads.

3. Mounting sensitive host paths

Examples:

-v /proc:/host/proc
-v /:/host
-v /var/run/docker.sock:/var/run/docker.sock

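Mounting the Docker socket is especially dangerous because anyone who can talk to the Docker daemon can start new containers with any options, including privileged ones with host mounts. In practice that is root on the host.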

4. Adding dangerous capabilities

Capabilities such as these should be reviewed carefully:

SYS_ADMIN
SYS_PTRACE
NET_ADMIN
DAC_READ_SEARCH

For PID and process security, SYS_PTRACE is especially sensitive because it allows tracing and reading the memory of other processes the container can see. Combined with --pid=host, that includes host processes.

5. Weak Kubernetes security context

In Kubernetes, settings like these are important:

hostPID: true
privileged: true
allowPrivilegeEscalation: true

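For normal workloads, these should usually be avoided. Note that hostPID is a pod-level field, while privileged and allowPrivilegeEscalation belong in a container's securityContext.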

Defensive checklist for real projects

When reviewing a containerized service, I usually ask these questions.

Runtime

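docker inspect --format '{{.HostConfig.PidMode}}' <container_id>

Check whether the container is using host PID mode: the command prints host when it is, and an empty line for the default private namespace.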

Capabilities

docker inspect <container_id> | grep -i cap

Prefer dropping unnecessary capabilities:

--cap-drop=ALL

Then add back only what is truly required with --cap-add.

Privileged mode

docker inspect <container_id> | grep -i privileged

For most application containers, this should be false.
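The three runtime checks above can be folded into one pass. This is a rough sketch rather than a real audit tool: review_hostconfig is a hypothetical helper, the pipe-separated input format is my own choice, and the list of capabilities it flags is deliberately short. The HostConfig field names are the ones docker inspect reports.

```shell
# review_hostconfig: read "PidMode|Privileged|CapAdd" lines on stdin and print
# one warning per risky setting (hypothetical helper, not a Docker feature).
review_hostconfig() {
    while IFS='|' read -r pid_mode privileged cap_add; do
        if [ "$pid_mode" = "host" ]; then
            echo "WARN: container uses the host PID namespace"
        fi
        if [ "$privileged" = "true" ]; then
            echo "WARN: container is privileged"
        fi
        case "$cap_add" in
            *SYS_ADMIN*|*SYS_PTRACE*)
                echo "WARN: sensitive capability added: $cap_add" ;;
        esac
    done
}

# Intended use on a host with Docker (not executed here):
#   docker inspect --format \
#     '{{.HostConfig.PidMode}}|{{.HostConfig.Privileged}}|{{.HostConfig.CapAdd}}' \
#     <container_id> | review_hostconfig
```

No output means none of these three settings were flagged. It does not mean the container is safe, since mounts and security profiles are not covered.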

Process tree

docker exec -it <container_id> ps aux

Look for zombie processes:

ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

If you see zombies, check whether PID 1 is properly reaping children.
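To find out which process is failing to reap, it helps to group zombies by parent PID. The sketch below is a small illustration: zombies_by_parent is a hypothetical helper that expects the output of ps -eo pid,ppid,stat,comm on stdin, and the sample table is made up.

```shell
# zombies_by_parent: read `ps -eo pid,ppid,stat,comm` output on stdin and
# print "PPID COUNT" for every parent that has zombie children.
zombies_by_parent() {
    awk 'NR > 1 && $3 ~ /^Z/ { n[$2]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# Made-up sample: PID 1 has two unreaped children, PID 12 has one.
printf '%s\n' \
    '  PID  PPID STAT COMMAND' \
    '    1     0 Ss   app' \
    '    7     1 Z    worker' \
    '    9     1 Z+   worker' \
    '   12     1 S    sh' \
    '   15    12 Z    curl' | zombies_by_parent
# prints:
# 1 2
# 12 1
```

If the parent listed is 1, the container's PID 1 is the process that is not calling wait(). Any other parent points at a bug in that specific process.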

Namespace comparison

readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid

Compare host and container namespace IDs.

Kubernetes

Check pod specs for:

hostPID: true
securityContext:
  privileged: true
  allowPrivilegeEscalation: true

These settings should be intentional, documented, and reviewed.
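For a quick first pass over a directory of manifests, a plain text scan is often enough to find where these settings are switched on. The helper below is a hypothetical sketch: it is a grep, not a YAML parser, so it only finds settings that are written out explicitly as true and will miss anything applied through defaults or templating.

```shell
# risky_pod_settings FILE...: print line-numbered matches for pod settings
# that are explicitly set to true. Exits non-zero when nothing matches.
risky_pod_settings() {
    grep -nE '^[[:space:]]*(hostPID|privileged|allowPrivilegeEscalation):[[:space:]]*true' "$@"
}

# Example (not executed here):
#   risky_pod_settings manifests/*.yaml
```

Each hit is a place where someone should be able to explain why the setting is needed.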

A practical example from building a tiny container runtime

When building a minimal Docker-like runtime, PID namespace support usually starts with something like:

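// Sketch: /bin/sh is a placeholder for the container's command.
cmd := exec.Command("/bin/sh")
cmd.SysProcAttr = &syscall.SysProcAttr{
    // New PID namespace for the child (needs CAP_SYS_ADMIN, typically root).
    Cloneflags: syscall.CLONE_NEWPID,
}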

But there is a subtle detail.

When you create a new PID namespace, the child process becomes PID 1 inside that namespace. The parent still lives in the old namespace.

That means your runtime has to think carefully about:

who becomes PID 1
whether PID 1 launches the user command directly
whether you need a small init process
how signals are forwarded
how child processes are reaped
what happens when PID 1 exits
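The first question, who becomes PID 1, can be observed without writing a runtime at all by using unshare from util-linux. This is a hedged sketch: pid_in_new_ns is a hypothetical wrapper, and it assumes the host allows unprivileged user namespaces. Where it does not, the command fails with a permission error, and you would need to drop --user --map-root-user and run it as root.

```shell
# pid_in_new_ns: start a shell in a fresh PID namespace and print the PID it
# sees for itself. Flags are from util-linux unshare(1):
#   --user --map-root-user  let an unprivileged user try this (assumes the
#                           host permits unprivileged user namespaces)
#   --pid --fork            fork first, so the child is the first process
#                           in the new PID namespace
pid_in_new_ns() {
    unshare --user --map-root-user --pid --fork sh -c 'echo "$$"'
}

# Example (not executed here):
#   pid_in_new_ns
```

Where the call is permitted, the shell reports its own PID as 1, while the shell that called unshare keeps its original PID in the old namespace. Tools like ps read /proc, so they only show the new view once a fresh /proc is mounted inside the namespace; unshare offers --mount-proc for that, and a runtime has to do the equivalent itself.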

This is where the learning becomes real.

Creating a namespace is easy.

Managing a namespace correctly is the hard part.

Senior engineering lessons

1. Do not confuse isolation with security

PID namespaces provide process isolation, but they are only one part of the security model.

2. PID 1 behavior matters

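If your application runs as PID 1, signal handling and zombie reaping become your problem. In particular, the kernel only delivers a signal such as SIGTERM to a namespace's PID 1 if that process has installed a handler for it, so an app without a SIGTERM handler ignores docker stop until Docker falls back to SIGKILL.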

3. Debugging containers requires two views

Always check both:

inside the container
from the host

The same process has different PIDs depending on where you look from.

4. Most “container escapes” start with bad configuration

In real systems, the issue is often not the PID namespace itself. The issue is combining weak settings:

privileged mode
host PID
host mounts
excessive capabilities
exposed Docker socket

5. Use namespaces intentionally

For observability tools, hostPID or --pid=host may be required.

For normal application workloads, it is usually an unnecessary risk.

References

Linux man-pages: PID namespaces
Linux Kernel Documentation: Namespaces
Docker documentation: docker run --pid
OWASP Docker Security Cheat Sheet

Final thought

PID namespaces are one of those Linux features that look simple at first:

“The container gets its own process IDs.”

But after working with real systems, you realize the deeper lesson:

Process isolation is not only about hiding PIDs. It is about controlling visibility, lifecycle, signals, debugging, and failure boundaries.

That is why PID namespaces are not just a container feature.

They are a production engineering concept.

If you understand PID namespaces well, Docker feels less like magic and more like a thin layer over powerful Linux primitives.
