DEV Community

amir

Understanding PID Namespaces: The Small Linux Feature Behind Container Process Isolation

When people first learn containers, they usually hear this sentence:

“A container is just a process.”

That sentence is true, but incomplete.

A better version is:

“A container is a regular Linux process running with a different view of the system.”

One of the most important parts of that different view is the PID namespace.

A PID namespace controls what processes a process can see and what process IDs look like from inside that environment. It is one of the Linux kernel features that makes containers feel isolated, even though everything is still running on the same host kernel.

Docker, containerd, runc, Kubernetes, and even small learning projects like a tiny Docker-like runtime all rely on this idea.


What problem does a PID namespace solve?

On a normal Linux machine, every process has a PID:

ps aux

You may see things like:

PID 1      systemd
PID 842    sshd
PID 1201   nginx
PID 2300   node

Without PID isolation, a process inside a container could see every host process and, given enough privileges, signal or trace it. That would be noisy, confusing, and dangerous.

With a PID namespace, the container gets its own process ID view.

Inside the container:

PID 1      app
PID 7      worker
PID 12     shell

On the host, those same processes still have real host PIDs:

PID 34520  app
PID 34541  worker
PID 34610  shell

So the same process can have two identities:

  • one PID inside the container
  • another PID on the host

This is not magic. It is namespace-based translation done by the Linux kernel.
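You can see both identities of one process by reading the NSpid line of /proc/<pid>/status (available since Linux 4.1). It lists the process's PID in every PID namespace it belongs to, outermost first. The Go sketch below parses that line. It assumes a Linux host with procfs mounted at /proc, and parseNSpid is a hypothetical helper name. Go is used to match the runtime snippet later in this article.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSpid returns the values on the "NSpid:" line of a /proc/<pid>/status
// file. The first value is the PID as seen from the namespace that mounted
// this procfs (normally the host); the last is the PID in the innermost
// namespace, which for a container's main process is usually 1.
func parseNSpid(status string) ([]int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "NSpid:"))
		pids := make([]int, 0, len(fields))
		for _, f := range fields {
			n, err := strconv.Atoi(f)
			if err != nil {
				return nil, fmt.Errorf("bad NSpid value %q: %w", f, err)
			}
			pids = append(pids, n)
		}
		return pids, nil
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return nil, errors.New("no NSpid line found (requires Linux 4.1 or later)")
}

func main() {
	// Defaults to this process; pass a host PID, for example the one printed by
	// docker inspect --format '{{.State.Pid}}' <container_id>.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	pids, err := parseNSpid(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("PID in each namespace, outermost first:", pids)
}
```

Run on the host against a container's main process, it should print two values, such as [34520 1] for the app process in the example above.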


PID 1 is not just “the first process”

A very common beginner mistake is thinking PID 1 is only a number.

It is not.

Inside a PID namespace, the first process becomes PID 1, and PID 1 has special responsibilities.

In a normal Linux system, PID 1 is usually systemd or another init system. In a container, PID 1 might be your application:

docker run my-api

If your app becomes PID 1 directly, it now behaves like the init process of that namespace.

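That matters because PID 1 is responsible for handling orphaned child processes and reaping zombies. The Linux man pages describe the first process in a new PID namespace as the namespace init process, and orphaned children in that namespace are reparented to it. If that process never calls wait() on them, they stay zombies. PID 1 also gets special signal treatment: other processes can only send it signals for which it has installed a handler (SIGKILL and SIGSTOP sent from an ancestor namespace, such as the host, are the exception). An application that never handles SIGTERM can therefore ignore docker stop until Docker falls back to SIGKILL.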

This is why senior engineers often care about tiny init processes like:

tini
dumb-init

Without a proper init process, a long-running container whose application spawns child processes but never waits on them can slowly accumulate zombie processes.

A container may look healthy from the outside, but inside it can be leaking process table entries because PID 1 is not doing its job.
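To make those duties concrete, here is a minimal Go sketch of what a tiny init does: start the real application as a child, forward termination signals to it, and reap every child that exits. runAsInit and the mini-init name are hypothetical. This is not how tini or dumb-init are implemented; it only illustrates the same responsibilities. It also stops reaping once the main child exits. Inside a PID namespace that is tolerable, because the kernel sends SIGKILL to every remaining process when PID 1 exits.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runAsInit starts argv as a child, forwards termination signals to it, and
// reaps every child that exits (including orphans reparented to us when we
// are PID 1). It returns once the main child exits, using the shell
// convention of 128+N for a child killed by signal N.
func runAsInit(argv []string) (int, error) {
	sigs := make(chan os.Signal, 32)
	// Subscribe before starting the child so an early SIGCHLD is not lost.
	signal.Notify(sigs, syscall.SIGCHLD, syscall.SIGTERM, syscall.SIGINT, syscall.SIGHUP)
	defer signal.Stop(sigs)

	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		return 1, err
	}
	mainPid := cmd.Process.Pid

	for sig := range sigs {
		if sig != syscall.SIGCHLD {
			// Forward SIGTERM/SIGINT/SIGHUP so that docker stop reaches the app.
			_ = cmd.Process.Signal(sig)
			continue
		}
		// Several exits can be coalesced into one SIGCHLD, so drain them all.
		for {
			var ws syscall.WaitStatus
			pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, nil)
			if err == syscall.EINTR {
				continue
			}
			if err != nil || pid <= 0 {
				break // ECHILD: no children left; 0: none has exited yet
			}
			if pid == mainPid {
				if ws.Signaled() {
					return 128 + int(ws.Signal()), nil
				}
				return ws.ExitStatus(), nil
			}
			// Any other pid was an orphan (zombie) that has now been reaped.
		}
	}
	return 1, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: mini-init <command> [args...]")
		os.Exit(2)
	}
	code, err := runAsInit(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "mini-init:", err)
	}
	os.Exit(code)
}
```

In an image you would make this binary the entrypoint (for example, mini-init followed by your application's command) so that it runs as PID 1 and the application runs as its child.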


The senior-level lesson: containers are isolation, not virtualization

A VM gets its own kernel.

A container does not.

A container shares the host kernel, but gets isolated views (namespaces) and resource limits (cgroups) through kernel features like:

  • PID namespaces
  • mount namespaces
  • network namespaces
  • UTS namespaces
  • IPC namespaces
  • user namespaces
  • cgroups

The PID namespace only isolates process visibility and PID numbering. It does not magically secure everything.

That is a critical mental model.

A PID namespace can stop a container from seeing host processes, but it does not protect you from:

  • dangerous Linux capabilities
  • privileged containers
  • host filesystem mounts
  • exposed Docker socket
  • weak seccomp, AppArmor, or SELinux profiles
  • kernel vulnerabilities
  • bad Kubernetes security context settings

This is why container security is usually about layers, not one feature.


How Docker uses PID namespaces

By default, Docker gives containers their own PID namespace.

Docker exposes this through the --pid option. The default mode isolates processes, while --pid=host makes the container use the host PID namespace.

Example:

docker run --rm -it ubuntu ps aux

Inside the container, you will typically see only ps itself, running as PID 1.

But with host PID mode:

docker run --rm -it --pid=host ubuntu ps aux

The container can see host processes.

That flag is useful for debugging, monitoring, and observability tools, but it should be treated carefully. In production, --pid=host removes an important isolation boundary.


What is the “hash” inside /proc/<pid>/ns/pid?

When you inspect namespaces, you may see something like this:

readlink /proc/$$/ns/pid

Output:

pid:[4026531836]

People sometimes casually call this a “namespace hash”, but it is not a cryptographic hash.

It is a kernel namespace identifier exposed through procfs. Namespace references are shown as special symbolic links, and the number in brackets is the inode number of the namespace, which identifies it. Comparing these numbers tells you whether two processes are in the same namespace.

If two processes show the same namespace ID for pid, they share the same PID namespace.

Example:

readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid

If both return the same value, both processes are in the same PID namespace.

This is very useful for debugging containers.


How to check PID namespace isolation

From inside a container:

ps aux

If you only see the container’s own processes, PID isolation is probably enabled.

Check the namespace ID:

readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid

From the host, inspect a container process:

docker inspect --format '{{.State.Pid}}' <container_id>

Then:

readlink /proc/<host_pid>/ns/pid

You can compare namespace IDs between host processes and container processes.

Another useful command:

lsns -t pid

This shows PID namespaces on the system.

For deeper debugging:

pstree -p

or:

ps -eo pid,ppid,cmd

The trick is to always remember that the host sees the full truth, while the container sees a translated view.


How PID namespace isolation can be weakened

This is where many real-world mistakes happen.

PID namespaces are not usually “bypassed” by magic. They are usually weakened by configuration choices.

Here are common examples.

1. Running with host PID namespace

--pid=host

This makes the container see host processes.

Sometimes this is used by monitoring tools, but it should not be the default for normal application containers.

2. Running privileged containers

--privileged

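A privileged container receives all Linux capabilities and access to host devices, and it is not confined by Docker's default seccomp and AppArmor profiles. That removes most normal container restrictions.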

This is sometimes convenient during development, but it should be avoided for normal production workloads.

3. Mounting sensitive host paths

Examples:

-v /proc:/host/proc
-v /:/host
-v /var/run/docker.sock:/var/run/docker.sock

Mounting the Docker socket is especially dangerous because it gives full control of the Docker daemon, which can then be asked to start a privileged container with the host filesystem mounted.

4. Adding dangerous capabilities

Capabilities such as these should be reviewed carefully:

SYS_ADMIN
SYS_PTRACE
NET_ADMIN
DAC_READ_SEARCH

For PID and process security, SYS_PTRACE is especially sensitive because it relates to inspecting and tracing processes.
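To check which of these a process actually holds, you can decode the hexadecimal CapEff mask in /proc/<pid>/status. The Go sketch below hardcodes the bit numbers from <linux/capability.h> for the four capabilities listed above. The helper names are illustrative. For a root process in a container with Docker's default capability set, CapEff typically reads 00000000a80425fb, which contains none of them.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// Bit numbers from <linux/capability.h> for the capabilities listed above.
var sensitiveCaps = map[string]uint{
	"CAP_DAC_READ_SEARCH": 2,
	"CAP_NET_ADMIN":       12,
	"CAP_SYS_PTRACE":      19,
	"CAP_SYS_ADMIN":       21,
}

// effectiveCaps returns the hexadecimal CapEff mask from /proc/<pid>/status.
func effectiveCaps(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			return strconv.ParseUint(strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), 16, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("no CapEff line found")
}

// heldSensitive returns, sorted by name, the sensitive capabilities set in mask.
func heldSensitive(mask uint64) []string {
	var held []string
	for name, bit := range sensitiveCaps {
		if mask&(1<<bit) != 0 {
			held = append(held, name)
		}
	}
	sort.Strings(held)
	return held
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	mask, err := effectiveCaps(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("CapEff=%016x sensitive=%v\n", mask, heldSensitive(mask))
}
```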

5. Weak Kubernetes security context

In Kubernetes, settings like these are important:

hostPID: true
privileged: true
allowPrivilegeEscalation: true

For normal workloads, these should usually be avoided.


Defensive checklist for real projects

When reviewing a containerized service, I usually ask these questions.

Runtime

docker inspect <container_id> | grep -i pid

Check whether the container is using host PID mode.

Capabilities

docker inspect <container_id> | grep -i cap

Prefer dropping unnecessary capabilities:

--cap-drop=ALL

Then add back only what is truly required.

Privileged mode

docker inspect <container_id> | grep -i privileged

For most application containers, this should be false.
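Grepping docker inspect output works, but it is easy to misread. A slightly more robust check decodes the JSON and looks at HostConfig.PidMode, HostConfig.Privileged, and HostConfig.CapAdd directly. The Go sketch below assumes you pipe docker inspect output into it. auditInspect is a hypothetical name. Only a PidMode of exactly host is flagged, although a container:<name> value also means the PID namespace is shared, in that case with another container.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// inspectEntry models only the docker inspect fields this audit needs.
type inspectEntry struct {
	Name       string
	HostConfig struct {
		PidMode    string
		Privileged bool
		CapAdd     []string
	}
}

// auditInspect decodes docker inspect output (a JSON array) and returns
// one finding per risky PID or privilege setting.
func auditInspect(data []byte) ([]string, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	var findings []string
	for _, e := range entries {
		if e.HostConfig.PidMode == "host" {
			findings = append(findings, e.Name+": uses the host PID namespace")
		}
		if e.HostConfig.Privileged {
			findings = append(findings, e.Name+": runs privileged")
		}
		for _, c := range e.HostConfig.CapAdd {
			findings = append(findings, e.Name+": adds capability "+c)
		}
	}
	return findings, nil
}

func main() {
	// Usage: docker inspect <container_id> | go run audit.go (hypothetical file name)
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	findings, err := auditInspect(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for _, f := range findings {
		fmt.Println(f)
	}
	if len(findings) > 0 {
		os.Exit(1) // non-zero so the check can fail a CI step
	}
}
```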

Process tree

docker exec -it <container_id> ps aux

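Look for zombie processes (a STAT value starting with Z). A plain ps aux | grep Z also matches any line that merely contains a Z, including the grep command itself, so filter on the state column instead:

ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'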

If you see zombies, check whether PID 1 is properly reaping children.
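If ps is not available in the image, the same information lives in /proc/<pid>/stat, where the third field is the process state and the fourth is the parent PID. The Go sketch below lists zombies together with their parent, which tells you which process should be reaping them. parseStat and findZombies are illustrative names, and only the fields needed here are parsed.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// procStat holds the /proc/<pid>/stat fields needed to spot zombies.
type procStat struct {
	PID   int
	Comm  string
	State byte
	PPID  int
}

// parseStat parses a /proc/<pid>/stat line. The command name sits in
// parentheses and may contain spaces or ')', so split on the last ')'.
func parseStat(line string) (procStat, error) {
	lp := strings.IndexByte(line, '(')
	rp := strings.LastIndexByte(line, ')')
	if lp < 0 || rp < lp {
		return procStat{}, fmt.Errorf("malformed stat line %q", line)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(line[:lp]))
	if err != nil {
		return procStat{}, err
	}
	rest := strings.Fields(line[rp+1:]) // rest[0] is the state, rest[1] the parent PID
	if len(rest) < 2 {
		return procStat{}, fmt.Errorf("truncated stat line %q", line)
	}
	ppid, err := strconv.Atoi(rest[1])
	if err != nil {
		return procStat{}, err
	}
	return procStat{PID: pid, Comm: line[lp+1 : rp], State: rest[0][0], PPID: ppid}, nil
}

// findZombies scans /proc and returns every process in state 'Z'.
func findZombies() ([]procStat, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}
	var zombies []procStat
	for _, e := range entries {
		if _, err := strconv.Atoi(e.Name()); err != nil {
			continue // not a process directory
		}
		data, err := os.ReadFile("/proc/" + e.Name() + "/stat")
		if err != nil {
			continue // the process exited while we were scanning
		}
		st, err := parseStat(strings.TrimSpace(string(data)))
		if err == nil && st.State == 'Z' {
			zombies = append(zombies, st)
		}
	}
	return zombies, nil
}

func main() {
	zombies, err := findZombies()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, z := range zombies {
		fmt.Printf("zombie pid=%d ppid=%d comm=%s\n", z.PID, z.PPID, z.Comm)
	}
}
```

A zombie whose parent is PID 1 points straight at the init process that is not reaping.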

Namespace comparison

readlink /proc/1/ns/pid
readlink /proc/$$/ns/pid

Compare host and container namespace IDs.

Kubernetes

Check pod specs for:

spec:
  hostPID: true
  containers:
    - name: app
      securityContext:
        privileged: true
        allowPrivilegeEscalation: true

These settings should be intentional, documented, and reviewed.
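The same review can be automated against the output of kubectl get pod <name> -o json. This Go sketch decodes only the fields discussed above and flags explicit true values. auditPod is a hypothetical name, and it does not flag an unset allowPrivilegeEscalation or attempt a full policy review.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// Only the pod fields this check needs; JSON names follow the Kubernetes Pod API.
type securityContext struct {
	Privileged               *bool `json:"privileged"`
	AllowPrivilegeEscalation *bool `json:"allowPrivilegeEscalation"`
}

type container struct {
	Name            string           `json:"name"`
	SecurityContext *securityContext `json:"securityContext"`
}

type pod struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		HostPID        bool        `json:"hostPID"`
		InitContainers []container `json:"initContainers"`
		Containers     []container `json:"containers"`
	} `json:"spec"`
}

// isTrue treats an unset (nil) field as false.
func isTrue(b *bool) bool { return b != nil && *b }

// auditPod flags explicit hostPID, privileged, and allowPrivilegeEscalation
// settings in a single pod object.
func auditPod(data []byte) ([]string, error) {
	var p pod
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, err
	}
	var findings []string
	if p.Spec.HostPID {
		findings = append(findings, p.Metadata.Name+": hostPID is true")
	}
	all := append(append([]container{}, p.Spec.InitContainers...), p.Spec.Containers...)
	for _, c := range all {
		if c.SecurityContext == nil {
			continue
		}
		where := p.Metadata.Name + "/" + c.Name
		if isTrue(c.SecurityContext.Privileged) {
			findings = append(findings, where+": privileged is true")
		}
		if isTrue(c.SecurityContext.AllowPrivilegeEscalation) {
			findings = append(findings, where+": allowPrivilegeEscalation is true")
		}
	}
	return findings, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	findings, err := auditPod(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for _, f := range findings {
		fmt.Println(f)
	}
}
```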


A practical example from building a tiny container runtime

When building a minimal Docker-like runtime, PID namespace support usually starts with something like:

cmd.SysProcAttr = &syscall.SysProcAttr{
    Cloneflags: syscall.CLONE_NEWPID, // the child becomes PID 1 of a new PID namespace
}

But there is a subtle detail.

When you create a new PID namespace, the child process becomes PID 1 inside that namespace. The parent still lives in the old namespace.

That means your runtime has to think carefully about:

  • who becomes PID 1
  • whether PID 1 launches the user command directly
  • whether you need a small init process
  • how signals are forwarded
  • how child processes are reaped
  • what happens when PID 1 exits

This is where the learning becomes real.

Creating a namespace is easy.

Managing a namespace correctly is the hard part.


Senior engineering lessons

1. Do not confuse isolation with security

PID namespaces provide process isolation, but they are only one part of the security model.

2. PID 1 behavior matters

If your application runs as PID 1, signal handling and zombie reaping become your problem.

3. Debugging containers requires two views

Always check both:

  • inside the container
  • from the host

The same process has different PIDs depending on where you look from.

4. Most “container escapes” start with bad configuration

In real systems, the issue is often not the PID namespace itself. The issue is combining weak settings:

  • privileged mode
  • host PID
  • host mounts
  • excessive capabilities
  • exposed Docker socket

5. Use namespaces intentionally

For observability tools, hostPID or --pid=host may be required.

For normal application workloads, it is usually unnecessary risk.


References

  • Linux man-pages: PID namespaces
  • Linux Kernel Documentation: Namespaces
  • Docker documentation: docker run --pid
  • OWASP Docker Security Cheat Sheet

Final thought

PID namespaces are one of those Linux features that look simple at first:

“The container gets its own process IDs.”

But after working with real systems, you realize the deeper lesson:

Process isolation is not only about hiding PIDs. It is about controlling visibility, lifecycle, signals, debugging, and failure boundaries.

That is why PID namespaces are not just a container feature.

They are a production engineering concept.

If you understand PID namespaces well, Docker feels less like magic and more like a thin layer over powerful Linux primitives.
