Srinivasaraju Tangella

Posted on Mar 24

How Containers Are REALLY Isolated in Docker (Kernel-Level Deep Dive)

I ran a simple command:

docker run -it ubuntu bash

But behind this… the Linux kernel created multiple isolation layers.

Containers are NOT magic.
They are just processes with boundaries enforced by the kernel.

Let’s break down what actually isolates your container.

⚠️ The Truth Most People Miss

Docker does NOT create isolation.

The Linux kernel does.

Docker → containerd → runc → kernel

At the lowest level, everything comes down to:

Processes
Namespaces
Cgroups

🧠 Step 1: A Container is Just a Process
Run:

docker run -d ubuntu sleep 1000

Now get its PID on the host:

docker inspect --format '{{.State.Pid}}' <container-id>

Example:

PID = 4321
👉 This is the actual process on the host
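
To look at that process from the host side, a small sketch like the one below reads its entry under /proc. The helper name host_view and the PROC_ROOT override (useful for dry runs against a fake directory) are assumptions made for illustration; neither is part of Docker.

```shell
# Hypothetical helper: print what the host kernel knows about a PID.
# PROC_ROOT is an assumed override for dry runs; it defaults to /proc.
host_view() {
  local pid="$1" proc="${PROC_ROOT:-/proc}"
  # comm holds the short process name, cmdline the NUL-separated argv
  printf 'comm:    %s\n' "$(cat "$proc/$pid/comm")"
  printf 'cmdline: %s\n' "$(tr '\0' ' ' < "$proc/$pid/cmdline")"
}
```

Calling host_view with the PID from docker inspect should show the sleep command started above, listed like any other host process.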

📁 Step 2: Where Isolation is Visible
Check:

ls -l /proc/4321/ns/
Output:

pid -> pid:[4026531836]
net -> net:[4026532000]
mnt -> mnt:[4026531840]
uts -> uts:[4026531838]
ipc -> ipc:[4026531839]
user -> user:[4026531837]
cgroup -> cgroup:[4026531835]

🔥 Critical Insight

These are NOT regular files.

They are symbolic links that refer to kernel namespace objects.

👉 /proc/<PID>/ns/ is just a window into kernel state
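
Each link target has the form type:[inode]. As a rough sketch, the target can be split with plain shell parameter expansion; parse_ns_link is a hypothetical helper name, and the live call at the end inspects the current shell rather than a container, so it needs no root.

```shell
# Hypothetical helper: split "net:[4026532000]" into type and inode number.
parse_ns_link() {
  local target="$1"
  local type="${target%%:*}"     # text before the first colon
  local inode="${target#*\[}"    # drop everything through "["
  inode="${inode%\]}"            # drop the trailing "]"
  printf '%s %s\n' "$type" "$inode"
}

# Inspect this shell's own network namespace link.
parse_ns_link "$(readlink /proc/$$/ns/net)"
```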

🧩 Step 3: What Happens During Container Creation
When you run:

docker run ubuntu
Internally:

dockerd → containerd → runc → clone()/unshare() → kernel
The kernel:
✔ Creates a process
✔ Attaches namespaces
✔ Applies cgroups
✔ Sets capabilities & security filters

🧱 Step 4: Namespace Isolation (Core Concept)
Each container gets its own:

Namespace | Purpose
PID       | Process isolation
NET       | Network stack
MNT       | Filesystem
UTS       | Hostname
IPC       | Shared memory
USER      | User mapping (Docker creates this one only with userns-remap or rootless mode)
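
To see every namespace type for one process at a glance, a loop over the entries in its ns directory is enough. This is a minimal sketch: list_ns and the PROC_ROOT override are hypothetical names, and reading the links of a container process owned by root will usually need sudo.

```shell
# Hypothetical helper: print each namespace type and its link target for a PID.
# PROC_ROOT is an assumed override for dry runs; it defaults to /proc.
list_ns() {
  local pid="$1" proc="${PROC_ROOT:-/proc}" ns
  for ns in pid net mnt uts ipc user cgroup; do
    printf '%-7s %s\n' "$ns" "$(readlink "$proc/$pid/ns/$ns")"
  done
}

# Example: list_ns "$$" shows the namespaces of the current shell.
```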

🔬 Step 5: Proving Isolation
Run two containers:

docker run -d --name c1 ubuntu sleep 1000
docker run -d --name c2 ubuntu sleep 1000
Get PIDs:

docker inspect --format '{{.State.Pid}}' c1

docker inspect --format '{{.State.Pid}}' c2
Now compare:

ls -l /proc/<PID-of-c1>/ns/net
ls -l /proc/<PID-of-c2>/ns/net
Example:

net:[4026532000]
net:[4026532100]

💡 Golden Rule

Namespace identity = inode number

Same inode → shared namespace

Different inode → isolated namespace
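
The rule can be turned into a quick check. The sketch below compares the link targets of two PIDs for one namespace type; same_ns and PROC_ROOT are hypothetical names chosen for this example.

```shell
# Hypothetical helper: succeed when two PIDs share the given namespace type.
# PROC_ROOT is an assumed override for dry runs; it defaults to /proc.
same_ns() {
  local type="$1" pid_a="$2" pid_b="$3" proc="${PROC_ROOT:-/proc}"
  local a b
  a=$(readlink "$proc/$pid_a/ns/$type") || return 2   # unreadable link
  b=$(readlink "$proc/$pid_b/ns/$type") || return 2
  [ "$a" = "$b" ]   # identical "type:[inode]" means the same namespace
}

# Example: same_ns net "$PID_C1" "$PID_C2" && echo shared || echo isolated
```

Comparing a container's PID against PID 1 in the same way (as root) shows whether it shares a namespace with the host.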

⚠️ Step 6: Not Always New Namespaces
Example:

docker run --network=host ubuntu
👉 Result:
Container uses host network namespace
No isolation at network level

🔐 Step 7: Cgroups (Resource Isolation)

Example:

docker run -d --memory=200m --cpus=1 ubuntu sleep 1000
Check:

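cat /sys/fs/cgroup/memory/docker/<container-id>/memory.limit_in_bytes

(This is the cgroup v1 path. On cgroup v2 hosts the same limit is exposed as memory.max in the container's cgroup directory.)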

👉 Controls:
CPU usage
Memory limits
OOM behavior
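
Because the file location differs between cgroup v1 and v2, one way to find it is to ask the kernel which cgroup the process belongs to by reading its cgroup file under /proc. The following is a sketch under assumptions: mem_limit_file, PROC_ROOT and CGROUP_ROOT are hypothetical names, and the v1 branch assumes the memory controller is mounted at /sys/fs/cgroup/memory.

```shell
# Hypothetical helper: print the path of the memory-limit file for a host PID.
# PROC_ROOT and CGROUP_ROOT are assumed overrides for dry runs.
mem_limit_file() {
  local pid="$1" proc="${PROC_ROOT:-/proc}" cg="${CGROUP_ROOT:-/sys/fs/cgroup}"
  local line
  # cgroup v1 (and hybrid): a line such as "5:memory:/docker/<id>"
  line=$(grep -E '^[0-9]+:([^:]*,)?memory(,[^:]*)?:' "$proc/$pid/cgroup" | head -n 1)
  if [ -n "$line" ]; then
    printf '%s\n' "$cg/memory${line#*:*:}/memory.limit_in_bytes"
    return 0
  fi
  # cgroup v2: a single line such as "0::/system.slice/docker-<id>.scope"
  line=$(grep '^0::' "$proc/$pid/cgroup" | head -n 1)
  [ -n "$line" ] || return 1
  printf '%s\n' "$cg${line#0::}/memory.max"
}

# Example: cat "$(mem_limit_file "$PID")"
```

For --memory=200m the file should contain 209715200, which is 200 × 1024 × 1024 bytes.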

🛡️ Step 8: Security Layers (Advanced)
Capabilities

docker run --cap-drop=ALL ubuntu

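👉 Root without power: the process still runs as UID 0 but holds none of the capabilities that make root privileged

Seccomp
👉 Filters syscalls
Example: Docker's default profile blocks kexec_load and, on kernels older than 4.8, ptrace

AppArmor / SELinux
👉 Mandatory access control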
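
Some of these settings are visible in the status file of the container process under /proc. As a hedged sketch (security_fields and PROC_ROOT are hypothetical names), the fields of interest can be pulled out with grep:

```shell
# Hypothetical helper: show capability sets and seccomp mode for a host PID.
# PROC_ROOT is an assumed override for dry runs; it defaults to /proc.
security_fields() {
  local pid="$1" proc="${PROC_ROOT:-/proc}"
  # CapEff/CapBnd are hex bitmasks; Seccomp is 0 (off), 1 (strict) or 2 (filter)
  grep -E '^(CapEff|CapBnd|NoNewPrivs|Seccomp):' "$proc/$pid/status"
}

# Example: security_fields "$(docker inspect --format '{{.State.Pid}}' c1)"
```

For a container started with --cap-drop=ALL, CapEff should read 0000000000000000. Non-zero masks can be decoded with capsh --decode=<mask> if libcap's capsh is installed.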

💥 Reality Check (Most Important Section)

Containers are NOT fully isolated like VMs.

They share:

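The same host kernel
The same kernel services (scheduler, drivers, filesystem code), even when each image ships a different userland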

If the kernel is compromised → all containers are compromised.
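
On a Linux host, the shared kernel is easy to confirm: the kernel release reported inside a container matches the host's. A minimal sketch, with same_kernel_as_host as a hypothetical helper name:

```shell
# Hypothetical helper: succeed when the given kernel release matches the host's.
same_kernel_as_host() {
  [ "$1" = "$(uname -r)" ]
}

# Example (needs Docker):
#   same_kernel_as_host "$(docker run --rm ubuntu uname -r)" && echo "shared kernel"
```

With Docker Desktop on macOS or Windows the containers run inside a Linux VM, so the comparison only makes sense from within that VM.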

🔬 Advanced Insight (Kernel-Level)
Namespaces are created using:
clone(CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWNS | ...)

👉 Each flag creates a new isolation boundary
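
The same flags can be tried without Docker through the unshare command from util-linux, which wraps the unshare() syscall. The sketch below asks for a new user namespace and a new PID namespace; pid_in_new_ns is a hypothetical helper name, and the fallback message covers hosts where unprivileged user namespaces are disabled or unshare is missing.

```shell
# Hypothetical helper: start a shell in new user + PID namespaces and print its PID.
# --user/--pid map to CLONE_NEWUSER/CLONE_NEWPID; --fork makes the child PID 1.
pid_in_new_ns() {
  unshare --user --map-root-user --pid --fork sh -c 'echo $$' 2>/dev/null \
    || echo "unavailable"
}

pid_in_new_ns
```

Where the call is permitted, the inner shell should report PID 1, because it is the first process in its new PID namespace.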

🧠 Final Mental Model

Container = Process + Namespaces + Cgroups + Security Filters

NOT a virtual machine

NOT magic

🔥 Closing

Next time you run:

docker run nginx

Remember…

You didn’t start a container.

You asked the Linux kernel to create
an isolated execution environment for a process.

DEV Community
