When Containers Kill Nodes: Understanding Zombie Processes and PID 1

#kubernetes #devops #linux #docker

The Hook

Early in my career, I witnessed something that changed how I think about containers forever. We were running MySQL on Kubernetes with Rocky Linux nodes. Everything seemed fine until nodes started dying one by one. The culprit? Zombie processes. Hundreds of them, silently accumulating until the node couldn't take it anymore.

This incident taught me a fundamental truth: containers are not lightweight VMs. They're just processes.

What Exactly Are Zombie Processes?

When a process finishes execution in Linux, it doesn't just disappear. It enters a "zombie" state: the process has completed, but its entry still exists in the process table.

Why? Because the parent process needs to read the child's exit status using the wait() system call. Until the parent calls wait(), the child remains a zombie.

Parent Process
      |
      |--- fork() ---> Child Process
      |                     |
      |                     | (does work)
      |                     |
      |                     v
      |                 Exits (becomes zombie)
      |                     |
      |<--- wait() ---------+
      |
      v
   Zombie cleaned up
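
To watch this lifecycle happen, here is a minimal Python sketch, assuming a Linux host with /proc mounted. The helper names process_state and zombie_lifecycle are made up for illustration. It forks a child that exits immediately, waits until the kernel reports the child in state Z, and then reaps it with waitpid().

```python
import os
import time
from typing import Optional, Tuple


def process_state(pid: int) -> Optional[str]:
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')' to reach the state field.
    return stat.rsplit(")", 1)[1].split()[0]


def zombie_lifecycle() -> Tuple[Optional[str], Optional[str]]:
    """Fork a child, observe it as a zombie, reap it, and return both states."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit right away

    # Parent: poll until the child has exited but has not been reaped yet.
    deadline = time.monotonic() + 5
    while process_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = process_state(pid)

    os.waitpid(pid, 0)  # read the exit status; the kernel drops the entry
    after = process_state(pid)
    return before, after


if __name__ == "__main__":
    print(zombie_lifecycle())
```

On a Linux host this should report Z before the waitpid() call and None after it, which is the "zombie cleaned up" step in the diagram.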

In a normal Linux system, this isn't a big problem. If a parent dies without calling wait(), its orphaned children are adopted by the init process (PID 1). The init process calls wait() on them when they exit, so they never linger as zombies.

Why Containers Break This Model

Here's where containers get tricky.

When you run a container without an init process, your application becomes PID 1. There is no traditional init process. Your app is now responsible for reaping zombie processes.

FROM mysql:8.0
# MySQL process becomes PID 1
# It was never designed to be an init system

Most applications, including MySQL, are not designed to be init processes. They don't call wait() on orphaned children they never spawned themselves. So when those children die, they become zombies with no one to clean them up.

The accumulation begins.

On the node, ps aux | grep Z showed hundreds of zombie MySQL helper processes, each one dead but still holding onto its entry in the process table.

Each zombie holds:

An entry in the process table
A PID (and PIDs are finite)

Eventually, you run out of PIDs or the process table fills up. New processes can't spawn. The node becomes unstable. Services crash.
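
To check how close a node is to that point, a rough Python sketch like the one below can help. It assumes it runs on the node itself (inside a container, /proc only shows that container's PID namespace), and the function names list_zombies and pid_max are hypothetical. It scans /proc for processes in state Z and reads the kernel's PID ceiling from /proc/sys/kernel/pid_max.

```python
import os
from typing import List


def list_zombies(proc_root: str = "/proc") -> List[int]:
    """Return the PIDs of all processes currently in the zombie (Z) state."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process vanished while we were scanning
        # The state is the first field after the parenthesised command name.
        state = stat.rsplit(")", 1)[1].split()[0]
        if state == "Z":
            zombies.append(int(entry))
    return zombies


def pid_max(proc_root: str = "/proc") -> int:
    """Return the kernel's upper bound on PID values."""
    with open(os.path.join(proc_root, "sys", "kernel", "pid_max")) as f:
        return int(f.read())


if __name__ == "__main__":
    zombies = list_zombies()
    print(f"{len(zombies)} zombies, pid_max is {pid_max()}")
```

A zombie count that keeps climbing between runs is the warning sign. The absolute number matters less than the trend.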

The Fix: Tini

The solution is surprisingly simple: use a proper init process designed for containers.

Tini is a minimal init system built specifically for containers. It:

Runs as PID 1
Spawns your application as a child process
Forwards signals properly
Reaps zombie processes by calling wait()
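
To make those four jobs concrete, here is a toy init in Python. This is not how tini is implemented (tini is a small C program), and run_as_init is a name invented for this sketch. It only illustrates the pattern: fork the application, forward a few signals to it, and keep calling wait() so that every child that exits, including adopted orphans when the process really is PID 1, gets reaped.

```python
import os
import signal
import sys
from typing import List

# Signals this sketch forwards to the application (an illustrative subset).
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_as_init(argv: List[str]) -> int:
    """Run argv as a child, forward signals to it, reap children, return its exit code."""
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)  # replace the child with the application
        except OSError:
            os._exit(127)  # the command could not be started

    def forward(signum, _frame):
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # the application has already exited

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        while True:
            # Blocks until any child exits; each return reaps one process.
            pid, status = os.wait()
            if pid == child:
                # The application itself finished, so report how it ended.
                if os.WIFEXITED(status):
                    return os.WEXITSTATUS(status)
                return 128 + os.WTERMSIG(status)
    finally:
        # Put the original handlers back so callers are not affected.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)


if __name__ == "__main__":
    sys.exit(run_as_init(sys.argv[1:]))
```

The loop around os.wait() is the important part: a PID 1 that never calls it is what lets zombies pile up.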

Implementation

Option 1: Install in Dockerfile

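FROM mysql:8.0

# Install tini from its release binaries (amd64 build). The current official
# mysql:8.0 image is Oracle Linux based, so apt-get is not available there.
# The version below is only an example.
ENV TINI_VERSION=v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini

# Set tini as entrypoint, keeping the image's own entrypoint script
# (assumed to be docker-entrypoint.sh) so database initialization still runs
ENTRYPOINT ["/tini", "--", "docker-entrypoint.sh"]

# Your actual command
CMD ["mysqld"]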

Option 2: Use Docker's built-in init

docker run --init mysql:8.0

Option 3: Kubernetes

apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
  - name: mysql
    image: mysql:8.0
    # Note: For Kubernetes, you typically bake tini into the image
    # or use a base image that includes it

In Kubernetes, the safest pattern is to bake an init like tini into the image. The Pod spec has no direct equivalent of docker run --init, and relying on runtime flags is not portable across environments.

The Bigger Lesson

This incident challenged my mental model of containers. I used to think of them as "lightweight VMs": isolated boxes running their own little world.

The reality is different. A container is just a process with fancy isolation (namespaces, cgroups). It shares the kernel with the host. When that process misbehaves by spawning zombies, consuming memory, or exhausting PIDs, the host suffers.

Understanding this changes how you:

Debug container issues
Design container images
Think about resource limits and isolation

Quick Reference

Scenario	What Happens	Fix
App as PID 1, spawns children	Zombies accumulate	Use tini or --init
App crashes without signal handling	Orphaned children become zombies	Proper init + signal forwarding
Too many zombies	PID exhaustion, node instability	Prevention via init system