Shailendra Verma

When Containers Kill Nodes: Understanding Zombie Processes and PID 1

The Hook

Early in my career, I witnessed something that changed how I think about containers forever. We were running MySQL on Kubernetes with Rocky Linux nodes. Everything seemed fine until nodes started dying one by one. The culprit? Zombie processes. Hundreds of them, silently accumulating until the node couldn't take it anymore.

This incident taught me a fundamental truth: containers are not lightweight VMs. They're just processes.


What Exactly Are Zombie Processes?

When a process finishes execution in Linux, it doesn't just disappear. It enters a "zombie" state: the process has completed, but its entry still exists in the process table.

Why? Because the parent process needs to read the child's exit status using the wait() system call. Until the parent calls wait(), the child remains a zombie.

Parent Process
      |
      |--- fork() ---> Child Process
      |                     |
      |                     | (does work)
      |                     |
      |                     v
      |                 Exits (becomes zombie)
      |                     |
      |<--- wait() ---------+
      |
      v
   Zombie cleaned up

In a normal Linux system, this is rarely a problem. If a parent dies without calling wait(), its orphaned children are adopted by the init process (PID 1). A real init calls wait() whenever one of its children exits, so those zombies are reaped almost immediately.
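
To see the zombie state for yourself, here is a minimal Python sketch (Linux only, since it reads /proc). It forks a child that exits immediately, checks the child's state before the parent calls wait(), then reaps it. The helper names process_state and zombie_lifecycle are my own, purely for illustration.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (e.g. 'R', 'S', 'Z')."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before reading the state field.
    return stat.rsplit(")", 1)[1].split()[0]


def zombie_lifecycle():
    """Fork a child that exits at once; report its state before and after wait()."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately, skipping cleanup handlers

    # Parent: poll until the kernel marks the exited child as a zombie.
    state = process_state(pid)
    deadline = time.monotonic() + 2.0
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)

    os.waitpid(pid, 0)  # reap: read the exit status, freeing the table entry
    still_exists = os.path.exists(f"/proc/{pid}")
    return state, still_exists


if __name__ == "__main__":
    state, still_exists = zombie_lifecycle()
    print(f"state before wait(): {state}")
    print(f"/proc entry after wait(): {still_exists}")
```

On a typical Linux machine the state before wait() should be Z, and the /proc entry should be gone once waitpid() returns.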


Why Containers Break This Model

Here's where containers get tricky.

When you run a container without an init process, your application becomes PID 1 inside the container's PID namespace. There is no traditional init process. Any process orphaned inside the container is reparented to your app, which is now responsible for reaping it.

FROM mysql:8.0
# MySQL process becomes PID 1
# It was never designed to be an init system

Most applications, including MySQL, are not designed to be init processes. They don't call wait() on the orphaned children that get reparented to them, so when those processes die, they become zombies with no one to clean them up.

The accumulation begins.

On the node, ps aux | grep Z showed hundreds of zombie MySQL helper processes, each one dead but still holding onto its entry in the process table.
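
A plain grep for Z also matches any line that merely contains that letter, so the count can be off. As a rough alternative, here is a Python sketch that reads the state field from /proc directly and groups zombies by parent PID, since the parent is the process that is failing to call wait(). The function names and the proc_root parameter are illustrative assumptions, not part of any tool.

```python
import os
from collections import Counter


def parse_stat(stat_line):
    """Extract (pid, command, state, ppid) from one /proc/<pid>/stat line."""
    pid_part, rest = stat_line.split(" (", 1)
    # The command may contain spaces or parentheses; the last ') ' ends it.
    comm, fields = rest.rsplit(") ", 1)
    state, ppid = fields.split()[:2]
    return int(pid_part), comm, state, int(ppid)


def find_zombies(proc_root="/proc"):
    """Return (pid, command, ppid) for every process currently in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process vanished while we were scanning
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


if __name__ == "__main__" and os.path.isdir("/proc"):
    zombies = find_zombies()
    print(f"{len(zombies)} zombie(s)")
    # The parents with the most zombies are the ones not reaping.
    for ppid, count in Counter(z[2] for z in zombies).most_common(5):
        print(f"  parent {ppid}: {count}")
```

If one parent PID dominates the output and that parent is PID 1 of a container, you are likely looking at the problem described here.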

Each zombie holds:

  • An entry in the process table
  • A PID (and PIDs are finite)

Eventually, you run out of PIDs or the process table fills up. New processes can't spawn. The node becomes unstable. Services crash.
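
To get a feel for how close a node is to that point, a small Python sketch can compare the number of tasks in use against the kernel's PID ceiling. It assumes a Linux host where /proc/loadavg and /proc/sys/kernel/pid_max are readable; the function names are illustrative. Threads draw from the same ID pool, so the sketch uses the total task count from /proc/loadavg rather than a process count. A container's own cgroup PID limit, if one is set, may bite earlier than the node-wide value shown here.

```python
import os


def tasks_in_use(loadavg_text):
    """Parse the total task count from /proc/loadavg's 'running/total' field."""
    return int(loadavg_text.split()[3].split("/")[1])


def pid_headroom(used, limit):
    """Fraction of the ID space still free, clamped to the range [0, 1]."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return max(0.0, 1.0 - used / limit)


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        used = tasks_in_use(f.read())
    with open("/proc/sys/kernel/pid_max") as f:
        limit = int(f.read())
    print(f"{used} of {limit} IDs in use, {pid_headroom(used, limit):.1%} free")
```

Zombies count toward the "used" side of that ratio even though they do no work, which is why a slow leak can go unnoticed until fork() starts failing.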


The Fix: Tini

The solution is surprisingly simple: use a proper init process designed for containers.

Tini is a minimal init system built specifically for containers. It:

  1. Runs as PID 1
  2. Spawns your application as a child process
  3. Forwards signals properly
  4. Reaps zombie processes by calling wait()
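
The four steps above can be sketched in a few lines of Python. This is not how Tini is implemented (Tini is a small C program and handles more edge cases); it is a toy illustration of the same idea, and run_as_init is a name I made up. It assumes Python 3.9+ for os.waitstatus_to_exitcode.

```python
import os
import signal
import sys


def run_as_init(argv):
    """Run argv as a child, forward signals to it, and reap every exited child."""
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)  # child: become the application
        except OSError:
            os._exit(127)  # command not found or not executable

    def forward(signum, _frame):
        try:
            os.kill(child, signum)  # pass the signal on to the application
        except ProcessLookupError:
            pass  # the application has already exited

    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP):
        signal.signal(sig, forward)

    while True:
        pid, status = os.wait()  # blocks until any child, own or adopted, exits
        if pid != child:
            continue  # an adopted orphan was just reaped; keep waiting
        code = os.waitstatus_to_exitcode(status)
        # Mirror the shell convention of 128 + N for death by signal N.
        return code if code >= 0 else 128 - code


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_as_init(sys.argv[1:]))
```

Saved under a hypothetical name such as mini_init.py, it would be invoked as python3 mini_init.py mysqld. The key design point is the loop around os.wait(): it collects the exit status of any child, not just the one it started, which is exactly what an application running as PID 1 usually fails to do.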

Implementation

Option 1: Install in Dockerfile

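FROM mysql:8.0

# Install tini (assumes a Debian-based variant of the image;
# on other bases, use that distro's package manager instead)
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*

# Set tini as entrypoint, and keep the image's own entrypoint script
# so database initialization still runs
ENTRYPOINT ["/usr/bin/tini", "--", "docker-entrypoint.sh"]

# Your actual command
CMD ["mysqld"]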

Option 2: Use Docker's built-in init

docker run --init mysql:8.0

Option 3: Kubernetes

apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
  - name: mysql
    image: mysql:8.0
    # Note: For Kubernetes, you typically bake tini into the image
    # or use a base image that includes it

In Kubernetes, the safest pattern is to bake an init like tini into the image: the Pod spec has no equivalent of docker run --init, and relying on runtime-level settings is not portable across environments.


The Bigger Lesson

This incident challenged my mental model of containers. I used to think of them as "lightweight VMs": isolated boxes running their own little world.

The reality is different. A container is just a process with fancy isolation (namespaces, cgroups). It shares the kernel with the host. When that process misbehaves (by spawning zombies, consuming memory, or exhausting PIDs), the host suffers.

Understanding this changes how you:

  • Debug container issues
  • Design container images
  • Think about resource limits and isolation

Quick Reference

| Scenario | What Happens | Fix |
| --- | --- | --- |
| App as PID 1, spawns children | Zombies accumulate | Use tini or --init |
| App crashes without signal handling | Orphaned children become zombies | Proper init + signal forwarding |
| Too many zombies | PID exhaustion, node instability | Prevention via init system |

Key Takeaways

  1. Zombies are normal; they only become a problem when not reaped
  2. Containers don't have a traditional init by default
  3. Your app shouldn't be PID 1 unless it's designed for it
  4. Tini/dumb-init are simple fixes that should be standard practice
  5. Containers are processes, not VMs; never forget this

Have you faced similar container gotchas in production? I'd love to hear your war stories.