Python-T Point

Posted on Jul 4 • Originally published at pythontpoint.in

🔧 Debug Docker OOM kills with Python

#devops #kubernetes #cloud #tutorial

🚨 Production Alert — OOM Kills Bring $200K/min Loss

A high‑traffic API container on Docker is terminated by the kernel OOM killer, causing an immediate loss of roughly $200 000 per minute. The first few minutes determine whether the outage remains contained or expands into a multi‑hour incident.

📑 Table of Contents

🚨 Production Alert — OOM Kills Bring $200K/min Loss
⏱ Minute 0-2 — Stop the Bleed
🛡 Minute 2-10 — Contain and Assess Memory
🔀 Minute 10‑X — Recovery Decision Tree
🔐 Preventive Controls — Stop This From Happening Again
🟩 Final Thoughts
❓ Frequently Asked Questions
How can I monitor OOM events in real time?
Is swapping recommended for production containers?
Can I use the same script for Kubernetes pods?
📚 References & Further Reading

⏱ Minute 0-2 — Stop the Bleed

The initial two minutes focus on confirming the OOM event and preventing further container restarts.

Step 1 – Verify the OOM kill.

$ docker events --filter 'event=die' --since '2m' --format '{{.Time}} {{.ID}} {{.Status}}'
-07-04T12:03:14.123456789Z 9c2f1a4b5d6e OOMKilled

If the status is OOMKilled, the kernel OOM killer terminated the container.
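As a cross-check, docker inspect reports an OOMKilled boolean under State for each container. The following is a minimal sketch, assuming the docker CLI is on the PATH of the host; the helper names was_oom_killed and inspect_container are hypothetical and not part of any Docker tooling.

```python
import json
import subprocess


def was_oom_killed(inspect_output: str) -> bool:
    """Return True if `docker inspect` JSON reports State.OOMKilled."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    return bool(containers and containers[0].get("State", {}).get("OOMKilled", False))


def inspect_container(container_id: str) -> str:
    """Run `docker inspect` on the host and return its raw JSON output."""
    result = subprocess.run(
        ["docker", "inspect", container_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the host, was_oom_killed(inspect_container("9c2f1a4b5d6e")) would return True for a container that the OOM killer terminated.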

Step 2 – Pull the kernel message.

$ dmesg | grep -i oom | tail -n 5
[124567.890123] Out of memory: Kill process 3456 (python) score 1128 or sacrifice child
[124567.890124] Killed process 3456 (python) total-vm:102400kB, anon-rss:81920kB, file-rss:0kB

The dmesg entries identify the offending process and the memory consumption that triggered the OOM condition.
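To feed these kernel messages into a script rather than reading them by eye, the "Killed process" line can be parsed with a regular expression. This is a sketch that assumes the line format shown above; the exact wording varies between kernel versions, and the names KILLED_RE and parse_killed_line are hypothetical.

```python
import re

# Matches lines such as:
# "Killed process 3456 (python) total-vm:102400kB, anon-rss:81920kB, file-rss:0kB"
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_killed_line(line: str):
    """Extract pid, process name, and memory figures, or None if no match."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "pid": int(fields["pid"]),
        "name": fields["name"],
        "total_vm_kb": int(fields["total_vm_kb"]),
        "anon_rss_kb": int(fields["anon_rss_kb"]),
    }
```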

Step 3 – Prevent automatic restart.

$ docker update --restart=no 9c2f1a4b5d6e
9c2f1a4b5d6e

Disabling the restart policy stops Docker from immediately relaunching a container that would be killed again.

What not to do: Do not delete the container image, modify application code, or ignore kernel messages while the OOM kill is ongoing.

Key point: Early confirmation of the OOM kill and halting the restart loop provides a stable window for diagnostics without the noise of repeated container crashes.

🛡 Minute 2-10 — Contain and Assess Memory

The next eight minutes isolate the container, collect memory metrics, and gather system state for root‑cause analysis.

Step 1 – Identify the cgroup path.

$ docker inspect --format='{{.Id}}' 9c2f1a4b5d6e
9c2f1a4b5d6e7f8g9h0i1j2k3l4m5n6o7p8q9r0s1t2u3v4w5x6y7z8a9b0c

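The full container ID is required to locate its cgroup directory under /sys/fs/cgroup/memory/docker. That path applies to hosts running cgroup v1; on cgroup v2 hosts the directory layout differs and memory.oom_control does not exist, so the steps below assume cgroup v1.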

Step 2 – Read the OOM control file.

$ cat /sys/fs/cgroup/memory/docker/9c2f1a4b5d6e7f8g9h0i1j2k3l4m5n6o7p8q9r0s1t2u3v4w5x6y7z8a9b0c/memory.oom_control
oom_kill_disable 0
under_oom 1

The under_oom flag set to 1 indicates the cgroup is currently under OOM pressure. A value of 0 would mean the cgroup is operating within its memory limits.

Step 3 – Run a Python script that correlates the OOM flag with the container’s memory usage.

# debug_oom.py
import json
import os

# cgroup v1 base path where Docker creates per-container memory cgroups
CGROUP_ROOT = "/sys/fs/cgroup/memory/docker"


def read_cgroup_file(container_id, filename):
    """Return the stripped contents of one cgroup file for the container."""
    path = os.path.join(CGROUP_ROOT, container_id, filename)
    with open(path) as f:
        return f.read().strip()


def get_oom_status(container_id):
    data = read_cgroup_file(container_id, "memory.oom_control")
    # Each line is a whitespace-separated "key value" pair
    return {k: int(v) for k, v in (line.split() for line in data.splitlines())}


def get_memory_usage(container_id):
    usage = read_cgroup_file(container_id, "memory.usage_in_bytes")
    return int(usage)


if __name__ == "__main__":
    cid = os.getenv("CONTAINER_ID")
    if not cid:
        raise SystemExit("CONTAINER_ID env var required")
    oom = get_oom_status(cid)
    usage = get_memory_usage(cid)
    # Compact separators keep the output on one line without extra spaces
    print(json.dumps(
        {"container_id": cid, "oom_under": oom["under_oom"], "memory_bytes": usage},
        separators=(",", ":"),
    ))

What this does:

CGROUP_ROOT: Base path where Docker creates per‑container memory cgroups.
read_cgroup_file: Helper that reads raw cgroup files.
get_oom_status: Parses memory.oom_control into a dictionary.
get_memory_usage: Returns the cgroup's current memory usage in bytes, which includes page cache as well as RSS.
__main__: Emits a single JSON line for ingestion by monitoring pipelines.

Run the script on the host (not inside the container) while passing the container ID:

$ CONTAINER_ID=9c2f1a4b5d6e7f8g9h0i1j2k3l4m5n6o7p8q9r0s1t2u3v4w5x6y7z8a9b0c python3 debug_oom.py
{"container_id":"9c2f1a4b5d6e7f8g9h0i1j2k3l4m5n6o7p8q9r0s1t2u3v4w5x6y7z8a9b0c","oom_under":1,"memory_bytes":84213760}

Seeing "oom_under":1 together with a memory usage that approaches the container’s limit confirms that pressure is real.
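To put a number on "approaches the container's limit", usage can be compared with memory.limit_in_bytes from the same cgroup directory. This is a sketch under the same cgroup v1 assumption as debug_oom.py; the helper names read_int and memory_pressure are hypothetical.

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup/memory/docker"  # cgroup v1 layout, as in debug_oom.py


def read_int(path: str) -> int:
    """Read a cgroup file that holds a single integer."""
    with open(path) as f:
        return int(f.read().strip())


def memory_pressure(container_id: str, root: str = CGROUP_ROOT) -> float:
    """Return usage divided by limit for the container's memory cgroup."""
    base = os.path.join(root, container_id)
    usage = read_int(os.path.join(base, "memory.usage_in_bytes"))
    limit = read_int(os.path.join(base, "memory.limit_in_bytes"))
    return usage / limit
```

A ratio close to 1.0 means the container is running at its limit. A container started without a memory limit reports a very large limit value, so its ratio stays near zero.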

According to the Linux kernel documentation, the OOM killer selects victims based on a "badness" score derived mainly from each process's memory footprint (RSS, swap, and page tables) and adjusted by its oom_score_adj value (kernel.org).

Key point: The script provides a reproducible, programmatic view of OOM state that can be integrated into alerting pipelines, enabling automated detection of memory‑pressure events.

🔀 Minute 10‑X — Recovery Decision Tree

From minute ten onward, decide whether to restart with adjusted limits, patch the application, or roll back.

If the container was under‑provisioned:

$ docker run -d --name api_service --memory=2g --restart=on-failure myapi:latest
c0d1e2f3g4h5i6j7k8l9m0n1o2p3q4r5s6t7u8v9w0x1y2z3a4b5c6d7e8f9g0

The --memory=2g flag sets a hard limit enforced by the cgroup; a container that exceeds it becomes a target for the OOM killer. A soft limit is configured separately with --memory-reservation.

If the OOM is caused by a memory leak in the Python process:

$ pip install memory_profiler

Then add profiling to the entrypoint:

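# entrypoint.py
import time

import memory_profiler


def main():
    # Bounded loop: memory_profiler prints its report only when main() returns
    for _ in range(60):
        # Application logic here
        time.sleep(1)


if __name__ == "__main__":
    # Wrap main() so its line-by-line memory usage is reported on return
    memory_profiler.profile(main)()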

Running the container with python -m memory_profiler entrypoint.py emits line‑by‑line memory usage for the profiled function once it returns. Line‑by‑line profiling slows execution noticeably, so reserve it for diagnostic runs rather than leaving it enabled in production.

If the host itself is low on memory:

$ sysctl -w vm.overcommit_memory=1
vm.overcommit_memory = 1

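Setting vm.overcommit_memory=1 makes the kernel grant allocation requests without checking whether enough memory is available. This avoids allocation failures, but it does not add memory: if processes actually touch more pages than RAM and swap can hold, the OOM killer still runs.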

Decision summary:

If the OOM flag is set and usage ≈ limit: restart with a higher --memory value.
If usage climbs without a limit change: investigate an application leak (use memory_profiler).
If host memory is exhausted: add swap or adjust vm.overcommit_memory.
If none of the above: revert to the previous stable image and open a post‑mortem ticket.
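The decision summary above can be sketched as a small function for use in tooling. The thresholds (90 % of the limit, 256 MiB of host headroom), the parameter names, and the returned labels are all assumptions chosen for illustration, not values taken from Docker or the kernel.

```python
def recommend_action(under_oom: bool, usage_bytes: int, limit_bytes: int,
                     usage_growing: bool, host_available_bytes: int,
                     near_limit: float = 0.9,
                     host_floor_bytes: int = 256 * 1024 * 1024) -> str:
    """Map observed memory state to one of the four remediation branches."""
    # OOM flag set and usage close to the limit: the container is under-provisioned
    if under_oom and usage_bytes >= near_limit * limit_bytes:
        return "raise_memory_limit"
    # Usage keeps climbing with an unchanged limit: suspect an application leak
    if usage_growing:
        return "investigate_leak"
    # The host itself is short on memory
    if host_available_bytes < host_floor_bytes:
        return "relieve_host_memory"
    # Nothing matched: fall back to the previous stable image
    return "roll_back_image"
```

The checks run in the same order as the list, so the first matching branch wins.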

🔐 Preventive Controls — Stop This From Happening Again

Long‑term controls reduce OOM risk by enforcing resource quotas and adding observability.

Set explicit memory limits in Docker Compose.
Enable swap space on the host to provide a safety buffer.
Deploy a health‑check that monitors cgroup OOM flags.
Ship cgroup metrics to Prometheus for alerting.
Control how the kernel picks an OOM victim via /proc/sys/vm/oom_kill_allocating_task.

Step 1 – Docker Compose limits.

# docker-compose.yml
services:
  api:
    image: myapi:latest
    deploy:
      resources:
        limits:
          memory: 1.5G
    restart: on-failure

What this does:

limits.memory: Caps container RAM at 1.5 GB, preventing it from exhausting host memory.
restart: Restarts the container whenever it exits with a non‑zero code. An OOM‑killed container exits with code 137, so it is restarted as well.

Step 2 – Enable swap.

$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ swapon --show
NAME TYPE SIZE USED PRIO
/swapfile file 4G 0B -2

Swap provides a buffer when RAM is exhausted, reducing the chance of immediate OOM kills.

Step 3 – Health‑check script.

# health_check.py
import os
import sys


def check_oom(container_id):
    path = f"/sys/fs/cgroup/memory/docker/{container_id}/memory.oom_control"
    with open(path) as f:
        data = f.read()
    if "under_oom 1" in data:
        sys.exit(1)  # unhealthy
    sys.exit(0)  # healthy


if __name__ == "__main__":
    check_oom(os.getenv("CONTAINER_ID"))

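Docker can run a script like this through the HEALTHCHECK directive, marking the container unhealthy while its cgroup is under OOM pressure. HEALTHCHECK commands execute inside the container, where the host path /sys/fs/cgroup/memory/docker/<id> is normally not visible, so either point the script at the container's own view of its cgroup or run the check from the host.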

Step 4 – Prometheus exporter.

# prometheus.yml (excerpt)
scrape_configs:
  - job_name: 'docker_cgroup'
    static_configs:
      - targets: ['localhost:9100']

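Node Exporter, listening on port 9100, exposes host‑level metrics such as node_memory_Active_bytes. Per‑container cgroup metrics generally come from a container‑aware exporter such as cAdvisor, so pair the two when alerting on container memory pressure.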
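One hedged way to surface the under_oom flag through Node Exporter is its textfile collector, which reads *.prom files from a configured directory. The sketch below assumes that collector is enabled; the metric name docker_container_under_oom, the file name docker_oom.prom, and the helper names are hypothetical.

```python
import os
import tempfile


def format_under_oom_metric(container_id: str, under_oom: int) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    return (
        "# TYPE docker_container_under_oom gauge\n"
        f'docker_container_under_oom{{container_id="{container_id}"}} {under_oom}\n'
    )


def write_metric_file(directory: str, container_id: str, under_oom: int) -> str:
    """Write the sample atomically so the collector never reads a partial file."""
    target = os.path.join(directory, "docker_oom.prom")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(format_under_oom_metric(container_id, under_oom))
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the exporter user read it
    os.replace(tmp_path, target)  # atomic rename on the same filesystem
    return target
```

A cron job or systemd timer on the host could call write_metric_file with the under_oom value read by debug_oom.py.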

Step 5 – Kernel notification.

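$ sysctl -w vm.oom_kill_allocating_task=1
vm.oom_kill_allocating_task = 1

When enabled, the kernel kills the task whose allocation triggered the OOM condition instead of scanning for the process with the highest badness score. The killed PID still appears in dmesg, which keeps post‑mortem analysis straightforward.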

Key point: Combining strict limits, swap, health‑checks, and observability creates a multi‑layered defense that prevents the same OOM scenario from recurring.

When you correlate cgroup OOM flags with Python‑driven metrics, you turn a noisy kernel event into a precise, actionable alert.

🟩 Final Thoughts

Debugging Docker OOM kills with Python shifts the investigation from raw kernel logs to a structured data pipeline that can be automated, versioned, and shared across teams. By grounding each step in the kernel’s OOM selection algorithm and coupling it with container‑level limits, you gain immediate visibility during an incident and a roadmap for long‑term stability.

Adopt the runbook steps as a standard part of the on‑call toolkit: confirm the kill, extract cgroup state, run the diagnostic script, and apply the appropriate remediation. The same pattern works for any Linux‑based container platform, so the effort invested now pays off across future incidents.

❓ Frequently Asked Questions

How can I monitor OOM events in real time?

Use docker events --filter 'event=die' --filter 'type=container' together with a background Python watcher that reads /sys/fs/cgroup/memory files; this provides instant alerts without polling dmesg.
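A background watcher along these lines might look like the sketch below. It assumes the docker CLI is available on the host and filters on Docker's container oom event rather than die; the function names are hypothetical.

```python
import json
import subprocess


def oom_container_id(event_line: str):
    """Return the container ID if the JSON event line is a container 'oom' event."""
    event = json.loads(event_line)
    if event.get("Type") == "container" and event.get("Action") == "oom":
        return event.get("Actor", {}).get("ID")
    return None


def watch_oom_events():
    """Stream `docker events` and yield container IDs as OOM events arrive."""
    cmd = ["docker", "events", "--filter", "type=container",
           "--filter", "event=oom", "--format", "{{json .}}"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            cid = oom_container_id(line)
            if cid:
                yield cid
```

Iterating over watch_oom_events() blocks until Docker reports an event, so it is best run in its own process or thread.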

Is swapping recommended for production containers?

Swap is safe when memory limits are enforced; it gives a buffer that prevents immediate OOM kills, but swap usage should be monitored to avoid severe latency penalties.
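Swap usage can be monitored by parsing the SwapTotal and SwapFree fields of /proc/meminfo. This is a minimal sketch; the function name swap_used_fraction is hypothetical and any alerting threshold is left to the reader.

```python
def swap_used_fraction(meminfo_text: str) -> float:
    """Return the fraction of swap in use, parsed from /proc/meminfo contents."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # sizes are reported in kB
    total = values.get("SwapTotal", 0)
    if total == 0:
        return 0.0  # no swap configured
    return (total - values.get("SwapFree", 0)) / total
```

On the host, pass it the file contents, for example swap_used_fraction(open("/proc/meminfo").read()).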

Can I use the same script for Kubernetes pods?

Yes. Replace the Docker cgroup path with the pod’s cgroup directory (e.g., /sys/fs/cgroup/memory/kubepods.slice) and pass the pod UID to the script; the logic remains identical.

📚 References & Further Reading

Official Docker runtime documentation — details on memory limits and restart policies: docs.docker.com
Linux kernel OOM killer design – in‑depth description of scoring and selection: kernel.org
Python memory_profiler package – usage guide for profiling memory leaks: pypi.org
