I Built a Zombie Process Killer Because Claude Code Ate 14GB of My RAM

#ai #opensource #node #productivity

I lost an entire afternoon to a phantom memory leak that wasn't a leak at all. My MacBook was crawling — 14GB of RAM consumed by processes I never launched. The culprit? Dozens of orphaned MCP servers, headless Chrome instances, and sub-agents left behind by AI coding sessions. I built zclean to kill them automatically. Here's the full setup.

**TL;DR:** AI coding tools like Claude Code and Codex spawn child processes (MCP servers, browser daemons, sub-agents) that don't get cleaned up when sessions end. These orphans accumulate silently and can consume 10GB+ of RAM within a single workday. `zclean` detects and kills them safely: it hooks into your session lifecycle and runs on a schedule. One `npx zclean init` sets up everything. I went from 3–4 forced reboots per week to zero manual intervention.

## The Problem Nobody Talks About

AI coding tools don't clean up after themselves. After four months of heavy Claude Code use, I started noticing my machine getting sluggish by mid-afternoon — a dozen node processes, several chrome-headless-shell instances, a few mcp-server-* processes running hours after I had ended those sessions.

Every AI coding session spawns a tree of child processes. Claude Code launches MCP servers for file access, web search, and custom tools. It fires up headless browsers for web research. Codex spawns sub-agents. When the session ends, these processes are supposed to terminate. They don't — at least not reliably.

I ran a quick check one evening:

```bash
# The [m] bracket trick keeps grep from counting its own command line
ps aux | grep -E '[m]cp-server|[c]hrome-headless|[a]gent-browser' | wc -l
```

**37 processes.** All orphans. All consuming memory. Combined RAM usage: north of 6GB for processes doing absolutely nothing.

This isn't just me. The same reports appear across X and dev forums — "Claude Code is heavy," "my machine slows down after a few sessions." The tool itself isn't heavy. **The zombies it leaves behind are heavy — and they accumulate with every session you run.**

## How zclean Works

`zclean` detects and terminates orphaned AI tool processes using a conservative four-condition filter. The core principle: **if the parent process is alive, don't touch it.**

A process only gets flagged as a zombie when ALL of these conditions are true:

1. **It's an orphan**: its parent has exited and it has been reparented to init/launchd (PPID = 1 on macOS and on most Linux setups)
2. **It matches a known pattern**: its command line matches AI tool process signatures
3. **It's not in an active session**: it is not part of a tmux/screen process tree
4. **It's in the host namespace**: it is not inside a Docker container

This means a dev server you started from a terminal that is still open is safe, because its parent is alive. Your `pm2`-managed processes are safe. Your `nohup` background jobs are safe as long as they don't match a target pattern. Only genuinely abandoned processes get killed.
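The four conditions above can be sketched as a single predicate. This is a minimal illustration, not zclean's actual source: the record fields (`ppid`, `command`, `inMultiplexerTree`, `inHostNamespace`) and the pattern list are assumptions chosen to mirror the list.

```javascript
// Hypothetical patterns for illustration; the real target list is larger.
const TARGET_PATTERNS = [/mcp-server-/, /chrome-headless-shell/, /agent-browser/];

// proc is an assumed record: { pid, ppid, command, inMultiplexerTree, inHostNamespace }
function isZombie(proc, patterns = TARGET_PATTERNS) {
  const isOrphan = proc.ppid === 1; // condition 1: reparented to init/launchd
  const matchesPattern = patterns.some((re) => re.test(proc.command)); // condition 2
  const outsideSession = !proc.inMultiplexerTree; // condition 3: not under tmux/screen
  const onHost = proc.inHostNamespace === true; // condition 4: not in a container
  // Every condition must hold; failing any one of them protects the process.
  return isOrphan && matchesPattern && outsideSession && onHost;
}
```

Because the checks are combined with a logical AND, adding another protection only ever shrinks the set of processes that can be killed.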

### The Target List

Here's what `zclean` looks for by default:

| Category | Process Pattern | Source |
|----------|----------------|--------|
| MCP servers | `mcp-server-*` | Claude Code |
| Browser daemons | `agent-browser`, `chrome-headless-shell`, `playwright/driver` | Claude Code, Codex |
| Sub-agents | orphaned `claude --print`, `codex exec` | Claude Code, Codex |
| Build zombies | `esbuild`, `vite`, `next dev`, `webpack` (24h+ orphan) | Common |
| npm zombies | `npm exec`, `npx` (no parent) | Common |
| Node orphans | `node` (no parent + 24h+ or 500MB+ + AI tool path in cmdline) | Common |
| Runtime orphans | `tsx`, `ts-node`, `bun`, `deno`, `python` (MCP server pattern) | Common |

**Build tools like `vite` and `webpack` get a 24-hour grace period.** A long-running build that legitimately takes hours shouldn't be killed. But if an orphaned `esbuild` process has been sitting there for over a day with no parent, it's dead weight.
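One way to express a target list with per-category grace periods is a small lookup table. The sketch below is an assumption about structure, not zclean's real data: the category names, the regular expressions, and the `classifyOrphan` helper are all hypothetical, and only three of the table's rows are modeled.

```javascript
const DAY_SECONDS = 24 * 60 * 60;

// Hypothetical subset of the target table: pattern plus minimum orphan age.
const TARGETS = [
  { category: "mcp", pattern: /mcp-server-/, minAgeSeconds: 0 },
  { category: "browser", pattern: /agent-browser|chrome-headless-shell|playwright\/driver/, minAgeSeconds: 0 },
  { category: "build", pattern: /\b(esbuild|vite|webpack)\b|next dev/, minAgeSeconds: DAY_SECONDS },
];

// Returns the matching category for an orphan's command line, or null when
// nothing matches or the process is still inside its grace period.
function classifyOrphan(command, ageSeconds) {
  const target = TARGETS.find((t) => t.pattern.test(command));
  if (!target) return null;
  return ageSeconds >= target.minAgeSeconds ? target.category : null;
}
```

With this shape, an orphaned `vite` that is one hour old returns `null`, while the same process two days later is classified as `build`.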

## Setting It Up: One Command

```bash
npx zclean init
```

That's it. Here's what happens under the hood:

**Step 1: OS Detection.** `zclean` figures out if you're on macOS, Linux, or Windows and configures the right process scanning and scheduling mechanisms.

**Step 2: Claude Code Hook.** It registers a `SessionEnd` hook in your Claude Code `settings.json`:

```json
{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "npx zclean --session-pid $SESSION_PID --yes"
          }
        ]
      }
    ]
  }
}
```

This is the first line of defense: **every time a Claude Code session ends, `zclean` immediately cleans up the child processes that session left behind.** The `--session-pid` flag scopes cleanup to that specific process tree, which keeps unrelated processes out of scope.
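Scoping cleanup to one session comes down to collecting every descendant of the session's PID from a process snapshot. The following is a sketch under assumptions: `collectDescendants` is a hypothetical helper, and the snapshot is taken to be an array of `{ pid, ppid }` records captured while the session process is still alive.

```javascript
// Breadth-first walk from rootPid through a parent -> children index.
function collectDescendants(processes, rootPid) {
  const childrenOf = new Map();
  for (const { pid, ppid } of processes) {
    if (!childrenOf.has(ppid)) childrenOf.set(ppid, []);
    childrenOf.get(ppid).push(pid);
  }

  const found = [];
  const seen = new Set([rootPid]); // guards against cycles in a bad snapshot
  const queue = [rootPid];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const child of childrenOf.get(current) || []) {
      if (seen.has(child)) continue;
      seen.add(child);
      found.push(child);
      queue.push(child);
    }
  }
  return found; // excludes rootPid itself
}
```

Anything outside the returned list is never considered, which is what keeps a session-scoped cleanup from touching processes that belong to other sessions.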

**Step 3: OS Scheduler.** For zombies that slip through (crashes, force-quits, other AI tools without hooks), `zclean` sets up a recurring hourly cleanup:

On macOS, it creates a LaunchAgent:

```
~/Library/LaunchAgents/com.zclean.hourly.plist
```

On Linux, a systemd user timer:

```
~/.config/systemd/user/zclean.timer
```

On Windows, a user-scoped Task Scheduler entry.
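Choosing the scheduler per platform could look roughly like the sketch below. The file locations come from the paths above; the `schedulerFor` function, its return shape, and the `kind` labels are assumptions made for illustration.

```javascript
const os = require("os");
const path = require("path");

// Maps Node's process.platform value to the scheduler described above.
function schedulerFor(platform = process.platform, home = os.homedir()) {
  switch (platform) {
    case "darwin":
      return {
        kind: "launchd",
        file: path.posix.join(home, "Library/LaunchAgents/com.zclean.hourly.plist"),
      };
    case "linux":
      return {
        kind: "systemd",
        file: path.posix.join(home, ".config/systemd/user/zclean.timer"),
      };
    case "win32":
      // Task Scheduler entries are registered with the OS, not stored as a user file.
      return { kind: "task-scheduler", file: null };
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

Failing loudly on an unknown platform is a deliberate choice in this sketch: silently skipping the scheduler would leave only the session hook in place.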

**Step 4: Config file.** Drops a config at `~/.zclean/config.json` where you can whitelist processes, adjust thresholds, and customize behavior.

**Step 5: First scan.** Runs an immediate dry-run so you can see what it would have killed before enabling automatic cleanup.

## The Safety Mechanisms

I spent more time on the "don't kill the wrong thing" logic than on the actual killing. **Getting a false positive means terminating someone's running dev server — that's a non-starter.** Three independent safety layers prevent this.

### PID Reuse Protection

Between the time `zclean` scans and the time it kills, a process could die and its PID could be reassigned to something completely different. Linux hands out PIDs sequentially and wraps around once it reaches `pid_max`, so on a busy system that spawns many short-lived processes, a freed PID can come back into use sooner than you'd expect.

Before every kill, `zclean` re-verifies three things:

1. **The PID still exists**
2. **The process start time matches** what was recorded during the scan (to the second)
3. **The command line matches** what was recorded during the scan

**If any of these three checks fails, the kill is skipped entirely.** This closes off nearly all PID reuse bugs without requiring atomic operations or locks; the only remaining gap is the instant between the final check and the signal.
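The three re-verification checks can be sketched as follows. This assumes a scan snapshot of the form `{ pid, startTime, command }` and a `lookup` function that returns the current record for a PID (or `undefined` if it is gone); both names are hypothetical, and the kill function is injectable so the logic can be exercised without signalling real processes.

```javascript
// True only if the PID still refers to the process seen during the scan.
function stillSameProcess(snapshot, current) {
  if (!current) return false; // check 1: the PID no longer exists
  return (
    current.startTime === snapshot.startTime && // check 2: same start time
    current.command === snapshot.command // check 3: same command line
  );
}

// Sends SIGTERM only when all three checks pass; returns whether it did.
function killIfUnchanged(snapshot, lookup, kill = (pid, sig) => process.kill(pid, sig)) {
  const current = lookup(snapshot.pid);
  if (!stillSameProcess(snapshot, current)) return false;
  kill(snapshot.pid, "SIGTERM");
  return true;
}
```

The lookup happens immediately before the signal so that the time between verification and kill stays as short as possible.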

### The Whitelist

Some processes look like zombies but aren't. The config handles persistent legitimate orphans:

```json
{
  "whitelist": [
    "mcp-server-custom-db",
    "my-persistent-agent"
  ],
  "maxAge": 86400,
  "maxMemoryMB": 500,
  "dryRun": false
}
```

- `whitelist`: Process names that are never touched, regardless of orphan status
- `maxAge`: Seconds before an orphan gets flagged (default: 86400 = 24 hours for build tools)
- `maxMemoryMB`: Memory threshold that escalates urgency (default: 500MB; above this, the process is flagged sooner)
- `dryRun`: Global toggle. Set to `true` to audit without killing anything
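Loading that file with sensible fallbacks might look like the sketch below. The defaults mirror the values shown above; the function names and the substring-based whitelist match are assumptions, since the article does not specify how names are compared.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const DEFAULTS = { whitelist: [], maxAge: 86400, maxMemoryMB: 500, dryRun: false };

// User values override defaults; missing keys fall back to DEFAULTS.
function resolveConfig(userConfig = {}) {
  return { ...DEFAULTS, ...userConfig };
}

function loadConfig(file = path.join(os.homedir(), ".zclean", "config.json")) {
  try {
    return resolveConfig(JSON.parse(fs.readFileSync(file, "utf8")));
  } catch (err) {
    if (err.code === "ENOENT") return resolveConfig(); // no config yet: use defaults
    throw err; // malformed JSON or permission problems should stay visible
  }
}

// Assumed matching rule: a whitelist entry protects any command containing it.
function isWhitelisted(command, config) {
  return config.whitelist.some((name) => command.includes(name));
}
```

Rethrowing on malformed JSON is intentional here: falling back to defaults after a typo would silently drop the user's whitelist.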

### Protected Process Trees

Beyond the whitelist, `zclean` walks the full process tree to protect anything descended from:

- **tmux / screen sessions**: if the process is a descendant of a terminal multiplexer, it's intentional
- **Daemon managers**: pm2, forever, supervisord, systemd services
- **VS Code**: gets a 48-hour grace period since VS Code's process tree can appear orphaned after restarts
- **Docker containers**: checked via PID namespace on Linux (`/proc/<pid>/ns/pid`); Docker Desktop on macOS runs in a VM, so container processes aren't visible to the host `ps` at all
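A tree walk for protected ancestors can be sketched like this. It assumes a `Map` from PID to `{ pid, ppid, name }` records, and the `PROTECTED` name pattern is a simplified stand-in: real process names for managers such as pm2 vary by version and platform.

```javascript
// Hypothetical name pattern for multiplexers and daemon managers.
const PROTECTED = /^(tmux|screen|pm2|forever|supervisord)\b/;

// Climbs parent links from pid and reports whether any ancestor is protected.
function hasProtectedAncestor(pid, table) {
  const seen = new Set(); // guards against cycles in an inconsistent snapshot
  let current = table.get(pid);
  while (current && current.ppid > 1 && !seen.has(current.pid)) {
    seen.add(current.pid);
    const parent = table.get(current.ppid);
    if (parent && PROTECTED.test(parent.name)) return true;
    current = parent; // undefined ends the loop if the parent is missing
  }
  return false;
}
```

The walk stops at PID 1, so a process that was reparented straight to init/launchd has no ancestors left to protect it.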

## Daily Usage

Most of the time, you forget zclean exists. That's the goal. When you want visibility:


```bash
# See what would be killed (dry-run, default)
npx zclean

# Actually kill the zombies
npx zclean --yes

# Check current zombie status
npx zclean status

# View kill history with timestamps and RAM reclaimed
npx zclean logs

# Show current config
npx zclean config
```

A typical dry-run output looks like this:

```
  zclean — scanning for zombie processes...

  Found 4 zombie processes:

  PID    CMD                          RAM      AGE
  ────   ───────────────────────────  ───────  ──────
  8234   mcp-server-filesystem        42 MB    3h 12m
  8891   chrome-headless-shell        287 MB   2h 45m
  9102   mcp-server-fetch             18 MB    1h 58m
  12044  node (claude subagent)       156 MB   4h 03m

  Total reclaimable: 503 MB

  Run with --yes to kill these processes.
```

**503MB from four processes on a light day.** Peak scans have returned 15+ zombies consuming over 3GB on days with multiple long AI coding sessions.
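The total in that report is just the sum of the per-process figures, and the age column is hours plus zero-padded minutes. A small sketch of both, with `summarize` and `formatAge` as hypothetical helper names:

```javascript
// Sums resident memory across flagged processes (values in MB).
function summarize(zombies) {
  const totalMB = zombies.reduce((sum, z) => sum + z.rssMB, 0);
  return { count: zombies.length, totalMB };
}

// Formats an age in seconds as "<hours>h <minutes>m", matching the report style.
function formatAge(seconds) {
  const hours = Math.floor(seconds / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  return `${hours}h ${String(minutes).padStart(2, "0")}m`;
}

const report = summarize([
  { pid: 8234, rssMB: 42 },
  { pid: 8891, rssMB: 287 },
  { pid: 9102, rssMB: 18 },
  { pid: 12044, rssMB: 156 },
]);
console.log(`Total reclaimable: ${report.totalMB} MB`); // prints "Total reclaimable: 503 MB"
```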

## The Dual Protection Architecture

`zclean` doesn't rely on a single cleanup mechanism — redundancy is intentional.

**Layer 1: Session Hook** — fires on every clean session exit via the Claude Code `SessionEnd` hook. This catches the common case immediately, with zero delay between session end and cleanup.

**Layer 2: OS Scheduler** — runs hourly (configurable down to every 15 minutes). This catches everything the hook misses: crashed sessions, force-quits, Codex sessions that lack hook support, and any AI tool that spawns processes without a cleanup contract.

**The hook handles approximately 80% of cases instantly. The scheduler handles the remaining 20% within one hour.** Together, zombie RAM accumulation drops effectively to zero over any meaningful time period — verified over six weeks of continuous use on a MacBook Pro M2.

## What I'd Do Differently

**I should have built the process tree walker first.** I started with simple PPID checks and pattern matching, then kept bolting on edge cases — the tmux protection, the VS Code grace period, the Docker namespace check. The tree walker should have been the foundation. It would have reduced total code by roughly 30% and made the protection logic composable instead of a chain of special cases.

**The Windows implementation needs more real-world testing.** On macOS and Linux the scanner relies on `ps` (plus `/proc` on Linux; macOS has no `/proc`), which are well understood and stable. Windows requires WMI queries through PowerShell, and the process model is fundamentally different: a process keeps its recorded parent PID after the parent exits instead of being reparented, and namespace isolation works differently. It works in my testing environment, but I have less confidence in Windows edge cases than on Unix systems.

**I underestimated how many processes VS Code itself orphans.** The 48-hour grace period for VS Code descendants was added reactively after I accidentally killed a legitimate TypeScript language server. The line between "VS Code orphan" and "AI tool orphan spawned through VS Code's integrated terminal" is genuinely blurry — VS Code's process tree is already unusual before you add AI tools to the mix.

## The Numbers

### Before vs. After zclean

| Metric | Before | After |
|--------|--------|-------|
| Average orphan processes (end of day) | 12–20 | 0–2 |
| RAM consumed by orphans | 2–8 GB | < 100 MB |
| Manual force-reboots per week | 3–4 | 0 |
| Time spent investigating "why is my Mac slow" | ~30 min/day | 0 |

### Tool Resource Footprint

| Metric | Value |
|--------|-------|
| zclean scan time | < 200ms |
| RAM usage during scan | ~12 MB |
| npm dependencies | 0 (pure Node.js) |
| Supported platforms | macOS, Linux, Windows |
| Config file size | ~200 bytes |
| Install + init time | < 10 seconds |

**Zero npm dependencies.** The scanner uses `child_process.execSync` with native OS commands (`ps` on Unix, `Get-Process` on Windows). No native modules, no compilation step, no `node-gyp` nightmares. The entire tool is a single Node.js file you can read and audit in under 10 minutes.
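A dependency-free Unix scanner along those lines might look like the sketch below. It is an assumption about the approach, not the tool's real code: it uses `execFileSync` rather than `execSync` to avoid a shell, requests POSIX `ps` columns with empty headers, and the function names are hypothetical. Windows is not covered.

```javascript
const { execFileSync } = require("child_process");

// Converts ps elapsed time "[[dd-]hh:]mm:ss" into seconds.
function parseEtime(etime) {
  const dayParts = etime.split("-");
  const days = dayParts.length === 2 ? Number(dayParts[0]) : 0;
  const clock = dayParts[dayParts.length - 1].split(":").map(Number);
  while (clock.length < 3) clock.unshift(0); // pad missing hours
  const [hours, minutes, seconds] = clock;
  return ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
}

// Parses one line of: pid ppid rss(KB) etime args...
function parsePsLine(line) {
  const match = /^\s*(\d+)\s+(\d+)\s+(\d+)\s+([\d:-]+)\s+(.*)$/.exec(line);
  if (!match) return null;
  return {
    pid: Number(match[1]),
    ppid: Number(match[2]),
    rssKB: Number(match[3]),
    ageSeconds: parseEtime(match[4]),
    command: match[5],
  };
}

function parsePsOutput(text) {
  return text.split("\n").map(parsePsLine).filter(Boolean);
}

// Runs ps for every process; "name=" suppresses that column's header.
function scanProcesses() {
  const args = ["-A", "-o", "pid=", "-o", "ppid=", "-o", "rss=", "-o", "etime=", "-o", "args="];
  const output = execFileSync("ps", args, { encoding: "utf8", maxBuffer: 16 * 1024 * 1024 });
  return parsePsOutput(output);
}
```

Keeping the parser separate from the `ps` call means the parsing logic can be tested against captured output without running anything on the host.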

## FAQ

**Does zclean kill my running dev server?**

No. `zclean` only targets orphan processes — those whose parent has died and been reassigned to init/launchd (PPID = 1). If your dev server was started from a terminal that's still open, its parent is alive and it won't be touched. Processes managed by pm2, forever, or supervisord are also explicitly protected via tree-walk detection.

**What if I have a legitimate long-running MCP server?**

Add it to the whitelist in `~/.zclean/config.json`. Whitelisted process names are never killed regardless of orphan status or memory usage. You can also adjust `maxAge` if the default 24-hour grace period isn't sufficient for your workflow.

**Does it work with Codex, Cursor, or other AI coding tools?**

Yes. The `SessionEnd` hook is specific to Claude Code, but the OS scheduler (Layer 2) catches orphans from any tool. The target process patterns include common signatures from Codex, Cursor, and other tools that spawn MCP servers and headless browsers. If your tool's processes become orphaned and match the known patterns, `zclean` will find them on the next hourly run.

**Can it accidentally kill a process inside Docker?**

No. On Linux, `zclean` checks the PID namespace via `/proc/<pid>/ns/pid` to confirm the process is in the host namespace. Docker containers run in isolated PID namespaces and are excluded from scanning entirely. On macOS, Docker Desktop runs in a Linux VM, making container processes invisible to the host `ps`.
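On Linux, a namespace comparison of this kind can be sketched by reading the `ns/pid` symlinks. The sketch assumes the scanner itself runs in the host namespace, so matching its own link means "on the host"; `inHostPidNamespace` is a hypothetical name, and reading another user's link may require elevated permissions.

```javascript
const fs = require("fs");

// Compares the target's PID namespace link with the scanner's own.
function inHostPidNamespace(pid, readlink = fs.readlinkSync) {
  try {
    return readlink(`/proc/${pid}/ns/pid`) === readlink("/proc/self/ns/pid");
  } catch {
    // Unreadable or vanished: it cannot be confirmed as a host process,
    // so report false and let the caller leave it alone.
    return false;
  }
}
```

Treating an unreadable link as "not on the host" errs on the side of skipping a kill rather than risking one inside a container.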

**What happens if zclean itself crashes mid-kill?**

Each kill is independent — there's no shared transaction state. If `zclean` crashes after killing 3 of 7 zombies, the remaining 4 will be caught on the next scheduled run within an hour. **The PID reuse protection ensures that even if system state changes between a scan and a kill, no incorrect process is terminated.**

## Try It Yourself

1. **Install and initialize** — `npx zclean init` detects your OS, registers hooks, and sets up the scheduler in under 10 seconds
2. **Run a dry scan** — `npx zclean` shows what would be killed without touching anything
3. **Check the output** — verify the detected processes are genuinely orphaned
4. **Enable kills** — `npx zclean --yes` or remove `dryRun: true` from config for automated cleanup
5. **Forget about it** — the hook and scheduler handle everything from here

## Wrapping Up

If you're using AI coding tools daily and your machine gets progressively slower through the day, you probably don't have a memory leak — you have a zombie problem. `zclean` is a zero-dependency Node.js utility that fixes it permanently with a single install command and two protection layers.

**What's the worst zombie accumulation you've seen on your machine?** I'm curious whether this skews toward macOS or if Linux and Windows users are hitting it equally.

If this saved you from your next force-reboot, consider sharing it with whoever on your team complains that "Claude is heavy."

Follow me for more posts about building developer tools and the unglamorous infrastructure work behind AI-assisted development.

---

*I build AI-powered developer tools and write about the engineering behind them. Currently running an AI agent orchestration system with multi-model routing across Claude, Gemini, and GPT — which is, ironically, also the source of most of my zombie processes.*

Top comments (3)

Anthony Coffey • Jul 6

nice work, thanks for sharing! I've just been manually killing tasks in task manager every few hours of using Claude Code because they hog so much RAM my machine nearly comes to a halt with random bursts of lag that make typing 1 sentence totally miserable... (found this article searching the issue for a 2nd or 3rd time), hopefully your solution works for me. That would be big.

thestack_ai • Jul 10

I hope this was helpful. I have additional updates today
