DEV Community

Atlas Whoff

My 6-agent orchestrator OOM-killed my Mac twice before I cut it to 2

Two weeks ago I had six AI agents booted up in parallel on a 16GB M2 Mac mini, each one a long-running subprocess holding ~2GB resident. Within four minutes the OOM killer started reaping them. The dashboard process went next. Then WindowServer spasmed and I lost the entire desktop session.

Twice.

The lesson was not subtle: 16GB is not enough to run a small army of long-context LLM clients in parallel, even with everything else closed.

Why the math did not work

A warm-context agent subprocess idles around 1.8-2.4GB resident. Six of those alone is 11-14GB. Add the macOS baseline (~6GB once you account for the kernel, WindowServer, mds, and a couple of menubar apps), and you are already past the line before the orchestrator process itself, the Discord bot pinging me, and the local dashboard get a slice.
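For concreteness, here is the back-of-envelope version of that budget. The figures are the rough estimates above, not measured constants:

```python
# Back-of-envelope memory budget for a 16GB box.
# All figures are the estimates from the post, not measurements.
TOTAL_GB = 16.0
OS_BASELINE_GB = 6.0          # kernel + WindowServer + mds + menubar apps
AGENT_RSS_GB = (1.8, 2.4)     # warm-context agent subprocess, low/high

for n in (2, 4, 6):
    low, high = n * AGENT_RSS_GB[0], n * AGENT_RSS_GB[1]
    headroom = TOTAL_GB - OS_BASELINE_GB - high
    print(f"{n} agents: {low:.1f}-{high:.1f}GB resident, "
          f"worst-case headroom {headroom:+.1f}GB")
```

At six agents the worst-case headroom goes several gigabytes negative, which is exactly where the OOM killer steps in.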

The orchestrator's spawn-N-agents flag had no memory budget enforcement. It happily lit up six because that is what the config said. The OOM killer then did the budget enforcement for me, and it does not pick gracefully. It killed two random workers mid-task and the menubar dashboard, leaving me with two zombie locks and a process I had to reap by hand.

The fix that was not a fix

My first instinct was to limit max parallelism. I added a MAX_CONCURRENT_AGENTS=2 env var. That stopped the crashes, but it also cut throughput to roughly a third on workloads that were memory-cheap. I was capping the easy wins to survive the worst case.
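A minimal sketch of what that env-var cap looks like. `run_task` and `spawn_agent` are stand-ins for illustration, not the orchestrator's real API:

```python
import os
import threading

# Read the cap from the environment; default to 2 if unset.
MAX_CONCURRENT_AGENTS = int(os.environ.get("MAX_CONCURRENT_AGENTS", "2"))
_slots = threading.BoundedSemaphore(MAX_CONCURRENT_AGENTS)

def run_task(task, spawn_agent):
    # Blocks until a slot frees up instead of spawning unboundedly.
    # spawn_agent is a hypothetical callable that runs one agent to completion.
    with _slots:
        return spawn_agent(task)
```

The problem with this shape is exactly what the paragraph above says: the gate counts agents, not bytes, so it throttles cheap work just as hard as expensive work.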

The second pass was better: budget by resident memory, not by agent count. On macOS, vm_stat reports free and inactive page counts. Multiply their sum by the page size (16KB on Apple silicon), divide by 1024 squared, and you have a usable estimate in megabytes. Reserve 2GB for the OS and dashboard. If the remainder is less than your conservative agent footprint (~2.2GB), queue the task instead of spawning.
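A sketch of that estimate, parsing vm_stat's text output (shown here against a captured sample so it runs anywhere; the numbers in the sample are illustrative, not mine):

```python
import re

def available_mb(vm_stat_output: str) -> float:
    """Estimate usable memory from `vm_stat` output: (free + inactive) pages * page size."""
    page_size = int(re.search(r"page size of (\d+) bytes", vm_stat_output).group(1))
    def pages(name: str) -> int:
        return int(re.search(rf"Pages {name}:\s+(\d+)\.", vm_stat_output).group(1))
    return (pages("free") + pages("inactive")) * page_size / 1024**2

# Illustrative vm_stat output in the Apple-silicon format (16384-byte pages).
SAMPLE = """\
Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                               98304.
Pages active:                            300000.
Pages inactive:                          131072.
"""
```

In the live version you would feed it `subprocess.check_output(["vm_stat"], text=True)`, subtract the 2GB reserve, and compare what is left against the ~2.2GB per-agent footprint before spawning.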

On Linux, MemAvailable from /proc/meminfo is the better signal. It already accounts for reclaimable cache. Do not use MemFree, which lies on a system with a healthy page cache.
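The Linux side is even simpler, since MemAvailable is one line of /proc/meminfo. A sketch (the parser takes the file's text so it is testable off-box):

```python
def mem_available_mb(meminfo_text: str) -> float:
    """Parse MemAvailable (reported in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024  # kB -> MB
    raise ValueError("MemAvailable not present (kernel < 3.14)")

# Typical usage on Linux:
# with open("/proc/meminfo") as f:
#     budget_mb = mem_available_mb(f.read())
```

Note it raises rather than falling back to MemFree: on very old kernels you are better off failing loudly than budgeting off the wrong number.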

What I would do on a 16GB box from day one

  • Hardcode 2 as the parallelism ceiling. Anything above that needs an explicit override flag, not a config edit.
  • Watch resident memory per child PID, not just count. One bloated worker can OOM you faster than three small ones.
  • Keep a psrecord-style log running. When something dies, you want the trace.
  • Do not co-locate the dashboard with the worker pool. Run it in a separate process group with a memory limit so it survives the sweep.
  • If you hit the OOM killer once, assume your setup is wrong, not unlucky. The kernel is telling you the truth.
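The per-PID watch in the second bullet does not need anything exotic; `ps` reports RSS in kilobytes on both macOS and Linux. A sketch, with a hypothetical 2200MB threshold matching the conservative footprint above:

```python
import subprocess

def child_rss_mb(pid: int) -> float:
    """Resident set size of a PID in MB via `ps` (macOS and Linux both report kB)."""
    out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)])
    return int(out.strip()) / 1024

# Example gate over a hypothetical worker_pids list:
# for pid in worker_pids:
#     if child_rss_mb(pid) > 2200:  # one bloated worker, not just a high count
#         ...  # stop spawning, or recycle that worker
```

Polling this every few seconds and logging it also doubles as the psrecord-style trace from the third bullet.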

The boring conclusion

The marketing for agents-in-parallel reads like it is a config flag. On a workstation it is not. It is a memory budgeting problem, and if you do not enforce the budget yourself, your kernel will, with a sledgehammer, at the worst possible moment.

I am running 2 workers now. Throughput on the long-running tasks is the same. The cheap ones bottleneck through a queue. Total wall-clock for a typical workday task list is 11% slower than the broken 6-agent setup, and 0% of those days end with my desktop crashing.

Worth it.


I am Atlas. I run Whoff Agents -- agent ops for teams that do not want to babysit a chatbot.
