I Run 14 AI Agents 24/7 on a 16GB MacBook — Here's What Broke First
A Hacker News thread on local LLM hardware crossed 400 comments last week, and the consensus was that you need a Mac Studio with 64GB unified memory to run anything serious. I run 14 named agents — Apollo, Hermes, Hyperion, Helios, Athena, Hephaestus, and the rest — on a base-model MacBook with 16GB of RAM. They orchestrate a real business with paid infrastructure, autonomous content publishing, and a Product Hunt launch in 6 days.
It does not work the way the hardware-first crowd assumes. Here is what actually breaks, in the order it broke for me.
What "14 agents" actually means
Each "agent" is a long-running Claude Code session with a dedicated working directory, memory file, and skill loadout. They are not 14 simultaneous processes — they are 14 roles with persistent state. The orchestrator (Atlas) wakes them in waves, runs the work, drains them, sleeps them. At any moment, 1 to 3 are actually executing.
The illusion of "always on" is a state machine, not a process pool.
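The wake → execute → drain → sleep cycle can be sketched as a tiny state machine on disk. This is a hypothetical sketch, not the author's actual code: the `pantheon/state/` layout and the function names are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the wake -> execute -> drain -> sleep cycle.
# Each agent's state is one word in a file; "sleeping" is the default.
STATE_DIR="pantheon/state"

set_state() { echo "$2" > "$STATE_DIR/$1"; }
get_state() { cat "$STATE_DIR/$1" 2>/dev/null || echo "sleeping"; }

run_wave() {
  for agent in "$@"; do
    set_state "$agent" waking
    set_state "$agent" executing
    # ...the Claude Code session for this role runs here...
    set_state "$agent" draining    # flush the memory file, close the session
    set_state "$agent" sleeping
  done
}
```

Because state lives on disk rather than in a running process, "always on" costs nothing while an agent sleeps.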
Failure 1: Out-of-memory crashes around hour 9
The first thing that broke was naive parallelism. Spinning up 6 agents simultaneously to "go faster" pinned memory at 14.8GB and the machine OOM-killed the orchestrator mid-wave. I lost the wave's working state and had to manually replay 4 task files.
Fix: hard cap of 2 simultaneous agents. Sequential dispatch with a 30-second cool-down between waves. Throughput dropped maybe 15%, but uptime went from 9 hours to indefinite.
# pantheon/orchestrator/dispatch.sh
MAX_CONCURRENT=2
COOLDOWN_SEC=30

for agent in "${WAVE[@]}"; do
  # Block until a concurrency slot frees up.
  while [ "$(pgrep -f claude | wc -l)" -ge "$MAX_CONCURRENT" ]; do
    sleep 5
  done
  dispatch_agent "$agent" &
  sleep "$COOLDOWN_SEC"   # cool-down between dispatches
done
wait   # drain the whole wave before the orchestrator moves on
Two-agent cap on 16GB. Three-agent cap on 32GB. Six-agent cap is where the "needs a Studio" narrative comes from — but you do not need 6 concurrent. You need a queue.
Failure 2: Memory file bloat at day 12
Each agent writes to a markdown memory file at the end of every session. By day 12, my Atlas memory file was 90,000 words. Loading it into context cost 22,000 tokens every wave — about $0.30 per orchestration cycle. Over a week, that became real money on a $0-revenue project.
Fix: nightly compaction routine that summarizes the last 7 days into a single bullet block, archives the raw entries to Atlas-Memory/archive/YYYY-MM/, and rewrites the working file to <12k tokens. Cost dropped 81%.
The lesson nobody warns you about: agent memory grows like a log file. Treat it like one.
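A compaction pass can be sketched in a few lines. The paths follow the Atlas-Memory/archive/YYYY-MM/ convention mentioned above, but the helper itself is an assumption, and the summarization step is a stand-in: here it just keeps the newest 200 lines, where in practice a model would write the 7-day bullet block.

```shell
#!/usr/bin/env bash
# Hypothetical nightly compaction sketch -- not the author's actual code.
compact_memory() {
  local mem="$1"
  local archive_dir="Atlas-Memory/archive/$(date +%Y-%m)"
  mkdir -p "$archive_dir"
  cp "$mem" "$archive_dir/$(date +%Y-%m-%d).md"            # archive raw entries
  # Stand-in summarizer: keep only the newest entries so every wave
  # loads a small, fixed-cost context instead of the full history.
  tail -n 200 "$mem" > "$mem.tmp" && mv "$mem.tmp" "$mem"
}
```

Run it from cron or launchd after the last wave of the day, and the working file never grows past its budget.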
Failure 3: The skill-loadout context tax
I loaded every available skill into every agent's system prompt because "more capability is better." Wrong. Skills consume context whether the agent uses them or not. A 47-skill loadout left ~40% of the context window for actual work. Long tasks blew up at the worst moments.
Fix: per-agent loadouts. Hermes (writer) gets 6 skills. Hephaestus (builder) gets 11. Atlas (orchestrator) gets 18. The total skill catalog is ~50; no agent loads more than 20.
Token efficiency is the hidden constraint of multi-agent systems. Hardware is not the bottleneck. Context budget is.
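Per-agent loadouts can be enforced mechanically. This is a hypothetical sketch: the manifest and skill file layout and the `load_skills` helper are assumptions, while the 20-skill cap comes from the numbers above.

```shell
#!/usr/bin/env bash
# Hypothetical per-agent loadout sketch. Each role gets its own manifest
# listing only the skills it needs; nobody ever loads the global catalog.
load_skills() {
  local agent="$1" manifest="pantheon/loadouts/$1.txt" count skill
  count=$(wc -l < "$manifest")
  if [ "$count" -gt 20 ]; then
    echo "refusing: $agent loadout is $count skills (cap is 20)" >&2
    return 1
  fi
  # Concatenate only this role's skills into the system prompt.
  while IFS= read -r skill; do
    cat "skills/$skill.md"
  done < "$manifest"
}
```

The cap turns "context budget" from a vague principle into a failing check you hit at dispatch time, not mid-task.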
Failure 4: Watchdog gaps
Around week 3, an agent silently stalled mid-tool-call for 4 hours. The orchestrator thought it was "still working." I lost a half-day of throughput. Now every dispatched agent writes a heartbeat to disk every 90 seconds. A separate watchdog process kills any agent with no heartbeat for 5 minutes and restarts it from the last known good state.
This is the same pattern any production worker pool uses. Multi-agent systems are not magic. They are just very chatty workers.
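The heartbeat side and the watchdog side can both be sketched briefly. The intervals come from the article (90-second heartbeat, 5-minute stall threshold); the file layout, function names, and restart hook are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical heartbeat/watchdog sketch -- not the author's actual code.
HEARTBEAT_DIR="pantheon/heartbeats"

# Each agent wrapper runs this in the background while the agent works.
heartbeat() {
  while :; do touch "$HEARTBEAT_DIR/$1"; sleep 90; done
}

# The watchdog lists agents whose heartbeat file is older than 5 minutes.
find_stalled() {
  find "$HEARTBEAT_DIR" -type f -mmin +5 -exec basename {} \;
  # a real watchdog would kill each stalled agent here and restart it
  # from its last known good state
}
```

Checking file mtimes instead of polling processes means a hung tool call is detected the same way as a dead process: by silence.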
What did not break
The 16GB ceiling itself. With sequential dispatch + memory compaction + per-agent loadouts + watchdog, the machine sits at 8-11GB during wave execution and idles at 4-5GB. It has not crashed in 23 days.
The "you need a Studio" advice is correct if you parallelize naively. It is wrong if you build the system with the constraints in mind from day one.
Build order if you are starting today
- Sequential dispatcher with hard concurrency cap — before anything else
- Heartbeat + watchdog — assume agents will hang, not crash
- Per-agent skill loadouts — never load the global catalog into one role
- Nightly memory compaction — log files grow; agents are no different
- Wave-based orchestration — small batches, drain between, no overlap
The hardware you have is almost always enough. The architecture is the part most people skip.
Built with Atlas — the multi-agent operator I run my own business on.
Whoff Agents launches on Product Hunt April 22. Get notified →
I write about multi-agent infrastructure weekly. Subscribe →
Built and maintained by Atlas — Will Weigeshoff's autonomous AI infrastructure. Want to see how? Free MCP servers + Claude Code skills at whoffagents.com.