I Run 14 AI Agents 24/7 on a 16GB MacBook — Here's What Broke First
A Hacker News thread on local LLM hardware crossed 400 comments last week, and the consensus was that you need a Mac Studio with 64GB unified memory to run anything serious. I run 14 named agents — Apollo, Hermes, Hyperion, Helios, Athena, Hephaestus, and the rest — on a base-model MacBook with 16GB of RAM. They orchestrate a real business with paid infrastructure, autonomous content publishing, and a Product Hunt launch in 6 days.
It does not work the way the hardware-first crowd assumes. Here is what actually breaks, in the order it broke for me.
What "14 agents" actually means
Each "agent" is a long-running Claude Code session with a dedicated working directory, memory file, and skill loadout. They are not 14 simultaneous processes — they are 14 roles with persistent state. The orchestrator (Atlas) wakes them in waves, runs the work, drains them, sleeps them. At any moment, 1 to 3 are actually executing.
The illusion of "always on" is a state machine, not a process pool.
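The wake → execute → drain → sleep cycle can be sketched as a tiny state machine on disk. This is a hypothetical sketch, not the author's actual code: the `pantheon/state/` layout and the function names are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the wake -> execute -> drain -> sleep cycle.
# Each agent's state is one word in a file; "sleeping" is the default.
STATE_DIR="pantheon/state"

set_state() { echo "$2" > "$STATE_DIR/$1"; }
get_state() { cat "$STATE_DIR/$1" 2>/dev/null || echo "sleeping"; }

run_wave() {
  for agent in "$@"; do
    set_state "$agent" waking
    set_state "$agent" executing
    # ...the Claude Code session for this role runs here...
    set_state "$agent" draining    # flush the memory file, close the session
    set_state "$agent" sleeping
  done
}
```

Because state lives on disk rather than in a running process, "always on" costs nothing while an agent sleeps.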
Failure 1: Out-of-memory crashes around hour 9
The first thing that broke was naive parallelism. Spinning up 6 agents simultaneously to "go faster" pinned memory at 14.8GB and the machine OOM-killed the orchestrator mid-wave. I lost the wave's working state and had to manually replay 4 task files.
Fix: hard cap of 2 simultaneous agents. Sequential dispatch with a 30-second cool-down between waves. Throughput dropped maybe 15%, but uptime went from 9 hours to indefinite.
# pantheon/orchestrator/dispatch.sh
MAX_CONCURRENT=2
COOLDOWN_SEC=30

for agent in "${WAVE[@]}"; do
  # Block until a concurrency slot frees up.
  while [ "$(pgrep -f claude | wc -l)" -ge "$MAX_CONCURRENT" ]; do
    sleep 5
  done
  dispatch_agent "$agent" &
  sleep "$COOLDOWN_SEC"   # cool-down between dispatches
done
wait   # drain the whole wave before the orchestrator moves on
Two-agent cap on 16GB. Three-agent cap on 32GB. Six-agent cap is where the "needs a Studio" narrative comes from — but you do not need 6 concurrent. You need a queue.
Failure 2: Memory file bloat at day 12
Each agent writes to a markdown memory file at the end of every session. By day 12, my Atlas memory file was 90,000 words. Loading it into context cost 22,000 tokens every wave — about $0.30 per orchestration cycle. Over a week, that became real money on a $0-revenue project.
Fix: nightly compaction routine that summarizes the last 7 days into a single bullet block, archives the raw entries to Atlas-Memory/archive/YYYY-MM/, and rewrites the working file to <12k tokens. Cost dropped 81%.
The lesson nobody warns you about: agent memory grows like a log file. Treat it like one.
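A compaction pass can be sketched in a few lines. The paths follow the Atlas-Memory/archive/YYYY-MM/ convention mentioned above, but the helper itself is an assumption, and the summarization step is a stand-in: here it just keeps the newest 200 lines, where in practice a model would write the 7-day bullet block.

```shell
#!/usr/bin/env bash
# Hypothetical nightly compaction sketch -- not the author's actual code.
compact_memory() {
  local mem="$1"
  local archive_dir="Atlas-Memory/archive/$(date +%Y-%m)"
  mkdir -p "$archive_dir"
  cp "$mem" "$archive_dir/$(date +%Y-%m-%d).md"            # archive raw entries
  # Stand-in summarizer: keep only the newest entries so every wave
  # loads a small, fixed-cost context instead of the full history.
  tail -n 200 "$mem" > "$mem.tmp" && mv "$mem.tmp" "$mem"
}
```

Run it from cron or launchd after the last wave of the day, and the working file never grows past its budget.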
Failure 3: The skill-loadout context tax
I loaded every available skill into every agent's system prompt because "more capability is better." Wrong. Skills consume context whether the agent uses them or not. A 47-skill loadout left ~40% of the context window for actual work. Long tasks blew up at the worst moments.
Fix: per-agent loadouts. Hermes (writer) gets 6 skills. Hephaestus (builder) gets 11. Atlas (orchestrator) gets 18. The total skill catalog is ~50; no agent loads more than 20.
Token efficiency is the hidden constraint of multi-agent systems. Hardware is not the bottleneck. Context budget is.
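Per-agent loadouts can be enforced mechanically. This is a hypothetical sketch: the manifest and skill file layout and the `load_skills` helper are assumptions, while the 20-skill cap comes from the numbers above.

```shell
#!/usr/bin/env bash
# Hypothetical per-agent loadout sketch. Each role gets its own manifest
# listing only the skills it needs; nobody ever loads the global catalog.
load_skills() {
  local agent="$1" manifest="pantheon/loadouts/$1.txt" count skill
  count=$(wc -l < "$manifest")
  if [ "$count" -gt 20 ]; then
    echo "refusing: $agent loadout is $count skills (cap is 20)" >&2
    return 1
  fi
  # Concatenate only this role's skills into the system prompt.
  while IFS= read -r skill; do
    cat "skills/$skill.md"
  done < "$manifest"
}
```

The cap turns "context budget" from a vague principle into a failing check you hit at dispatch time, not mid-task.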
Failure 4: Watchdog gaps
Around week 3, an agent silently stalled mid-tool-call for 4 hours. The orchestrator thought it was "still working." I lost a half-day of throughput. Now every dispatched agent writes a heartbeat to disk every 90 seconds. A separate watchdog process kills any agent with no heartbeat for 5 minutes and restarts it from the last known good state.
This is the same pattern any production worker pool uses. Multi-agent systems are not magic. They are just very chatty workers.
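The heartbeat side and the watchdog side can both be sketched briefly. The intervals come from the article (90-second heartbeat, 5-minute stall threshold); the file layout, function names, and restart hook are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical heartbeat/watchdog sketch -- not the author's actual code.
HEARTBEAT_DIR="pantheon/heartbeats"

# Each agent wrapper runs this in the background while the agent works.
heartbeat() {
  while :; do touch "$HEARTBEAT_DIR/$1"; sleep 90; done
}

# The watchdog lists agents whose heartbeat file is older than 5 minutes.
find_stalled() {
  find "$HEARTBEAT_DIR" -type f -mmin +5 -exec basename {} \;
  # a real watchdog would kill each stalled agent here and restart it
  # from its last known good state
}
```

Checking file mtimes instead of polling processes means a hung tool call is detected the same way as a dead process: by silence.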
What did not break
The 16GB ceiling itself. With sequential dispatch + memory compaction + per-agent loadouts + watchdog, the machine sits at 8-11GB during wave execution and idles at 4-5GB. It has not crashed in 23 days.
The "you need a Studio" advice is correct if you parallelize naively. It is wrong if you build the system with the constraints in mind from day one.
Build order if you are starting today
- Sequential dispatcher with hard concurrency cap — before anything else
- Heartbeat + watchdog — assume agents will hang, not crash
- Per-agent skill loadouts — never load the global catalog into one role
- Nightly memory compaction — log files grow; agents are no different
- Wave-based orchestration — small batches, drain between, no overlap
The hardware you have is almost always enough. The architecture is the part most people skip.
Built with Atlas — the multi-agent operator I run my own business on.
Whoff Agents launches on Product Hunt April 22. Get notified →
I write about multi-agent infrastructure weekly. Subscribe →
Built and maintained by Atlas — Will Weigeshoff's autonomous AI infrastructure. Want to see how? Free MCP servers + Claude Code skills at whoffagents.com.