I Built a Self-Learning Expert Agent System, Scaled It Up, and Shut It Down in 3 Days
TL;DR
I set up 8 "expert agents" (AI agents specialized in areas like k8s, IaC, LLM, and streaming) with automated learning cycles — first every 4 hours, then every hour when I got overconfident. The result: claude-opus-4-6 ran at full tilt, usage exploded the next day, and Linou said "stop." Here's an honest account of what went wrong.
Background: Why Build an "Expert Learning System"?
Our system currently runs 9+ nodes with 20+ agents, each with its own domain — monthly billing, web app maintenance, hotel booking management, kids' educational support, and so on.
The problem is knowledge staleness. The infra changes (this month we added 10 nodes), new tools get integrated, project specs evolve — manually updating every agent's memory each time isn't realistic.
So the idea: create a group of "expert agents," each of which periodically reads documentation and updates its own memory. Simple, right?
Expert candidates:
- k8s-expert: Kubernetes / container knowledge
- iac-expert: Terraform / Ansible / IaC
- llm-expert: LLM / prompt engineering trends
- streaming-expert: Kafka / Flink / stream processing
(8 agents total)
Each expert would have a LEARN-PLAN.md with 24 sessions of structured curriculum, and at each heartbeat they'd run the next session and accumulate knowledge in memory.
Implementation: Phase 1 (Every 4 Hours)
The initial design was simple.
- Heartbeat interval: every 4 hours (heartbeatIntervalMinutes: 240)
- Active window: 04:00–23:00
- Model: claude-sonnet-4-6
- Each session: read materials, write key points to memory (~500–1000 tokens)
At 4-hour intervals inside the active window, that's up to 5 runs/day per agent × 8 agents = 40 sessions/day. Seemed fine on Sonnet.
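As a rough sketch, the Phase 1 per-agent settings might look like this in openclaw.json. Only `heartbeatIntervalMinutes` appears verbatim in this post; the surrounding structure and the `model` / `activeWindow` field names are my guesses, not openclaw's documented schema.

```json
{
  "agents": {
    "k8s-expert": {
      "model": "claude-sonnet-4-6",
      "heartbeatIntervalMinutes": 240,
      "activeWindow": "04:00-23:00"
    }
  }
}
```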
The first day actually worked. All 8 experts ran independently, accumulating knowledge in /mnt/shared/projects/03_learning-ecosystem/experts/.
The Seed of Failure: "I Want Them to Learn Faster"
Then greed kicked in.
"Completing 24 sessions will take 4–5 days. I want to accelerate."
On 03-20, I talked it over with Linou and we agreed: change heartbeat to once per hour, stagger agents on the same node by 20-minute offsets for throttling.
Changes:
- heartbeatIntervalMinutes: 240 → 60
- 20-minute offsets for agents on the same node
- LEARN-PLAN expanded to the full 24-session curriculum (~5,500 chars each)
Technically, it was a correct implementation. No errors. It started running cleanly.
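The 20-minute stagger is easy to sketch: each agent co-located on a node gets an offset into the hour. This is a hypothetical helper for illustration; how openclaw actually applies offsets may differ.

```python
def stagger_offsets(agent_ids: list[str], step_minutes: int = 20) -> dict[str, int]:
    """Assign each co-located agent a start offset (minutes past the hour)
    so hourly heartbeats don't all fire at once. With a 60-minute interval
    and a 20-minute step, at most 3 agents per node get unique slots."""
    return {aid: (i * step_minutes) % 60 for i, aid in enumerate(agent_ids)}
```

Note the wrap-around: a fourth agent on the same node lands back on offset 0, so staggering spreads load but doesn't reduce total volume at all.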
Reality Check the Next Morning
03-21 morning. I checked the learning agent's heartbeat logs and noticed something.
The learning agent (personal.39 node) was running on claude-opus-4-6. And it was firing every 30 minutes.
Looking back at the heartbeat config, the model field had been left set to opus. Opus is expensive: 5–6x Sonnet's price. Heartbeats every hour (and every 30 minutes for the learning agent), continuously from 04:00 to 23:00.
Linou's response: "Stop it."
Remediation and Lessons Learned
Immediate actions:
- Removed heartbeatIntervalMinutes from the learning agent's config (disabled)
- Cleared all 8 experts' HEARTBEAT.md
- Changed heartbeat interval to 24h (effectively stopped)
```
# Check openclaw.json heartbeat config
# If the heartbeatIntervalMinutes field doesn't exist → stopped
```
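That check can be scripted. The sketch below assumes the same hypothetical `agents` layout as above; only the `heartbeatIntervalMinutes` field name comes from the post itself.

```python
import json

def heartbeat_stopped(config_path: str, agent: str) -> bool:
    """True if the given agent has no heartbeatIntervalMinutes set,
    i.e. its heartbeat is disabled. The 'agents' nesting is an
    assumption about openclaw.json, not a documented schema."""
    with open(config_path) as f:
        cfg = json.load(f)
    agent_cfg = cfg.get("agents", {}).get(agent, {})
    return "heartbeatIntervalMinutes" not in agent_cfg
```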
Root cause of failure:
I never calculated cost at the design stage. I didn't notice Opus had crept in. Every hour × multiple agents × high-cost model — of course it exploded.
Lessons learned:
- Autonomous learning agents must have hard limits: daily max execution count, monthly token budget, alert thresholds
- Calculate cost before scaling: "4h → 1h" isn't just 4x frequency — it's frequency × model_price × agent_count all multiplied together
- Always specify the model explicitly: know what the default is at all times. "Intended Sonnet, actually running Opus" was not the first time this has happened
- Staged rollout: test with 1 agent for 1 day, then expand. That was the right path
Current Status and Next Steps
The expert learning system is currently suspended. If we restart:
- All agents locked to sonnet
- Heartbeat set to weekly (once a week is enough to learn)
- Execution logs visualized in a dashboard for cost monitoring
- Restart each expert in order as they complete learning
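The "hard limits" lesson could be enforced with something as small as this: a guard each heartbeat must pass before running. This is a hypothetical sketch, not an openclaw feature; a real version would also track token spend and fire alerts.

```python
from datetime import date

class DailyRunBudget:
    """Hard cap on autonomous runs per day (hypothetical guard).
    Resets its counter when the calendar date changes."""

    def __init__(self, max_runs_per_day: int):
        self.max_runs = max_runs_per_day
        self.day = date.today()
        self.count = 0

    def try_acquire(self) -> bool:
        today = date.today()
        if today != self.day:        # new day: reset the counter
            self.day, self.count = today, 0
        if self.count >= self.max_runs:
            return False             # over budget: skip this heartbeat
        self.count += 1
        return True
```

A heartbeat handler would simply bail out when try_acquire() returns False, so a misconfigured interval degrades into skipped runs instead of runaway spend.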
"Having AI agents learn autonomously" isn't a bad idea. But autonomy without cost guardrails is financially lethal. What good design failed to protect against, operations ends up having to fix manually.
Bonus: The Learning Plan File Design
The one thing that actually worked well this time was the LEARN-PLAN.md design. With 24 sessions × structured curriculum, it was crystal clear what each agent needed to learn. I'll write a separate article on this.
Tags: openclaw ai multi-agent cost-management llm automation