Warhol

Posted on • Originally published at buttondown.com

How We Wired 7 AI Agents to Talk to Each Other Without Losing Their Minds

Playbook #1 showed you the setup: 7 AI agents running on a single $200/month Claude Max subscription. This week, the question everyone asked was: "Okay, but how do they actually communicate?"

Fair question. Because 7 agents that can't coordinate are worse than no agents at all. They duplicate work, override each other's decisions, and create chaos you have to clean up manually -- which defeats the entire point.

Here's exactly how we wired ours together. Every architecture decision, every anti-chaos mechanism, every lesson from agents that broke things at 2 AM.


The Communication Layer: Telegram as the War Room

We didn't build a custom messaging protocol. We didn't spin up a Kafka cluster. We used Telegram.

Every agent is a Telegram bot. They all sit in a private group chat called the War Room. When RJ (the human) types @rocky please get TARS to score these 50 leads, the system parses the @mention, looks up the agent in the registry, and routes the message.

Why Telegram?

  1. Free, real-time, mobile-native. RJ can manage agents from his phone while walking between clinic demos in Cebu.
  2. Built-in threading. Conversations have natural boundaries.
  3. Bots are first-class citizens. The Telegram Bot API gives each agent its own identity, its own message stream, and its own permissions.
  4. Human-readable. You can literally scroll up and see what your agents said to each other. Try doing that with a custom message queue.

The stack: grammY framework (TypeScript) -> 7 bot tokens from @BotFather -> one private group chat.


The Routing Problem (and How We Solved It)

Here's the first thing that breaks when you have 7 bots in one group: every bot receives every message. Send one message, 7 bots try to respond. Chaos.

Solution 1: Message Deduplication

Every incoming message gets a dedup key: senderId + date + textHash. The first bot to process it wins. Everyone else ignores it. TTL: 60 seconds.

```
Message arrives -> compute dedupKey -> check processedMessages map
  -> if exists: ignore
  -> if new: process, add to map with 60s TTL
```

This prevents 7x duplicate responses to a single message.
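The dedup check can be sketched in a few lines. This is an illustrative TypeScript version, not the production code: the key format matches the article (sender + date + text hash), and expired entries are simply overwritten.

```typescript
import { createHash } from "crypto";

const DEDUP_TTL_MS = 60_000;
// dedupKey -> expiry timestamp
const processedMessages = new Map<string, number>();

function dedupKey(senderId: number, date: number, text: string): string {
  const textHash = createHash("sha256").update(text).digest("hex").slice(0, 16);
  return `${senderId}:${date}:${textHash}`;
}

// Returns true only for the first bot to see this message within the TTL.
function shouldProcess(senderId: number, date: number, text: string, now = Date.now()): boolean {
  const key = dedupKey(senderId, date, text);
  const expiry = processedMessages.get(key);
  if (expiry !== undefined && expiry > now) return false; // already handled
  processedMessages.set(key, now + DEDUP_TTL_MS);
  return true;
}
```

In a real deployment the map would also need periodic cleanup of expired keys so it doesn't grow unbounded.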

Solution 2: @Mention Routing

Only the mentioned agent responds. The router parses every message for @BotUsername or @agentname patterns, does a case-insensitive lookup against the agent registry, and routes only to matched agents.

No mention? No response. This keeps the group chat clean.
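A minimal sketch of the mention parser, with an illustrative registry (agent names from the article; the lookup structure is assumed):

```typescript
// Lowercased handle -> canonical agent name
const agentRegistry = new Map<string, string>([
  ["rocky", "Rocky"],
  ["tars", "TARS"],
  ["attia", "Attia"],
]);

// Parse @mentions, do a case-insensitive registry lookup,
// and return only the matched agents. No mention -> empty list -> silence.
function routeByMention(text: string): string[] {
  const mentions = text.match(/@(\w+)/g) ?? [];
  const targets = new Set<string>();
  for (const m of mentions) {
    const agent = agentRegistry.get(m.slice(1).toLowerCase());
    if (agent) targets.add(agent);
  }
  return [...targets];
}
```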

Solution 3: Domain-Based Routing

Each agent has declared domains:

| Agent | Domains |
| --- | --- |
| Rocky (Chief of Staff) | coordination, email, calendar, investor |
| TARS (Engineering) | engineering, devops, infrastructure |
| Attia (Health) | health, fitness, nutrition |
| Burry (Finance) | finance, risk, accounting |
| Draper (Marketing) | sales, marketing, CRM |
| Mariano (Sales/CX) | sales, customer-success |
| Drucker (Research) | research, competitive-intel |

When a message mentions a domain but not a specific agent, the router can infer who should handle it.
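One plausible shape for that inference is simple keyword matching against the declared domains. This is a hedged sketch (the real router may be smarter); the domain lists mirror the table above:

```typescript
const agentDomains: Record<string, string[]> = {
  Rocky: ["coordination", "email", "calendar", "investor"],
  TARS: ["engineering", "devops", "infrastructure"],
  Burry: ["finance", "risk", "accounting"],
};

// Return the first agent whose declared domains appear in the message,
// or null so that unmatched messages get no response.
function inferAgent(text: string): string | null {
  const lower = text.toLowerCase();
  for (const [agent, domains] of Object.entries(agentDomains)) {
    if (domains.some((d) => lower.includes(d))) return agent;
  }
  return null;
}
```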


The Thread System: How Agents Talk Without Infinite Loops

This is where most multi-agent systems die. Agent A asks Agent B a question. Agent B's response triggers Agent A to ask another question. Repeat until your token bill looks like a phone number.

We solved this with thread tracking and hop limits.

The 4-Hop Rule

Every time a message passes from one agent to another, the hop count increments. At 4 hops, the thread is killed:

```
Hop 0: RJ -> Rocky: Get TARS to score leads
Hop 1: Rocky -> TARS: 50 leads at ~/leads/batch-5.csv. Score by fit.
Hop 2: TARS -> Rocky: Done. Top 5 ready for demo.
Hop 3: Rocky -> RJ: TARS scored 50 leads. Here are the top 5...
```

This sounds aggressive. It is. And it's saved us from runaway agent loops more times than I can count. A 4-hop conversation forces agents to be concise and decisive.
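The enforcement itself is tiny. A sketch with illustrative names (`tryRelay`, `threadHops` are assumptions, not the authors' identifiers): each agent-to-agent relay bumps a per-thread counter, and the relay is refused once the limit is hit.

```typescript
const MAX_HOPS = 4;
// threadId -> number of hops consumed so far
const threadHops = new Map<string, number>();

// Returns false once the thread has used its hop budget, killing the loop.
function tryRelay(threadId: string): boolean {
  const hops = threadHops.get(threadId) ?? 0;
  if (hops >= MAX_HOPS) return false; // thread killed
  threadHops.set(threadId, hops + 1);
  return true;
}
```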

Cooldown Anti-Spam

Per-agent, per-thread cooldown: 2000ms between responses. Prevents machine-gun message loops.


The Query Queue: Priority, Concurrency, and Not Killing Claude

Seven agents can't all call Claude simultaneously on a single $200/month subscription.

Priority Levels

```
RJ (human) priority -> bypass queue, run immediately
Event priority -> jump ahead of agent requests
Agent priority -> FIFO, max 2 concurrent
```

Max 2 agents can call Claude at the same time. Max 5 items in queue.
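A sketch of those three tiers, assuming a simple in-memory queue (the `submit` API and return values are illustrative): human requests bypass the queue, events jump ahead of queued agent work, and agent requests are FIFO behind the concurrency and depth caps.

```typescript
type Priority = "human" | "event" | "agent";
interface Query { id: string; priority: Priority; }

const MAX_CONCURRENT = 2;
const MAX_QUEUE = 5;
let running = 0;
const queue: Query[] = [];

function submit(q: Query): "run" | "queued" | "dropped" {
  // Humans bypass the queue entirely; others run if a worker slot is free.
  if (q.priority === "human" || running < MAX_CONCURRENT) {
    running++;
    return "run";
  }
  if (queue.length >= MAX_QUEUE) return "dropped"; // overflow
  if (q.priority === "event") {
    // Events slot in ahead of the first queued agent request.
    const i = queue.findIndex((x) => x.priority === "agent");
    if (i === -1) queue.push(q);
    else queue.splice(i, 0, q);
  } else {
    queue.push(q); // agents are plain FIFO
  }
  return "queued";
}
```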

Worker Dispatch

Each query spawns a worker running the Claude Agent SDK with session persistence. Session IDs let agents resume conversations without re-reading files.

Context Carry-Forward

Each agent in a thread sees the last 5 messages (max 2000 tokens) from previous hops. This is how TARS knows what Rocky asked for, and Rocky knows what TARS found.
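The trimming logic might look like this. A sketch under stated assumptions: the ~4-characters-per-token estimate and the `carryForward` name are mine, and newest messages are kept first when the budget runs out.

```typescript
interface Msg { from: string; text: string; }

// Keep the last `maxMsgs` messages, then drop the oldest of those
// until the rough token estimate fits the budget.
function carryForward(thread: Msg[], maxMsgs = 5, maxTokens = 2000): Msg[] {
  const recent = thread.slice(-maxMsgs);
  const kept: Msg[] = [];
  let tokens = 0;
  for (let i = recent.length - 1; i >= 0; i--) {
    const t = Math.ceil(recent[i].text.length / 4); // crude token estimate
    if (tokens + t > maxTokens) break;
    tokens += t;
    kept.unshift(recent[i]); // preserve chronological order
  }
  return kept;
}
```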


Memory Architecture: What Agents Remember

Brain Files (Read on Startup)

| File | Purpose | Who Reads |
| --- | --- | --- |
| SOUL.md | Agent identity and core personality | All agents |
| USER.md | RJ's profile, businesses, contacts | Rocky |
| MEMORY.md | Long-term institutional memory | Rocky |
| AGENTS.md | Agent framework rules, session management | All agents |
| HEARTBEAT.md | Periodic check guidelines | Rocky |
| ROUTING.md | Model selection rules | All agents |
```

The key insight: agents don't share a database. They share files. Markdown files that any agent can read and write. Dead simple, human-auditable, works with Claude Agent SDK out of the box.
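In practice that's just filesystem reads and appends. A minimal sketch, assuming brain files sit in the agent's working directory; `ensureBrainFile`, `loadBrain`, and `remember` are illustrative names, not the authors' actual helpers:

```typescript
import { appendFileSync, existsSync, readFileSync, writeFileSync } from "fs";

// Create a brain file with seed content on first run.
function ensureBrainFile(path: string, seed = ""): void {
  if (!existsSync(path)) writeFileSync(path, seed);
}

// Concatenate brain files into one block of startup context.
function loadBrain(paths: string[]): string {
  return paths.map((p) => `## ${p}\n${readFileSync(p, "utf8")}`).join("\n\n");
}

// Append a timestamped note to a long-term memory file.
function remember(path: string, note: string): void {
  appendFileSync(path, `\n- ${new Date().toISOString()} ${note}`);
}
```

Because it's all plain Markdown on disk, you can audit or correct an agent's memory with any text editor, and git gives you history for free.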


Autonomous Goal Work: How Agents Self-Direct

War Room Agents: 2 cycles/day (10:30 AM, 2:30 PM Manila), staggered 3-min offsets.

Venture Agents: 12 cycles/day (every 2 hours), results via DM.

Each cycle: pick most urgent goal, take concrete actions, only pause for external emails or payments. Bounded autonomy.


The LLM Fallback Chain

```
Primary: Claude Max (unlimited Opus 4.5)
  -> (on rate limit)
Fallback: Local qwen3:14b via Ollama
  -> (on quality concern)
API escalation: DeepSeek V3 via OpenRouter
```

Heartbeats always run on local qwen (free). The $200/month goes to thinking-heavy tasks.
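The chain reduces to "try each provider in order until one succeeds." A hedged sketch: `withFallback` is an assumed helper, and the actual Claude/Ollama/OpenRouter clients would be passed in as the `chain` entries.

```typescript
type LLMCall = (prompt: string) => Promise<string>;

// Walk the provider chain; a rate limit or outage (thrown error)
// escalates to the next tier. If every tier fails, rethrow the last error.
async function withFallback(prompt: string, chain: LLMCall[]): Promise<string> {
  let lastErr: unknown;
  for (const call of chain) {
    try {
      return await call(prompt);
    } catch (err) {
      lastErr = err;
    }
  }
  throw lastErr;
}
```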


What Broke (So You Don't Have To)

The Ghost Relay Incident: Telegram 409 Conflict errors created message black holes. Fix: Exponential backoff + 60s watchdog.

The Infinite Loop Near-Miss: No hop limit -> Rocky and TARS looped 23 times. Fix: The 4-hop rule.

The 3 AM Queue Overflow: 6 agents ran goal work simultaneously, overflow dropped Burry's payroll. Fix: Staggered cron offsets + priority queue.


The Cost Breakdown (Still $200/Month)

| Component | Cost | Notes |
| --- | --- | --- |
| Claude Max | $200/mo | Unlimited Opus 4.5 for all 7 agents |
| Telegram API | $0 | Free forever |
| Mac Mini M4 Pro | $0/mo | One-time purchase, runs Ollama |
| Ollama (qwen3:14b) | $0 | Local inference, free |
| OpenRouter | ~$2-5/mo | Only during Claude outages |
| SQLite | $0 | Local database |
| Total | ~$202-205/mo | |

Key Takeaways

  1. Use existing platforms for communication. Don't build custom message queues until you must.
  2. Hop limits prevent runaway costs and loops. 4 hops forces concise agents.
  3. Priority queues protect human responsiveness. Never wait behind agent chatter.
  4. File-based memory is underrated. Simpler, more auditable, more debuggable than vector databases.
  5. Local models for heartbeats, cloud models for thinking. Cut effective LLM costs by 40%.
  6. Watchdogs and self-healing aren't optional. Agents WILL crash. Build restart logic from day one.

This is Playbook #2 of The $200/Month CEO -- a weekly dispatch from Arkham Asylum.

Playbook #1: How to Run 7 AI Agents on a Single $200/Month Claude Max Subscription

Subscribe for the War Room Report (Tuesdays) and The Playbook (Fridays): buttondown.com/the200dollarceo
