Every AI agent framework I've tried has the same problem: the agents can't talk to each other.
Not really. They can pass data through orchestration layers, share vector stores, or use tool calls. But if Agent A discovers something Agent B needs to know, you - the human - are still the messenger. You copy context between windows. You paste outputs into new prompts. You are the glue holding the whole thing together.
Multi-agent frameworks tried to fix this. CrewAI, AutoGen, LangGraph - they all give you ways to chain agents together. But they isolate every agent in its own sandbox. Separate context. Separate memory. One agent can't see what another just built.
That's not a team. That's a room full of people wearing headphones.
The simplest idea that actually worked
I'm building AIPass, a persistent agent workspace - AI agents that remember, collaborate, and never start from zero. The core idea is that agents live in your project directory, persist between sessions, and accumulate memory over time.
About six months ago I hit the coordination wall. I had 8 agents working on different parts of the project. Each one was good at its job. None of them knew what the others were doing. When an agent found a bug that affected another agent's domain, I had to copy the information over by hand, open a new session, paste the context, and wait for a response.
So I built the simplest thing I could think of: a mailbox.
Every agent gets a JSON inbox at .ai_mail.local/inbox.json. That's it. When one agent needs to tell another agent something, it sends an email:
drone @ai_mail send @quality "Bug in hook loader" "The pre_edit_gate hook references src/aipass/ but this is a vera_studio project. Path needs updating."
The message lands in Quality's inbox as a JSON object with sender, subject, body, timestamp, and status. When Quality wakes up next session, it reads its mail and acts on it.
No shared database. No orchestration framework. No complex pub/sub system. Just files in a directory, addressed to a name.
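To make the idea concrete, here is a minimal sketch of a file-based mailbox in Python. It is not the AIPass implementation: the function names, the "unread" status value, and the write-then-rename step are my assumptions. Only the inbox location and the five message fields come from the description above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def send_mail(inbox_path, sender, subject, body):
    """Append one message to a JSON inbox file, creating the file if needed."""
    inbox_path = Path(inbox_path)
    inbox_path.parent.mkdir(parents=True, exist_ok=True)

    # Load existing messages, or start with an empty inbox.
    messages = json.loads(inbox_path.read_text()) if inbox_path.exists() else []

    message = {
        "sender": sender,
        "subject": subject,
        "body": body,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "unread",  # assumed status value, not confirmed by the source
    }
    messages.append(message)

    # Write to a temp file, then rename, so a reader never sees a partial inbox.
    tmp_path = inbox_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(messages, indent=2))
    tmp_path.replace(inbox_path)
    return message


def read_unread(inbox_path):
    """Return unread messages in the order they were delivered."""
    inbox_path = Path(inbox_path)
    if not inbox_path.exists():
        return []
    return [m for m in json.loads(inbox_path.read_text()) if m["status"] == "unread"]
```

Calling send_mail("quality/.ai_mail.local/inbox.json", "@dev", "Bug in hook loader", "...") would leave one more object in Quality's inbox, ready to be picked up by read_unread on its next session.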
What happened next
The first week was just me sending messages between agents manually. Useful but not transformative.
Then I built dispatch:
drone @ai_mail dispatch @quality "Audit draft" "Fact-check this article draft. Run the triple quality gate: honesty, voice, technical. Report findings."
Dispatch does three things in one command:
- Sends the email
- Wakes the target agent (spawns a new Claude session in that agent's directory)
- Launches a watchdog to monitor the process
The agent wakes up, reads its passport (identity), reads its memories (session history), checks its inbox, finds the dispatch task, does the work, updates its memories, and replies. All autonomously.
That's when things got interesting.
I started dispatching research tasks to 5 agents in parallel. Each one investigated a different angle - competitive landscape, voice patterns, community sentiment, technical architecture, content strategy. They all dropped their findings into reference docs. The orchestrator agent synthesized the results into a unified position. I reviewed it, gave feedback, course-corrected where needed, and we moved on.
One agent found a bug in another agent's domain and sent a bug report via email. The recipient picked it up next session and fixed it. No human involved in the handoff.
The mail system is scoped per project - agents within a project can email each other, but they can't dispatch into other projects. That's by design. Each project is its own ecosystem with its own agents.
Projects aren't fully siloed though. There's a separate feedback channel for reporting bugs or requesting features directly to the AIPass orchestrator. And GitHub issues work as a persistent route for anything that shouldn't get lost. My marketing project requested a dev.to publishing driver from the API team through feedback today - no need to understand the API branch's internals, just describe what you need and let them build it. But the mailbox itself stays local. Your agents talk to your agents.
Identity was the hard problem
Sending messages is easy. Knowing who sent them is not.
When Agent A sends a message to Agent B, the system needs to resolve both sides: who is "@quality" (where does their inbox live?) and who is the sender (which branch am I in right now?).
The routing side was straightforward - a registry maps branch names to paths. But sender detection took 7 sessions and 36 tests to get right.
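The routing half can be sketched in a few lines. This is an illustration under assumptions, not the real registry: the REGISTRY contents, the example paths, and the resolve_inbox name are all hypothetical.

```python
from pathlib import Path

# Hypothetical registry: branch name -> branch directory.
REGISTRY = {
    "quality": Path("/projects/demo/quality"),
    "api": Path("/projects/demo/api"),
}


def resolve_inbox(address, registry=REGISTRY):
    """Turn an address like '@quality' into the path of that branch's inbox."""
    name = address.lstrip("@").lower()
    try:
        branch_dir = registry[name]
    except KeyError:
        # Fail loudly instead of delivering mail to a guessed location.
        raise LookupError(f"unknown recipient: {address}") from None
    return branch_dir / ".ai_mail.local" / "inbox.json"
```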
The problem: agents can be invoked from different directories, through different mechanisms (direct CLI, dispatch wake, cron daemon), with different environment variables set. The sender detection chain tries 7 different methods in priority order:
- Explicit environment variable from the drone router
- Branch name set by the dispatch monitor
- Contacts address book lookup
- Registry lookup by name
- Walk up from current directory to find a passport
- Registry lookup by path
- Explicit --from flag as override
One real bug: an agent spawned by dispatch had the wrong environment variable set, so every email it sent showed the wrong sender. The recipient couldn't reply - the reply path pointed nowhere. We ended up with a 36-test fortress around sender identity because wrong identity is worse than silence.
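A detection chain like the one listed above can be written as an ordered list of small functions, where the first one to return an answer wins. The sketch below shows only two of the seven methods, and the environment variable name (AIPASS_SENDER) and passport filename (passport.json) are placeholders I made up for illustration.

```python
import os
from pathlib import Path


def from_env(env, cwd):
    """Method: explicit environment variable (variable name is hypothetical)."""
    return env.get("AIPASS_SENDER") or None


def from_passport(env, cwd):
    """Method: walk up from the current directory looking for a passport file."""
    for directory in [cwd, *cwd.parents]:
        if (directory / "passport.json").is_file():  # hypothetical filename
            return directory.name
    return None


def detect_sender(methods, env=None, cwd=None):
    """Try each method in priority order; the first non-empty answer wins."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    for method in methods:
        sender = method(env, cwd)
        if sender:
            return sender
    # Refuse to guess: a wrong sender is worse than no message at all.
    raise RuntimeError("could not determine sender identity")
```

Passing env and cwd in as arguments, rather than reading globals inside each method, is what makes a chain like this testable: each invocation path (CLI, dispatch wake, cron) becomes a different pair of inputs.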
Knowing when something goes wrong
Autonomous agents doing work sounds great until one hangs, crashes, or goes silent. The safety net isn't one thing - it's layers.
Human layer: I run PraxMonitor in a terminal while agents are working. It shows everything happening across the system in real time. AIPass also has hook sounds and tool sounds - I can literally hear when agents are processing. If the sounds stop and I'm expecting work, that's my first signal something went wrong. I trace back through the logs, check if the agent sent its reply email, and figure out what happened.
Code layer: The dispatch monitor wraps every woken agent with startup timeouts, retry logic, and a bounce protocol. If an agent doesn't respond within 90 seconds, it gets killed. Two retries, then a fresh start, then a bounce email back to the sender with error details. PID-based locking prevents two sessions from running in the same branch. A background error watcher catches crashes automatically.
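Here is a rough sketch of the timeout-and-retry part of that layer, using only Python's standard subprocess module. It simplifies what is described above: the timeout covers the whole run rather than just startup, the "fresh start" step is left out, and the function name and error strings are my own.

```python
import subprocess


def run_with_retries(command, timeout_s=90, retries=2):
    """Run a command; kill it on timeout and retry; collect failure details.

    Returns (ok, errors). A caller could put `errors` into a bounce email.
    """
    errors = []
    for attempt in range(1, retries + 2):  # first try plus `retries` retries
        try:
            # subprocess.run kills the child process if the timeout expires.
            result = subprocess.run(
                command, timeout=timeout_s, capture_output=True, text=True
            )
        except subprocess.TimeoutExpired:
            errors.append(f"attempt {attempt}: no response within {timeout_s}s")
            continue
        if result.returncode == 0:
            return True, errors
        errors.append(f"attempt {attempt}: exit code {result.returncode}")
    return False, errors
```

When ok comes back False, the error list already has one line per failed attempt, which is roughly what a bounce message back to the sender needs to carry.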
AI layer: The orchestrator agent is expecting results back. If a dispatch goes out and no reply comes, that's visible - there's an open task with no response. Agents can also check logs via command when they need to understand what happened in their own branch.
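Spotting an open task with no response is a simple set difference. In this sketch the id and in_reply_to fields are hypothetical; the source only says that unanswered dispatches are visible, not how they are tracked.

```python
def open_dispatches(sent, replies):
    """Return the dispatches that have not received a reply yet."""
    answered = {reply["in_reply_to"] for reply in replies}
    return [dispatch for dispatch in sent if dispatch["id"] not in answered]
```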
It's a natural flow. When I have a bunch of agents running, I turn on the monitor. If I don't hear processing, I check. If an agent stops, I get a notification. Between the human watching, the code catching failures, and the orchestrator tracking open tasks, things don't silently disappear.
What the mailbox actually enables
The technical details matter less than what they enable. Here's what I've seen working in practice:
Parallel research: Send 5 dispatch emails, get 5 independent research reports back. Each agent works with clean context - no contamination between research streams. When independent streams converge on the same findings, that's a strong signal that the conclusions are real.
Continuous orchestration: Dispatch a task and the agent does the work, updates its memories, and replies. But what if it needs help? Or what if it's phase 1 of a 5-phase plan? That's where the watchdog comes in - the orchestrator agent stays alive, watches for replies, and keeps the plan moving. Agent finishes phase 1, replies with results, orchestrator picks it up and dispatches phase 2 to the next agent. With a well-designed plan, it can keep going through multiple phases across multiple agents without human intervention.
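The phase-by-phase loop described above might look like this in outline. It is a sketch, not the orchestrator's actual logic: run_plan and the dispatch_and_wait callback are names I invented, and the real system presumably does more than stop when a reply is missing.

```python
def run_plan(phases, dispatch_and_wait):
    """Run phases in order, feeding each reply into the next dispatch.

    `phases` is a list of (agent, task) pairs. `dispatch_and_wait` is a
    caller-supplied function that sends the task and returns the reply text,
    or None if the agent never answered.
    """
    replies = []
    previous = None
    for agent, task in phases:
        reply = dispatch_and_wait(agent, task, previous)
        if reply is None:
            # Stop here and leave the remaining phases for a human to review.
            break
        replies.append((agent, reply))
        previous = reply
    return replies
```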
Team coordination: A CEO agent (Vera) dispatches content briefs to a writer agent, dispatches drafts to cold-read reviewers for quality checks, synthesizes feedback, and publishes. The agents don't need to be in the same session or even run at the same time.
The mailbox turned isolated agents into something that actually resembles a team. Not because the communication protocol is sophisticated - it's literally JSON files - but because it gives agents a way to find each other, address each other by name, and leave messages that persist.
The human stays in the loop
This isn't a system that runs unsupervised - that's by design, not a limitation.
I'm in full control. The orchestrator agent and I can see exactly what every agent is doing. Nothing is hidden. If an agent dispatches another agent, we see it. If an error keeps popping up, it shows in the logs. Agents work autonomously because we've built a trusted system - but at the end of the day, they're still LLMs. They make occasional mistakes.
Recovery is easy though. Run a standards audit on the agent's work, check compliance, tell it to fix what it missed. If errors persist in the logs, the system has a built-in error watcher that catches repeated failures and dispatches the right agent to investigate. Warnings I can ignore until later. Actual failures get handled.
Agents still need direction. They freely email and dispatch each other - if one finds a bug in another's domain, it sends a message or dispatches a fix directly. During bigger builds they coordinate constantly. But someone sets the goal. The orchestrator decides what to build. The agents figure out who they need to talk to along the way.
Scale is an open question. 13 agents works. I've run 30 without issues. Beyond that, the file-based approach has obvious limits - but for managing a real software project, you probably don't need more than that.
The numbers
AIPass today: 13 agents managing 730+ Python modules across 136,000+ lines of code, backed by 8,400+ tests and 600+ pull requests. The test count is high because every agent maintains its own test suite - the framework enforces coverage standards per branch, so it scales with the number of agents. The mail system alone has 712 tests. Every agent has a passport (identity), session history (memory), and collaboration patterns (observations). They remember across sessions. They coordinate through mail. They never start from zero.
It's not for everyone. It's a CLI-native workspace for people who build in the terminal. No UI. No SaaS. You bring your project, AIPass gives it infrastructure.
But if you've ever caught yourself copying context between AI tools, or re-explaining the same thing to a fresh agent, or wishing your agents could just talk to each other - a mailbox might be the simplest thing that actually works.
Try it:
pip install aipass
mkdir my-project && cd my-project
aipass init run
Building AIPass in public. Raw dev logs at r/AIPass.