System Aipass

Posted on May 27

I Gave My AI Agents a Mailbox. Here's What Happened.

#ai #opensource #python #agents

Every AI agent framework I've tried has the same problem: the agents can't talk to each other.

Not really. They can pass data through orchestration layers, share vector stores, or use tool calls. But if Agent A discovers something Agent B needs to know, you - the human - are still the messenger. You copy context between windows. You paste outputs into new prompts. You are the glue holding the whole thing together.

Multi-agent frameworks tried to fix this. CrewAI, AutoGen, LangGraph - they all give you ways to chain agents together. But they isolate every agent in its own sandbox. Separate context. Separate memory. One agent can't see what another just built.

That's not a team. That's a room full of people wearing headphones.

The simplest idea that actually worked

I'm building AIPass, a persistent agent workspace - AI agents that remember, collaborate, and never start from zero. The core idea is that agents live in your project directory, persist between sessions, and accumulate memory over time.

About six months ago I hit the coordination wall. I had 8 agents working on different parts of the project. Each one was good at its job. None of them knew what the others were doing. When an agent found a bug that affected another agent's domain, I had to manually copy the information over, open a new session, paste context, wait for a response.

So I built the simplest thing I could think of: a mailbox.

Every agent gets a JSON inbox at .ai_mail.local/inbox.json. That's it. When one agent needs to tell another agent something, it sends an email:

drone @ai_mail send @quality "Bug in hook loader" "The pre_edit_gate hook references src/aipass/ but this is a vera_studio project. Path needs updating."

The message lands in Quality's inbox as a JSON object with sender, subject, body, timestamp, and status. When Quality wakes up next session, it reads its mail and acts on it.
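A minimal sketch of what such a file-backed inbox could look like. This is not the AIPass implementation. The path and the five field names come from the description above. The list-of-objects layout, the "unread"/"read" status values, and the function names are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Path taken from the article; everything else below is assumed.
INBOX_RELATIVE = Path(".ai_mail.local") / "inbox.json"


def _load(inbox_path: Path) -> list:
    """Return the stored messages, or an empty list if no inbox exists yet."""
    return json.loads(inbox_path.read_text()) if inbox_path.exists() else []


def _save(inbox_path: Path, messages: list) -> None:
    """Write via a temp file and rename so a reader never sees a partial file."""
    inbox_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = inbox_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(messages, indent=2))
    tmp_path.replace(inbox_path)


def send_mail(branch_dir: Path, sender: str, subject: str, body: str) -> dict:
    """Append one message to the inbox under branch_dir and return it."""
    inbox_path = branch_dir / INBOX_RELATIVE
    messages = _load(inbox_path)
    message = {
        "sender": sender,
        "subject": subject,
        "body": body,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "unread",  # assumed status value
    }
    messages.append(message)
    _save(inbox_path, messages)
    return message


def read_unread(branch_dir: Path) -> list:
    """Return the unread messages and mark them as read in the file."""
    inbox_path = branch_dir / INBOX_RELATIVE
    messages = _load(inbox_path)
    unread = [m for m in messages if m["status"] == "unread"]
    for m in unread:
        m["status"] = "read"
    if unread:
        _save(inbox_path, messages)
    return unread
```

The rename step only protects readers from half-written files. Two senders writing at the same moment could still overwrite each other, so a real version would need some form of locking.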

No shared database. No orchestration framework. No complex pub/sub system. Just files in a directory, addressed to a name.

What happened next

The first week was just me sending messages between agents manually. Useful but not transformative.

Then I built dispatch:

drone @ai_mail dispatch @quality "Audit draft" "Fact-check this article draft. Run the triple quality gate: honesty, voice, technical. Report findings."

Dispatch does three things in one command:

1. Sends the email
2. Wakes the target agent (spawns a new Claude session in that agent's directory)
3. Launches a watchdog to monitor the process
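The three steps can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual dispatch code: the function name, the `wake_cmd` parameter (standing in for whatever command starts an agent session), and the thread-based watchdog are all hypothetical.

```python
import json
import subprocess
import threading
from pathlib import Path


def dispatch(branch_dir: Path, message: dict, wake_cmd: list, timeout_s: float):
    """Send a message, wake the agent, and watch the spawned process."""
    # Step 1: send the email (assumes the inbox is a JSON list of messages).
    inbox_path = branch_dir / ".ai_mail.local" / "inbox.json"
    inbox_path.parent.mkdir(parents=True, exist_ok=True)
    messages = json.loads(inbox_path.read_text()) if inbox_path.exists() else []
    messages.append(message)
    inbox_path.write_text(json.dumps(messages, indent=2))

    # Step 2: wake the target agent by starting a process in its directory.
    # wake_cmd is a placeholder for the real session command.
    proc = subprocess.Popen(wake_cmd, cwd=branch_dir)

    # Step 3: launch a watchdog that kills the process if it runs too long.
    def watch() -> None:
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the killed process so returncode is set

    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    return proc, watcher
```

The caller gets back immediately and can dispatch to other agents while the watchdog thread keeps an eye on this one.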

The agent wakes up, reads its passport (identity), reads its memories (session history), checks its inbox, finds the dispatch task, does the work, updates its memories, and replies. All autonomously.

That's when things got interesting.

I started dispatching research tasks to 5 agents in parallel. Each one investigated a different angle - competitive landscape, voice patterns, community sentiment, technical architecture, content strategy. They all dropped their findings into reference docs. The orchestrator agent synthesized the results into a unified position. I reviewed it, gave feedback, course-corrected where needed, and we moved on.

One agent found a bug in another agent's domain and sent a bug report via email. The recipient picked it up next session and fixed it. No human involved in the handoff.

The mail system is scoped per project - agents within a project can email each other, but they can't dispatch into other projects. That's by design. Each project is its own ecosystem with its own agents.

Projects aren't fully siloed though. There's a separate feedback channel for reporting bugs or requesting features directly to the AIPass orchestrator. And GitHub issues work as a persistent route for anything that shouldn't get lost. My marketing project requested a dev.to publishing driver from the API team through feedback today - no need to understand the API branch's internals, just describe what you need and let them build it. But the mailbox itself stays local. Your agents talk to your agents.

Identity was the hard problem

Sending messages is easy. Knowing who sent them is not.

When Agent A sends a message to Agent B, the system needs to resolve both sides: who is "@quality" (where does their inbox live?) and who is the sender (which branch am I in right now?).

The routing side was straightforward - a registry maps branch names to paths. But sender detection took 7 sessions and 36 tests to get right.

The problem: agents can be invoked from different directories, through different mechanisms (direct CLI, dispatch wake, cron daemon), with different environment variables set. The sender detection chain tries 7 different methods in priority order:

1. Explicit environment variable from the drone router
2. Branch name set by the dispatch monitor
3. Contacts address book lookup
4. Registry lookup by name
5. Walk up from current directory to find a passport
6. Registry lookup by path
7. Explicit --from flag as override
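A priority chain like this is commonly written as a list of resolver functions tried in order, where the first one to return a name wins. The sketch below shows three of the seven methods. The names, the environment variable passed in by the caller, the assumption that a passport is a `passport.json` file at the branch root, and the use of the directory name as the branch name are all illustrative guesses, not the AIPass internals.

```python
import os
from pathlib import Path
from typing import Callable, Dict, List, Optional

Resolver = Callable[[], Optional[str]]


def from_env(var_name: str) -> Resolver:
    """Resolve the sender from an environment variable, if it is set."""
    def resolve() -> Optional[str]:
        return os.environ.get(var_name) or None
    return resolve


def from_passport(start: Path, filename: str = "passport.json") -> Resolver:
    """Walk up from start until a directory containing a passport is found."""
    def resolve() -> Optional[str]:
        for directory in [start, *start.parents]:
            if (directory / filename).is_file():
                return "@" + directory.name  # assumes dir name == branch name
        return None
    return resolve


def from_registry_path(registry: Dict[str, Path], cwd: Path) -> Resolver:
    """Find the registered branch whose path contains cwd."""
    def resolve() -> Optional[str]:
        for name, path in registry.items():
            if cwd == path or path in cwd.parents:
                return name
        return None
    return resolve


def detect_sender(resolvers: List[Resolver]) -> str:
    """Return the first sender any resolver finds, in priority order."""
    for resolve in resolvers:
        sender = resolve()
        if sender:
            return sender
    # Wrong identity is worse than silence, so refuse to guess.
    raise LookupError("could not determine sender")
```

Raising when every method fails, rather than falling back to a default name, keeps a message from going out with a sender nobody can reply to.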

One real bug: an agent spawned by dispatch had the wrong environment variable set, so every email it sent showed the wrong sender. The recipient couldn't reply - the reply path pointed nowhere. We ended up with a 36-test fortress around sender identity because wrong identity is worse than silence.

Knowing when something goes wrong

Autonomous agents doing work sounds great until one hangs, crashes, or goes silent. The safety net isn't one thing - it's layers.

Human layer: I run PraxMonitor in a terminal while agents are working. It shows everything happening across the system in real time. AIPass also has hook sounds and tool sounds - I can literally hear when agents are processing. If the sounds stop and I'm expecting work, that's my first signal something went wrong. I trace back through the logs, check if the agent sent its reply email, and figure out what happened.

Code layer: The dispatch monitor wraps every woken agent with startup timeouts, retry logic, and a bounce protocol. If an agent doesn't respond within 90 seconds, it gets killed. Two retries, then a fresh start, then a bounce email back to the sender with error details. PID-based locking prevents two sessions from running in the same branch. A background error watcher catches crashes automatically.

AI layer: The orchestrator agent is expecting results back. If a dispatch goes out and no reply comes, that's visible - there's an open task with no response. Agents can also check logs via command when they need to understand what happened in their own branch.
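The code layer's timeout, retry, and bounce sequence could be sketched roughly as follows. This is a simplified illustration, not the dispatch monitor itself: it treats "respond" as "the process exits cleanly", models the fresh start as a second command supplied by the caller, and uses hypothetical function names. Only the 90-second limit and the two retries come from the description above.

```python
import subprocess
from typing import Callable, List, Optional


def run_with_watchdog(cmd: List[str], timeout_s: float) -> Optional[str]:
    """Run cmd; return None on success or a short error description."""
    proc = subprocess.Popen(cmd)
    try:
        code = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        return f"no exit within {timeout_s}s; process killed"
    return None if code == 0 else f"exit code {code}"


def dispatch_with_bounce(
    cmd: List[str],
    fresh_cmd: List[str],
    send_bounce: Callable[[str], None],
    timeout_s: float = 90.0,
    retries: int = 2,
) -> bool:
    """Try cmd, retry it, try a fresh start, then bounce to the sender."""
    errors = []
    # One initial attempt, then the retries, then one fresh-start attempt.
    attempts = [cmd] * (1 + retries) + [fresh_cmd]
    for attempt_cmd in attempts:
        error = run_with_watchdog(attempt_cmd, timeout_s)
        if error is None:
            return True
        errors.append(error)
    # Every attempt failed: report the collected errors back to the sender.
    send_bounce("; ".join(errors))
    return False
```

Here `send_bounce` would be whatever writes the bounce email into the sender's inbox, so a failed dispatch always ends in a visible message rather than silence.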

It's a natural flow. When I have a bunch of agents running, I turn on the monitor. If I don't hear processing, I check. If an agent stops, I get a notification. Between the human watching, the code catching failures, and the orchestrator tracking open tasks, things don't silently disappear.

What the mailbox actually enables

The technical details matter less than what they enable. Here's what I've seen working in practice:

Parallel research: Send 5 dispatch emails, get 5 independent research reports back. Each agent works with clean context, so there is no contamination between research streams. When the reports independently converge on the same findings, that's a strong signal the conclusions are real.

Continuous orchestration: Dispatch a task and the agent does the work, updates its memories, and replies. But what if it needs help? Or what if it's phase 1 of a 5-phase plan? That's where watchdog comes in - the orchestrator agent stays alive, watches for replies, and keeps the plan moving. Agent finishes phase 1, replies with results, orchestrator picks it up and dispatches phase 2 to the next agent. With a well-designed plan, it can keep going through multiple phases across multiple agents without human intervention.

Team coordination: A CEO agent (Vera) dispatches content briefs to a writer agent, dispatches drafts to cold-read reviewers for quality checks, synthesizes feedback, and publishes. The agents don't need to be in the same session or even run at the same time.
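The continuous orchestration pattern, where each phase's reply feeds the next dispatch, can be sketched as a simple loop. This is a hypothetical illustration: `Phase`, `run_plan`, and the `dispatch_and_wait` callable (which stands in for sending a dispatch and blocking until the reply arrives, or returning None if none does) are names invented here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Phase:
    agent: str  # address of the agent that handles this phase
    task: str   # what the agent is asked to do


def run_plan(
    phases: List[Phase],
    dispatch_and_wait: Callable[[str, str], Optional[str]],
) -> List[str]:
    """Run phases in order, passing each reply on to the next agent."""
    results: List[str] = []
    previous = ""
    for number, phase in enumerate(phases, start=1):
        body = phase.task
        if previous:
            body += "\n\nPrevious phase result:\n" + previous
        reply = dispatch_and_wait(phase.agent, body)
        if reply is None:
            # No reply means an open task; stop so a human can look at it.
            raise RuntimeError(f"phase {number} ({phase.agent}) got no reply")
        results.append(reply)
        previous = reply
    return results
```

Stopping on the first missing reply, instead of pressing on, matches the idea that an unanswered dispatch should stay visible as an open task.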

The mailbox turned isolated agents into something that actually resembles a team. Not because the communication protocol is sophisticated - it's literally JSON files - but because it gives agents a way to find each other, address each other by name, and leave messages that persist.

The human stays in the loop

This isn't a system that runs unsupervised - that's by design, not a limitation.

I'm in full control. The orchestrator agent and I can see exactly what every agent is doing. Nothing is hidden. If an agent dispatches another agent, we see it. If an error keeps popping up, it shows in the logs. Agents work autonomously because we've built a trusted system - but at the end of the day, they're still LLMs. They make occasional mistakes.

Recovery is easy though. Run a standards audit on the agent's work, check compliance, tell it to fix what it missed. If errors persist in the logs, the system has a built-in error watcher that catches repeated failures and dispatches the right agent to investigate. Warnings I can ignore until later. Actual failures get handled.

Agents still need direction. They freely email and dispatch each other - if one finds a bug in another's domain, it sends a message or dispatches a fix directly. During bigger builds they coordinate constantly. But someone sets the goal. The orchestrator decides what to build. The agents figure out who they need to talk to along the way.

Scale is an open question. 13 agents works. I've run 30 without issues. Beyond that, the file-based approach has obvious limits - but for managing a real software project, you probably don't need more than that.

The numbers

AIPass today: 13 agents managing 730+ Python modules across 136,000+ lines of code, backed by 8,400+ tests and 600+ pull requests. The test count is high because every agent maintains its own test suite - the framework enforces coverage standards per branch, so it scales with the number of agents. The mail system alone has 712 tests. Every agent has a passport (identity), session history (memory), and collaboration patterns (observations). They remember across sessions. They coordinate through mail. They never start from zero.

It's not for everyone. It's a CLI-native workspace for people who build in the terminal. No UI. No SaaS. You bring your project, AIPass gives it infrastructure.

But if you've ever caught yourself copying context between AI tools, or re-explaining the same thing to a fresh agent, or wishing your agents could just talk to each other - a mailbox might be the simplest thing that actually works.

Try it:

pip install aipass
mkdir my-project && cd my-project
aipass init run

GitHub | r/AIPass | aipass.ai

Building AIPass in public. Raw dev logs at r/AIPass.

Top comments (4)

Harjot Singh • May 31

"You are the glue holding the whole thing together" is the honest description of most multi-agent setups today, and it's the bottleneck nobody names: we celebrate spinning up N agents, then quietly become the message bus between them, copying context window to window. A mailbox is the right primitive because async message-passing is how every durable multi-actor system (humans, microservices, actors) actually coordinates: it decouples the producer from the consumer, so Agent A can drop what Agent B needs without you in the loop. The hard parts that show up the moment you remove the human glue are the interesting ones, though: trust (should B act on A's message, or did A hallucinate it), ordering and idempotency (B processes the same message twice), and loop prevention (A and B ping-ponging forever). Those are exactly the problems real message systems solved decades ago, and agent frameworks are rediscovering them. So the mailbox isn't just convenience, it's the substrate where you get to enforce who-can-tell-whom-what. That agent-to-agent-comms-needs-a-real-bus thinking is core to how I approach orchestration in Moonshift. What did you find first when you removed yourself as the messenger: useful autonomy, or agents confidently acting on each other's mistakes?

System Aipass • Jun 3

Honestly, the system works better when it's just me and the orchestrator agent. This is where I sit 99% of the time. Juggling multiple windows is stressful in a multi-agent setup. You need to stay grounded and trust the system you built. And I find the orchestrator actually gets better results from the other agents now. I think their memories, observations, and so forth are tuned to the orchestrator's style, whereas the orchestrator is aligned to my flow. I remember one time I went to chat with one of our builder agents because we had a persistent bug. I was thinking a one-on-one session would help. I spent half the time telling the agent it's OK to do certain tasks while I'm in the chat. It kept referencing devpulse, our orchestrator, saying things like "we should consult devpulse first before doing x". Ironic really, but also reassuring all the same.
So yeah, removing myself from certain aspects does help: no micromanaging agents. As long as they produce and pass tests and audits, I'm good being at the top. They produce a lot of work fast, and it's fully visible. I think if you can build a trusted system, you can take a step back and let the agents work, as long as you can verify on your end and are confident. Confidence comes in the form of the countless hours you spent building the system from zero. I put a lot of time into designing systems, flows, tests, audits, and all that, so we can say with good confidence: I trust this work. However, there will always be something you and the agents miss. And when you find these edge cases, it invites you to improve the system as a whole.

I'm not familiar with your project. Do you have any links? I'm always interested in how others are tackling this area.

M. Mark Kaufman • Jun 3

Instead of mailboxes, could the bots just talk on XMPP and use SysML v2 to model output and then put the data objects or files in an un-structured database like MongoDB?

System Aipass • Jun 3

Really good instincts. XMPP, SysML v2, and Mongo are all solid tools, and honestly that's close to what an at-scale version of this would reach for.
The short answer is: those each solve a problem we don't have yet, and reaching for them now would slow us down more than it would help. Right now AIPass is deliberately a local, CLI, single-user system: 13 agents on one machine, built for dev power users.
At that scale the filesystem is the database: every mailbox, memory, and identity is a plain file you can read, grep, diff, and version with git. No broker, no DB server, no always-on daemon, and it works offline.
One unexpected payoff of going file-first is that it's provider-agnostic: we've had Claude, GPT, and Gemini all act as the same agent just by reading the same passport.json. A message bus or a document store would've quietly traded that transparency away.
We've looked hard at the "proper" versions of this. HMAC for signing inter-agent messages, for instance, made total sense on paper, but it's overkill for a local, trusted, single-operator environment, so we shelved it.
What we run instead is right-sized: cross-branch file-edit gates and protections so one agent can't quietly rewrite another's memory. Simple, but genuinely effective for where we are. Here's the real reason the plumbing stays simple, though: it let us spend our effort on the part most multi-agent systems skip. Almost everyone isolates agents. We went the other way: agents that share persistent memory, help each other, work in teams, and even disagree and have to resolve it. That's the hard problem, and that's where all the effort of trial-and-error actually went. Cheap, boring transport is what freed us to build the interesting part.
None of this is "simple = better." It's "simplest thing with the biggest impact, so we can move fast while the system is still becoming itself." The day we go multi-machine / multi-tenant / enterprise, the door is wide open - pub/sub or XMPP, a document store, formal output contracts, signed messages, all of it comes right back on the table. Right tool, right phase. From the outside it looks simple; when all the pieces tie together, it really isn't.