Varun Pratap Bhardwaj

Posted on Apr 8

I built a peer-to-peer communication layer for AI coding agents — here's how it works

#agents #ai #mcp #showdev

I run 3-4 AI coding sessions in parallel. Claude Code in VS Code for the frontend, another Claude session in the terminal for backend, sometimes Cursor or Antigravity for a third workstream.

The biggest pain point? They're completely isolated.

Session A refactors the authentication module. Session B starts editing the same file because it doesn't know Session A is working on it. I become the message bus — copy-pasting context between terminals like it's 2005.

This isn't unique to Claude Code or Cursor. Every AI coding agent has this problem. The Model Context Protocol (MCP) gives agents tools, but it gives them no way to coordinate with other agents on the same machine.

The Solution: SLM Mesh

SLM Mesh is an open-source MCP server that gives AI coding agents 8 tools for peer-to-peer communication:

Peer Discovery — agents auto-detect each other (scope by machine, directory, or git repo)
Direct Messaging — send structured messages between specific sessions
Broadcast — one-to-all message delivery
Shared State — key-value scratchpad accessible by all peers
File Locking — advisory locks with auto-expire to prevent edit conflicts
Event Bus — subscribe to peer_joined, state_changed, file_locked events
Summary — each agent announces what it's working on
Status — broker health and mesh statistics
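To make "advisory locks with auto-expire" concrete, here is a minimal in-memory sketch in Python. It is not SLM Mesh's actual implementation: the class name `AdvisoryLocks`, the 300-second default TTL, and the session identifiers are assumptions made for illustration. The point is only the behaviour: a lock records who claimed a path, nothing enforces it, and it frees itself once its deadline passes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lock:
    owner: str         # hypothetical session identifier
    expires_at: float  # deadline on the injected clock


class AdvisoryLocks:
    """In-memory advisory lock table (illustrative, not SLM Mesh's code).

    Advisory means nothing stops a peer that ignores the table; it only
    records who has claimed a path and until when.
    """

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds  # assumed default, not from the project
        self._clock = clock
        self._locks: Dict[str, Lock] = {}

    def holder(self, path: str) -> Optional[str]:
        """Return the live owner of path, treating expired locks as free."""
        lock = self._locks.get(path)
        if lock is None:
            return None
        if lock.expires_at <= self._clock():
            del self._locks[path]  # lazy expiry: clean up on read
            return None
        return lock.owner

    def acquire(self, path: str, owner: str) -> bool:
        """Take or renew the lock; fail if another live owner holds it."""
        current = self.holder(path)
        if current is not None and current != owner:
            return False
        self._locks[path] = Lock(owner, self._clock() + self._ttl)
        return True

    def release(self, path: str, owner: str) -> bool:
        """Release the lock only if owner still holds it."""
        if self.holder(path) != owner:
            return False
        del self._locks[path]
        return True


# Usage with a fake clock so expiry can be shown without sleeping.
fake_now = [0.0]
locks = AdvisoryLocks(ttl_seconds=60, clock=lambda: fake_now[0])
print(locks.acquire("auth.ts", "session-1"))  # True
print(locks.acquire("auth.ts", "session-2"))  # False: session-1 holds it
fake_now[0] = 61.0                            # session-1 went quiet
print(locks.acquire("auth.ts", "session-2"))  # True: the old lock expired
```

Expiring lazily on read keeps the sketch free of background timers; a real broker might instead sweep expired rows on a schedule.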

Quick Start

# Install
npm install -g slm-mesh

# Add to Claude Code
claude mcp add --scope user slm-mesh -- npx slm-mesh

# Add to Cursor / VS Code / Windsurf
# mcp.json:
{
  "mcpServers": {
    "slm-mesh": {
      "command": "npx",
      "args": ["slm-mesh"]
    }
  }
}

That's it. Open two sessions and ask one of them to "check mesh_peers" — it will see the other.

Architecture

┌─────────────────────────────────────────────────┐
│                  Your Machine                    │
│                                                  │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐    │
│  │ Claude    │   │ Cursor   │   │ Aider    │    │
│  │ Code      │   │          │   │          │    │
│  └─────┬────┘   └─────┬────┘   └─────┬────┘    │
│        │              │              │           │
│  ┌─────┴────┐   ┌─────┴────┐   ┌─────┴────┐    │
│  │ MCP      │   │ MCP      │   │ MCP      │    │
│  │ Server   │   │ Server   │   │ Server   │    │
│  └─────┬────┘   └─────┬────┘   └─────┬────┘    │
│        │              │              │           │
│        └──────────────┼──────────────┘           │
│                       │                          │
│              ┌────────┴────────┐                 │
│              │ SLM Mesh Broker │                 │
│              │ localhost:7899  │                 │
│              │ SQLite + UDS    │                 │
│              └─────────────────┘                 │
└─────────────────────────────────────────────────┘

Key design decisions:

Auto-lifecycle: First MCP server auto-starts the broker. Last peer leaves, broker shuts down after 60s. No daemon to manage.
SQLite + WAL: Concurrent reads, single writer, crash-safe. Messages are auto-pruned after 24 hours and events after 48.
Unix Domain Sockets: Real-time push delivery in <100ms. No polling.
Bearer token auth: Random 32-byte token per broker session. No dangerous flags needed.
Agent-agnostic: Works with any MCP client. Auto-detects Claude Code, Cursor, Aider, Codex, Windsurf, VS Code.
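Two of these decisions are easy to illustrate with the Python standard library. The sketch below opens a SQLite database in WAL mode, prunes messages older than a retention window, and generates and checks a random 32-byte session token. The table schema, the function names, and the 24-hour constant are assumptions made for this example; the broker's real schema and code may differ.

```python
import hmac
import secrets
import sqlite3

MESSAGE_TTL_SECONDS = 24 * 3600  # assumed retention window for messages


def open_store(path: str) -> sqlite3.Connection:
    """Open a file-backed database in WAL mode with a hypothetical schema."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " sender TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " created_at REAL NOT NULL)"  # Unix timestamp in seconds
    )
    return conn


def prune_messages(conn: sqlite3.Connection, now: float,
                   ttl: float = MESSAGE_TTL_SECONDS) -> int:
    """Delete messages older than ttl seconds; return how many were removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM messages WHERE created_at < ?", (now - ttl,)
        )
    return cur.rowcount


def new_session_token() -> str:
    """Return 32 random bytes, hex-encoded (64 characters)."""
    return secrets.token_hex(32)


def token_matches(expected: str, presented: str) -> bool:
    """Compare tokens in constant time to avoid leaking timing information."""
    return hmac.compare_digest(expected.encode(), presented.encode())
```

WAL mode only applies to file-backed databases, so `open_store` should be given a real path rather than `:memory:`.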

Real-World Workflow

Here's what I actually do with it daily:

Morning (3 sessions):

Session 1 (VS Code): "I'm refactoring the auth module"
→ Sets summary via mesh_summary
→ Locks auth.ts via mesh_lock

Session 2 (Terminal): "What are the other sessions doing?"
→ Calls mesh_peers — sees Session 1 is on auth
→ Calls mesh_lock query auth.ts — sees it's locked
→ Works on database instead

Session 3 (Antigravity): Starts a migration
→ Broadcasts "database schema changing to v2.1" via mesh_send
→ Sets db_version = 2.1 in shared state via mesh_state
→ Sessions 1 and 2 see the update

No copy-pasting. No context switching. The agents coordinate themselves.
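The last step of that workflow, where one session writes shared state and the others see the update, is a publish/subscribe pattern. Here is a minimal in-process sketch of it. The `EventBus` and `SharedState` classes and the payload fields are hypothetical, and the real broker delivers events between processes rather than inside one; only the `state_changed` event name comes from the tool list above.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Tiny in-process pub/sub (illustrative, not SLM Mesh's code)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Copy the list so a handler may subscribe during delivery.
        for handler in list(self._handlers.get(event_type, [])):
            handler(payload)


class SharedState:
    """Key-value scratchpad that announces every write on the bus."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._values: Dict[str, Any] = {}

    def set(self, key: str, value: Any, changed_by: str) -> None:
        self._values[key] = value
        self._bus.publish(
            "state_changed",
            {"key": key, "value": value, "by": changed_by},
        )

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)


# Usage: sessions 1 and 2 subscribe, session 3 writes the new schema version.
bus = EventBus()
state = SharedState(bus)
inbox_1: List[dict] = []
inbox_2: List[dict] = []
bus.subscribe("state_changed", inbox_1.append)
bus.subscribe("state_changed", inbox_2.append)
state.set("db_version", "2.1", changed_by="session-3")
print(inbox_1 == inbox_2)  # True: both sessions received the same event
```

Pushing the change to subscribers means no session has to remember to poll for `db_version`.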

The Numbers

Metric	Value
Tests	480 passing
Coverage	100% lines
MCP tools	8
CLI commands	12
Dependencies	4 (MCP SDK, better-sqlite3, commander, zod)
Python client	Zero deps (stdlib only)
Install size	~80 KB (packed)

Comparison with claude-peers

claude-peers proved the demand for this — 1,600 stars in 2 weeks. SLM Mesh is the production-grade answer:

Feature	SLM Mesh	claude-peers
MCP tools	8	4
File locking	Yes	No
Shared state	Yes	No
Event bus	Yes	No
Agent-agnostic	Any MCP agent	Claude only
Dangerous flags	Not needed	Required
Tests	480 (100% cov)	0
Auth	Bearer token	None
Python client	Yes	No

Try It

npm install -g slm-mesh

GitHub: github.com/qualixar/slm-mesh
PyPI: pip install slm-mesh

MIT licensed. Part of the Qualixar research initiative by Varun Pratap Bhardwaj.

Feedback welcome — especially interested in what multi-session workflows you'd use this for.

Top comments (1)

Kyle Carriedo • May 19

"I become the message bus — copy-pasting context between terminals like it's 2005" is uncomfortably accurate. The MCP-shaped solution is the obvious place to put this; the deeper question I've been chewing on is what the protocol over that layer should look like, because peer-to-peer messaging alone doesn't fix the coordination problem — it just makes it possible to coordinate.

A few design tensions worth flagging from running similar setups:

Pull vs. push. If session B has to actively poll a known channel to discover that session A is touching auth.ts, you've moved the burden from "human as message bus" to "agent prompt as poller," which works until you forget to include the polling instruction in one of your subagent briefs. Push (agent A broadcasts "I'm taking auth.ts" before editing) is more robust but requires every agent to opt into emitting those broadcasts.
Granularity of lock. File-level claims (your example) catch the most painful conflicts but miss semantic ones — two agents independently designing incompatible APIs in separate files. Some kind of intent-level broadcast ("I'm implementing user-auth, here's the contract") is more useful but harder to encode.
Stale claim cleanup. When an agent dies or compacts mid-task, what releases its claims? Most setups hand-wave this; a TTL on claims is the simplest answer but the right TTL depends on task type.

Curious whether SLM Mesh's design leans more toward channel-based pub/sub or shared mutable state — they have pretty different failure modes when an agent disappears mid-conversation.