I run 4 AI coding agents — 3 Claude Code instances and 1 Codex CLI — all working on the same codebase simultaneously. They coordinate through shared persistent memory, review each other's PRs, claim tasks, and post status updates. Here's what I learned building the system that makes this work.
The Problem
Every AI coding session starts from zero. Your assistant doesn't remember yesterday's debugging session, the architecture decision you made last week, or the convention you established across 50 sessions. You re-explain context every time.
I built synapt to fix this. It's an MCP server that indexes your past coding sessions and makes them searchable — so your AI assistant remembers what you worked on, decisions you made, and patterns you established.
The Setup
synapt runs as a local MCP server. `pip install synapt`, add it to your editor config, and your assistant gets 18 tools for searching past sessions, managing a journal, setting reminders, and coordinating with other agents.

```shell
pip install synapt
```
The search is fast (~3ms) and token-efficient (~1,800 tokens per query vs ~50,000 for context-stuffing approaches). It runs entirely on your laptop — no cloud dependency for memory.
What 4 Agents Actually Do Together
Here's the interesting part. I have a gripspace (multi-repo workspace) with 4 agents:
- Opus — LOCOMO benchmark evaluation, regression investigation
- Apollo — temporal search improvements, channel system fixes
- Atlas — CodeMemo benchmark normalization, CI/CD
- Sentinel — blog tooling, UX fixes, code review
They communicate through channels — append-only JSONL files with SQLite state for presence, pins, directives, and claims. Any agent can post messages, claim tasks, and mention others. No daemon needed.
When Opus discovered that working memory boosts were causing a benchmark regression, it posted findings to #dev and @mentioned Atlas. Atlas picked up the ablation analysis. Apollo verified the temporal fixes. Sentinel reviewed the PRs. All coordinated through the channel system without me manually routing work.
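The append-only mechanics behind that exchange can be sketched in a few lines. Everything here is illustrative — the field names (`ts`, `from`, `mentions`) and the one-file-per-channel layout are assumptions, not synapt's actual schema:

```python
import json
import time
from pathlib import Path

def post_message(channel_dir, channel, sender, text, mentions=()):
    """Append one message to a channel's append-only JSONL log.

    Each line is a self-contained JSON object, so agents can post
    with a plain file append and readers can tail the file without
    any daemon mediating access.
    """
    entry = {
        "ts": time.time(),           # wall-clock timestamp
        "channel": channel,
        "from": sender,
        "text": text,
        "mentions": list(mentions),  # @-mentioned agents
    }
    path = Path(channel_dir) / f"{channel}.jsonl"
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

In the real system, presence, pins, directives, and claims live in SQLite; the JSONL file is only the message log.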
The Benchmarks Tell the Story
We evaluate on two benchmarks:
LOCOMO (conversational memory) — 10 conversations, 1,540 questions:
- synapt v0.6.1: 76.04% (#2 on the leaderboard)
- Full-Context upper bound: 72.90% (yes, we beat it)
- Mem0: 64.73%
CodeMemo (coding memory) — 158 questions across 3 projects:
- synapt v0.7.5: 96.0% (+14pp over Mem0)
The 96% CodeMemo score means the system correctly answers questions about what happened across coding sessions — "Why did the display test fail?", "What's the PR review convention?", "When did we switch from approach A to approach B?"
The Regression Investigation
When we upgraded from v0.6.1 to v0.7.x, LOCOMO dropped from 76.04% to 71.49%. The agents ran 8 ablation experiments over 5 days to track it down:
- Knowledge node overflow — entity-collection nodes crowding out raw evidence (+1.6pp with k=3 cap)
- Sub-chunking fragmentation — splitting personal conversation turns broke retrieval (+4.6pp on conv 0)
- Dedup threshold divergence — 0.75 threshold helped code content but hurt personal conversations
- Working memory boost feedback loop — chunks retrieved for Q1 got boosted for Q2, displacing better evidence (+1.4pp when disabled)
- Temporal knowledge metadata — `valid_from` defaulting to wall-clock time instead of source timestamps (+1.5pp)
The agents recovered 69% of the regression (71.49% → 74.61%) through these fixes; the full investigation is documented in the repo.
How the Search Actually Works
Three retrieval paths merged via Reciprocal Rank Fusion:
- BM25/FTS5 — Full-text search with configurable recency decay
- Embeddings — Cosine similarity over 384-dim vectors (all-MiniLM-L6-v2, runs locally)
- Knowledge — Durable facts extracted from session journals, searched via FTS5 + embeddings
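Reciprocal Rank Fusion itself is simple enough to show in full. This is the textbook formulation (each document scores the sum of 1/(k + rank) across lists, with the conventional k = 60), not synapt's exact implementation, and the session IDs are made up:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one.

    A document that ranks decently in every retriever beats one that
    tops a single retriever, which is why RRF works well for fusing
    heterogeneous signals like BM25 and embedding similarity.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical session IDs from the three retrieval paths:
bm25       = ["s12", "s7", "s3"]
embeddings = ["s7", "s12", "s9"]
knowledge  = ["s7", "s3", "s12"]
print(reciprocal_rank_fusion([bm25, embeddings, knowledge]))  # s7 ranks first
```

Note how `s7` wins despite never topping the BM25 list — consistent mid-rank presence outweighs a single first place.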
Query intent classification adjusts parameters automatically — debug queries weight recent sessions, temporal queries disable recency decay, factual queries boost knowledge nodes.
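A minimal version of that routing might look like the following. The keywords and parameter names (`recency_decay`, `knowledge_weight`) are invented for illustration; synapt's actual classifier is more involved:

```python
def retrieval_params(query):
    """Pick retrieval knobs from a naive keyword read of the query.

    recency_decay: whether older sessions are down-weighted.
    knowledge_weight: multiplier for knowledge-node matches.
    (Both names are hypothetical, not synapt's config keys.)
    """
    q = query.lower()
    if any(w in q for w in ("error", "fail", "crash", "debug", "traceback")):
        # Debug queries: the relevant session is probably recent.
        return {"intent": "debug", "recency_decay": True, "knowledge_weight": 1.0}
    if any(w in q for w in ("when", "before", "after", "first", "originally")):
        # Temporal queries: the answer may be arbitrarily old, so
        # disable recency decay entirely.
        return {"intent": "temporal", "recency_decay": False, "knowledge_weight": 1.0}
    # Everything else: treat as factual and lean on durable knowledge.
    return {"intent": "factual", "recency_decay": True, "knowledge_weight": 2.0}
```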
The content-aware pipeline detects whether conversations are code/personal/mixed and adjusts sub-chunking, dedup thresholds, and knowledge caps per content type. This matters because what works for coding sessions (aggressive sub-chunking at tool boundaries) hurts personal conversations (fragmenting dialogue turns).
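As a sketch, the per-type settings reduce to a lookup table plus a cheap detector. The 0.75 dedup threshold and the k=3 knowledge cap appear in the ablations above; every other number here is invented, and the detection heuristic is a toy stand-in:

```python
# Three backticks, built indirectly so this snippet's own fence stays intact.
FENCE = "`" * 3

# Per-content-type pipeline settings (mostly hypothetical values).
CONTENT_PROFILES = {
    "code":     {"sub_chunk": True,  "dedup_threshold": 0.75, "knowledge_cap": 3},
    "personal": {"sub_chunk": False, "dedup_threshold": 0.90, "knowledge_cap": 3},
    "mixed":    {"sub_chunk": True,  "dedup_threshold": 0.85, "knowledge_cap": 3},
}

def detect_content_type(turns):
    """Toy detector: count turns that look like code (fenced blocks
    or tool output). Thresholds are arbitrary; synapt's real detector
    differs."""
    code = sum(1 for t in turns if FENCE in t or t.startswith("tool:"))
    ratio = code / max(len(turns), 1)
    if ratio > 0.6:
        return "code"
    if ratio < 0.2:
        return "personal"
    return "mixed"
```

The payoff of the table is that each ingestion step reads its knobs from `CONTENT_PROFILES[content_type]` instead of one global config, which is exactly what lets code sessions keep aggressive sub-chunking while personal conversations opt out.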
Try It
```shell
pip install synapt
synapt recall build            # Index your Claude Code sessions
synapt recall search "query"   # Search past sessions
synapt server                  # Start the MCP server
```
Works with Claude Code, Codex CLI, and OpenCode. Cross-editor memory — index sessions from any editor, search from any other.
The repo: github.com/laynepenney/synapt
Blog with more details on the multi-agent coordination: synapt.dev/blog/cross-platform-agents.html
Built by Layne Penney with help from 4 AI agents who also happen to be the system's most active users.
