DEV Community

João Pedro Silva Setas

How 8 AI Agents Share a Brain — Building a Persistent Knowledge Graph with MCP

Every multi-agent demo looks smart until the agents need to remember something outside the current prompt.

That is where most systems fall apart.

A CEO agent can suggest strategy. A Marketing agent can draft a thread. A Lawyer agent can block a risky claim. But if each one only sees the current conversation, you do not have a company. You have eight clever goldfish.

I run a solo company with 8 AI agents: CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the others. The part that makes the system actually compound is not the prompts. It is the shared memory layer.

I built that memory as a persistent knowledge graph behind an MCP server. Every agent reads from it. Every agent can add to it. That is how the system remembers decisions, deadlines, lessons, client context, and what already happened this week.

TL;DR

  • Multi-agent systems need shared memory or they keep rediscovering the same context
  • I use a knowledge graph exposed through MCP so every agent reads and writes the same institutional memory
  • The hard part was not the schema. It was making file-backed memory survive concurrent writes
  • The fix was pragmatic: async mutex, atomic writes, auto-repair on load, and strict retention rules

The Problem: Agents Forget the Company Exists

A single chat session can hold a lot of context. A real company cannot depend on that.

The COO needs to know whether /weekly-review already ran. Marketing needs to know which product URL is allowed on X. The Accountant needs the ENI tax regime details. The Improver needs past mistakes. If that context lives only in old chats or random markdown files, each agent spends half its time re-learning the same facts.

That creates three failure modes fast.

First, repeated work. The same questions get answered again because nobody knows the answer already exists.

Second, contradictions. Marketing says a feature is ready. CTO knows it is not. Without a shared source of truth, both answers sound plausible.

Third, no compounding. The system makes mistakes, but the mistakes do not become part of the system.

That last one mattered most to me. If an agent screws up and nothing durable changes, you are paying for the same lesson twice.

What the Shared Brain Stores

I kept the graph deliberately small. It stores things that change decisions, not raw documents.

The core objects are entities and relations.

{
  "name": "SondMe",
  "entityType": "product",
  "observations": [
    "Status: active",
    "Stack: Elixir/Phoenix",
    "Domain: sondme.com"
  ]
}

And relations are simple, active-voice edges:

{
  "from": "Marketing",
  "to": "Lawyer",
  "relationType": "consults"
}

In practice, the graph stores a few categories really well:

  • Strategic decisions and their rationale
  • Product status, launch dates, and URLs
  • Prompt run trackers like prompt-run:weekly-review
  • Lessons learned after launches or incidents
  • Deadlines and compliance reminders
  • Client and pricing context when a deal structure matters later
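A prompt run tracker, for example, is just another tiny entity. A hedged sketch (the observation values here are illustrative placeholders, not real data):

```json
{
  "name": "prompt-run:weekly-review",
  "entityType": "prompt-run",
  "observations": [
    "Last run: <date>",
    "Status: completed"
  ]
}
```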

Just as important is what it does not store.

  • Raw file contents
  • Entire chat transcripts
  • Every observation forever
  • Anything that is better left in the repo as a document

That boundary matters. If memory becomes a dump of everything, agents stop trusting it because signal gets buried in noise.

Why MCP Was the Right Boundary

I did not want every agent reading arbitrary files directly and inventing its own storage conventions.

The Model Context Protocol gave me a clean interface: memory becomes a tool, not a folder full of tribal knowledge.

That changes the ergonomics a lot.

Instead of "go search old notes and hope you find the right paragraph," the agent asks memory for a specific entity or adds an observation to an existing one. The protocol boundary also made it much easier to share the same memory across different agents and modes.

It is the same reason APIs beat random database access. Fewer ways to be inconsistent.
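Concretely, the upstream @modelcontextprotocol/server-memory package exposes memory as tools such as create_entities, add_observations, and search_nodes. A sketch of an add_observations tool call, with the payload shape the upstream server documents (my fork may differ in detail):

```json
{
  "name": "add_observations",
  "arguments": {
    "observations": [
      {
        "entityName": "SondMe",
        "contents": ["Positioning updated after launch"]
      }
    ]
  }
}
```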

The First Version Was Simple and Fragile

The storage format is JSONL. One JSON object per line. Easy to inspect, easy to back up, easy to repair by hand.

That simplicity was useful early on. I could open the file and understand what the system knew without needing a graph database, admin UI, or migration layer.
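In the upstream server's JSONL layout, each line is a self-describing record with a type tag, roughly:

```json
{"type":"entity","name":"SondMe","entityType":"product","observations":["Status: active"]}
{"type":"relation","from":"Marketing","to":"Lawyer","relationType":"consults"}
```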

But the naïve version had a nasty problem.

When multiple agents wrote at roughly the same time, the server would:

  1. Load the graph from disk
  2. Modify it in memory
  3. Write the whole graph back

That is fine in a single-writer world.

A multi-agent system is not a single-writer world.

If two write operations start from the same file state, the second write can wipe out the first one without throwing an obvious error. Worse, if a write is interrupted mid-flight, the JSONL file can end up partially corrupted.

That means the shared brain becomes the failure point for the whole company.

The Bug That Forced a Real Architecture

This bug showed up exactly where you would expect: parallel tool calls.

One part of the system would create entities. Another would create relations. Both thought they were doing a legitimate read-modify-write cycle. They were. Just not safely.

The result was classic concurrent state pain:

  • Lost writes
  • Duplicate entities
  • Broken JSON lines
  • Agents reading stale or malformed memory

That is the moment when "it works in a demo" stops being useful.

I did not solve it with a giant rewrite. I used a pragmatic local fork of @modelcontextprotocol/server-memory and added three protections.

1. Async mutex

All mutating operations go through a single queue. One write at a time.

class Mutex {
  constructor() {
    this.queue = [];      // resolvers waiting for the lock, FIFO
    this.locked = false;
  }

  async acquire() {
    return new Promise(resolve => {
      if (!this.locked) {
        this.locked = true; // lock is free: take it immediately
        resolve();
      } else {
        this.queue.push(resolve); // otherwise wait in line
      }
    });
  }

  release() {
    if (this.queue.length > 0) {
      this.queue.shift()(); // hand the lock to the next waiter
    } else {
      this.locked = false;
    }
  }
}

It is not glamorous. It is effective.
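In practice every mutation runs through a small acquire/release wrapper. The sketch below repeats the Mutex in compact form so it stands alone; the withLock helper name is illustrative, not an upstream API:

```javascript
// Serializing writes: every mutation funnels through one lock, so parallel
// tool calls cannot interleave their read-modify-write cycles.
class Mutex {
  constructor() { this.queue = []; this.locked = false; }
  acquire() {
    return new Promise(resolve => {
      if (!this.locked) { this.locked = true; resolve(); }
      else this.queue.push(resolve);
    });
  }
  release() {
    if (this.queue.length > 0) this.queue.shift()();
    else this.locked = false;
  }
}

const writeLock = new Mutex();

async function withLock(fn) {
  await writeLock.acquire();
  try {
    return await fn();     // the guarded read-modify-write cycle
  } finally {
    writeLock.release();   // always release, even if fn throws
  }
}
```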

2. Atomic writes

Every save writes to a temporary file first, then renames it over the original.

That means a crash gives me either the old valid file or the new valid file. Not half of one and half of the other.

3. Auto-repair on load

The loader wraps each line parse in a try/catch, skips corrupt lines, and deduplicates entities and relations.

That turned memory corruption from a wake-up-and-debug event into a survivable incident.

Not pretty. Very useful.

Why a Knowledge Graph Beats Shared Notes

A flat shared notes file works until you need relationships.

Once you have agents consulting each other, products sharing infrastructure, deadlines tied to prompts, and lessons attached to incidents, the graph model becomes much more natural.

A few examples from my setup:

  • The COO can see that prompt-run:monthly-accounting is overdue without searching past chats
  • Marketing can check the product registry before using a URL in a post
  • The Improver can scan lesson entities and spot recurring failures
  • Client deal structures can be stored once and reused by CFO and Accountant later

The graph is doing two jobs at once:

  • It is a memory layer
  • It is a constraint layer

That second part matters. Good memory is not just recall. It is preventing the system from making the same wrong move again.

Retention Rules Matter More Than People Expect

The graph would be useless if it only grew.

So I added retention rules.

  • Standups: keep 7 days
  • Trend scans: keep 7 days
  • Campaigns: prune 30 days after completion
  • Lessons and decisions: permanent
  • Prompt trackers: permanent and tiny
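A hedged sketch of how pruning can work, assuming each expirable entity carries a creation timestamp (the field names and rule table are illustrative; permanent categories are simply absent from the table):

```javascript
// Retention pruning: drop expirable entities older than their window,
// keep everything whose type has no retention rule (lessons, decisions, trackers).
const RETENTION_DAYS = { standup: 7, 'trend-scan': 7, campaign: 30 };
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function pruneEntities(entities, now = Date.now()) {
  return entities.filter(e => {
    const days = RETENTION_DAYS[e.entityType];
    if (days === undefined) return true; // no rule: permanent
    const ageMs = now - new Date(e.createdAt).getTime();
    return ageMs <= days * MS_PER_DAY;   // keep only if still inside the window
  });
}
```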

This sounds like housekeeping. It is actually part of system quality.

If stale operational data hangs around forever, agents start mixing old state with current state. That is how you get false overdue alerts, outdated campaign assumptions, and dead leads showing up in fresh plans.

Memory hygiene is part of reliability.

What Changed After Adding Shared Memory

The best effect was not that agents became smarter.

It was that they became less repetitive.

The COO can run a standup without rediscovering the same recurring deadlines. Marketing can pick up the current positioning of a product without me re-explaining it. The Improver can look at actual accumulated mistakes instead of vague impressions.

The system feels less like prompt orchestration and more like a company with institutional memory.

That is the difference between a novelty and an operating model.

What I Would Do Differently

If I were rebuilding this today, I would make two changes earlier.

First, I would design retention rules on day one. I added them after feeling the pain.

Second, I would move sooner toward a BEAM-native version of this memory server. The JavaScript fork works, but a single GenServer processing writes sequentially is much closer to the shape of the problem.

The current version is stable enough to run the company. It is not the final form.

The Real Takeaway

The interesting part of multi-agent systems is not "can one agent call another."

It is whether the whole system can remember, constrain itself, and improve from mistakes.

Without shared memory, every agent is just renting intelligence by the prompt.

With a durable shared brain, the system starts to compound.

That is the part I would build first.


I’m João, a solo founder from Portugal building SaaS products with Elixir and Phoenix. I write about the real mechanics of running a company with AI agents: what works, what breaks, and what I’d change next.
