Shivendu(Shivu)

Posted on Jul 5

Building a Shared Brain for My AI Agents — and Everything That Broke Along the Way

#ai #machinelearning #programming #opensource

Meet Passport 🧳 — a shared memory layer for your AI coding agents.

I have three AI coding agents open right now. Claude Code in one terminal, Cursor in a window, Codex in another. And every single one of them is an amnesiac.

I'll spend ten minutes explaining my stack to Claude Code — we use pytest, the DB is Postgres, auth is JWT, don't touch the legacy billing module. It nails the task. Then I switch to Cursor for something else, and I'm back to square one. Same explanation. Different tool. Blank stare. By the third time I typed "we use pytest, not unittest" into a fresh chat, something in me snapped.

These tools are brilliant.

They're also goldfish.

And weirdly, none of them talk to each other.

What Claude Code learns about my codebase dies the moment I close the tab, and Cursor never even knew it happened.

So when I saw Cognee's hackathon — literally themed "Where's My Context?" — I didn't need to brainstorm.

I'd been living the problem for months.

The idea (and the doubt that came with it)

The pitch was simple:

One shared memory that every agent reads from and writes to.

Teach one, and they all know it.

I called it Passport — because your memory should travel with you, into whatever tool you open next.

But here's the first honest thing I'll admit:

For about an hour, I was scared it was fake.

Cognee already ships an MCP server. My judges are Cognee. If all I did was wire their memory tool into a couple of agents, I'd be handing the people who built the memory layer a tutorial and calling it a project.

That fear sat in my chest the whole first day.

The only way out was to build something that wasn't in anyone's tutorial:

Provenance (which agent taught what)
Conflict detection (what happens when two agents disagree)
Real isolation between users

Those three became the whole point.

Everything else was table stakes.

I also made a call I second-guessed a dozen times:

Skip the flashy browser extension, go MCP-only.

A browser extension that captured ChatGPT would've been a killer demo — memory following you into the web.

But it's fragile, and a demo that breaks live is worse than a demo that's slightly less magical.

I chose the boring, robust path.

(Spoiler: I'd make the same call again. The magical path breaks. Ask me how I know.)

The moment it actually worked

I'll never forget this one.

I told Claude Code — Anthropic's agent —

"remember that we use JWT for auth and the frontend is React."

Then I opened Codex — OpenAI's agent, a completely different company's model, a fresh session that had never seen a word of that conversation — and asked:

"what's our auth and frontend?"

It said:

JWT and React.

I actually said "no way" out loud to an empty room.

Two rival AI systems, sharing a brain, through a memory layer I'd wired up that afternoon.

That's the moment the project stopped being an idea and started being real.

Cross-vendor memory.

It shouldn't feel emotional.

It did.

And then, of course...

Everything started breaking.

Break #1: Codex crashed and I couldn't see why

Claude Code worked.

Codex threw:

[Errno 22] Invalid argument

on every recall and I had no idea why.

Same server.

Same code.

Different client.

One works, one dies.

I burned an hour on this.

The breakthrough was almost stupid:

MCP talks over stdout, and Cognee logs a lot to stderr during a recall.

On Windows, if the client doesn't drain that stderr pipe, it fills up (~64KB), and the next write crashes the whole tool.

remember was quiet enough to sneak through.

recall was chatty enough to blow the pipe.

The fix was three lines — route stderr to a log file before Cognee gets imported.

But finding those three lines meant understanding exactly how pipes, buffering, and MCP's transport actually work.

That's the thing nobody tells you about these bugs:

The fix is tiny, the understanding is not.

Break #2: the feature that vanished

Cognee's memory lifecycle is:

remember
recall
improve
forget

I'd built my whole conflict-reconciliation story on improve() — the "memify" step that reweights the graph.

I called it against the cloud and got a flat:

404 Not Found

memify isn't exposed on the cloud tier I was using.

A feature I'd designed around, gone.

For a second I felt the floor drop.

Then I did the thing you actually have to do in a hackathon:

Do I actually need it?

And I didn't.

Reconciliation doesn't require reweighting a graph — it requires recording an authoritative decision and making recall respect it.

I wrapped improve() in a try/except, logged it honestly as "OSS-only," and moved on.

The demo got simpler.

And more honest.

Sometimes the constraint is a gift.

Break #3: the night Doug the wolf leaked into everything

This is the one that nearly broke me.

Multi-tenancy — real isolation between users — was supposed to be the crown jewel.

Two users, alice and bob, each with their own private brain.

I wrote the test:

alice stores a secret
bob stores a different secret
neither should ever see the other's

I ran it.

alice recalled her own fact perfectly.

And bob...

recalled

"the mascot is a wolf named Doug."

Doug.

The.

Wolf.

Doug was test data from hours earlier — a throwaway fact I'd stored a dozen times while debugging something unrelated.

And here he was, materializing inside a brand-new tenant's supposedly-isolated memory.

My isolation was an illusion.

If this were real users, one company's secrets would be bleeding into another's.

That's not a bug you ship.

That's the bug that ends the product.

I tried the obvious fix — filter by the tenant's tags — and it got worse.

Now both users got Doug.

I sat there staring at a wolf that would not die, at 1 in the morning, genuinely wondering if the whole multi-tenant premise was broken.

So I stopped guessing and got methodical.

I stored one unique fact —

"the office mascot is a phoenix named Ember."

—and queried it through every retrieval mode Cognee offered, one at a time, watching which ones leaked and which held.

GRAPH_COMPLETION → leaked Doug

RAG_COMPLETION → leaked Doug

CHUNKS → returned "a phoenix named Ember."

Clean.

Isolated.

Correct.

The graph-completion modes were searching the whole shared knowledge graph and letting a heavily-reinforced node like Doug bleed through dataset boundaries.

Raw chunk retrieval respected the dataset scope.

The fix wasn't a filter.

It was choosing the right kind of recall.

I switched to chunk-scoped retrieval, re-ran the isolation test, and got the two words I'd been chasing for two hours:

ISOLATION PASSED

I may have fist-pumped.

At a wolf.

It's fine.

And the beautiful part:

The fix made the design better.

Passport now returns each tenant's own faithful facts and lets the calling agent do the reasoning — cleaner, more correct, and more honest than a black-box completion.

The bug forced a better architecture.

They usually do.

The question I was most afraid to ask

Late in the build, my teammate asked me the question I'd been avoiding:

"Is this thing actually intelligent, or are you just hardcoding keywords and fooling yourself?"

That one stung because it was fair.

Part of my ranking used a keyword heuristic for "importance."

Was I dressing up if-statements as AI?

So I did the honest thing.

I ran an adversarial test designed to fail if it was fake.

I stored:

"the quarterly board meeting is the first Tuesday of each month."

Then queried it with:

"when do senior executives gather to review company performance?"

No shared words at all.

A keyword system fails this instantly.

It returned the meeting.

Because the recall is real semantic understanding — embeddings that know "executives gathering to review performance" means "board meeting."

That's not string matching.

That's the real thing.

But I also didn't let myself off the hook.

The importance score was a keyword heuristic, and that's not intelligence.

So I upgraded it:

Now Cognee's own LLM rates each memory's importance 1–10.

I tested it — a company policy scored a 9, idle chatter scored a 2.

Real judgment.

And I kept the one honest non-AI piece — trust weights — exactly as explicit config, because deciding "how much do I trust this source" is a governance call, and letting a model guess it is how you get memory-poisoning attacks.

Knowing what shouldn't be AI is its own kind of engineering.

What it became

By the end, Passport wasn't a wrapper.

It was:

A shared brain across Claude Code, Cursor, and Codex — proven cross-vendor.
Provenance on every memory — a live graph colored by which agent taught what.
Conflict detection via Cognee's LLM — it catches "Postgres vs MySQL" and reconciles it.
Real multi-tenant isolation — measured, zero leakage.
Intelligent recall — semantic retrieval, LLM-scored importance, recency, source trust.

And I made myself measure it instead of vibe it:

✅ Cross-agent recall 100%
✅ Isolation 100%
✅ Semantic recall proven against zero-keyword queries

Numbers, not adjectives.

I'll be honest about the limits too, because a blog that only brags is a lie:

memify only runs self-hosted
my conflict detection is high-signal but not exhaustive
trust weights are static, not learned

All fixable.

All on the roadmap.

None of them hidden.

The thing I actually took away

There's a poetic irony I didn't plan:

I built a memory for AI agents while pair-programming with an AI agent (Claude Code — disclosed, and honestly the whole reason I could move this fast).

The tool that forgets,

helping me build

the thing that remembers.

But the real lesson wasn't about memory.

It was that every hard bug — the pipe that crashed Codex, the feature that 404'd, the wolf that wouldn't die — didn't just cost me time.

Each one forced a better decision.

The crash taught me how the transport really works.

The 404 made the design simpler.

Doug the wolf handed me a cleaner, more correct architecture than I would've written on my own.

We keep asking our AI where its context went.

Turns out the answer was never a bigger context window.

It was giving it a memory —

one it owns,

one that travels with it,

one that remembers who taught it what.

Passport is that memory.

Doug is finally gone.

And my agents, at last, remember last night.

🔗 Live Demo

https://memlayer.streamlit.app/

💻 GitHub

https://github.com/ShivenduShivu/MemoryLayer_for_Agents

Made with 💓 & Built for Cognee's "Where's My Context?" hackathon — powered by Cognee's graph-vector memory.

DEV Community