19 Adapters, One SQLite File, 10 Days to Ship: Open Brane Is Public

connor gallic

The brain under my agent mesh — the thing Kai, Scout, Claude Code, and Codex query before they answer anything — is one SQLite table. Eight columns. An 80-line Python write gate. No ORM, no workflow engine, no framework.

I committed the first architecture decision record on 2026-04-11. I pushed the public repo today, 2026-04-21. Ten days from first ADR to open source. In between, the event log grew from zero to 942,068 events.

Today it went live as Open Brane.

This post is why. The next post is how.

The Problem Every Agent User Has

You're using more tools than you can count. Calendar, Drive, Stripe, GitHub, Notion, Obsidian, Claude, ChatGPT, a transcription service, a CRM, maybe half a dozen automation platforms. Each one has your context — inside its own schema, inside its own database, behind its own API.

None of them can see each other.

You notice this every time an agent forgets something you told it on Tuesday. Or proposes a plan that contradicts a decision you made three weeks ago. Or generates a summary of your week that omits the five Stripe events and two Fathom calls that actually defined it.

The fix most products offer is memory features. ChatGPT lets you save facts. Claude has projects. Custom GPTs accept 8K of context. All workarounds. None address the real problem, which is that your context is a cross-source data layer problem, not a model problem.

The engineering challenge is making every source queryable through one interface, so any agent in your stack can pull the right slice without knowing which tool originally produced it.

That's what Open Brane is.

What Actually Breaks

I noticed three recurring failure modes before I wrote a line of code.

Context doesn't survive handoffs between agents. Scout finishes researching something. Kai takes over and needs to act on the research. The only way Kai learns what Scout found is if Scout produces a summary document Kai reads. If the summary misses a fact, it's gone. Agents don't pass context to each other cleanly because they're passing rendered views instead of source data.

Definitions drift across sources. Stripe's MRR calculation differs from the one in my Supabase analytics table. Both are "correct." Both are referenced in conversations. An agent answering a question about revenue has no way to know which number you want unless you tell it every time.

Pipelines fail silently. An ingestion script that pulled from Drive broke recently. Zero new events, zero error log. The pipeline was correctly reporting no work to do because its input was empty — not because no documents existed, but because auth had expired. I found it by noticing a gap in the event stream, not because anything alerted me.

All three trace back to the same root: the source data is scattered across systems that can't see each other, and the views I rely on are not rebuildable from a single canonical store.

The Minimum Viable Fix

I wrote the whole thing on paper before I wrote code. One rule governed every design decision:

One append-only table with one write path, and every view is rebuildable from it.

That's the entire brain. One SQLite file called events.db. One table called events. One column named payload_json holding an opaque blob, plus seven columns of indexable metadata — timestamp, source, type, actor, and a few others. No UPDATEs. No DELETEs. Corrections are new events that reference prior event IDs.
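To make the shape concrete, here's a sketch of what that table could look like. The post names payload_json, timestamp, source, type, and actor; the remaining columns here (id, ref_event_id, ingested_at) are my guesses, chosen so that corrections have a prior event ID to reference — check the repo's SCHEMA doc for the real definition.

```python
import sqlite3

# Illustrative schema for the events table: one opaque payload column
# plus indexable metadata. Column names beyond those stated in the post
# are assumptions, not the repo's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,  -- append-only row id
    timestamp    TEXT NOT NULL,                      -- when the event happened
    source       TEXT NOT NULL,                      -- e.g. 'gmail', 'fitbit'
    type         TEXT NOT NULL,                      -- source-specific event type
    actor        TEXT,                               -- who or what produced it
    ref_event_id INTEGER,                            -- corrections point at a prior row
    ingested_at  TEXT NOT NULL DEFAULT (datetime('now')),
    payload_json TEXT NOT NULL                       -- opaque blob; never mutated
);
CREATE INDEX IF NOT EXISTS idx_events_source_ts ON events (source, timestamp);
"""

conn = sqlite3.connect(":memory:")  # use "events.db" on disk in practice
conn.executescript(SCHEMA)
```

Note there is no UNIQUE constraint and no UPDATE trigger: correctness comes from the write path, not the schema.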

Every write goes through one Python script called record_event.py. 80 lines. No LLM touches the database directly. No LLM generates SQL. No ingest script opens a database connection — they all shell out to record_event.py as a subprocess.
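A minimal sketch of what a write gate in that style looks like — this is not the repo's record_event.py, and the flag names are illustrative, but it shows why 80 lines is enough: validate once, insert once, and every writer in the system inherits both.

```python
#!/usr/bin/env python3
"""Sketch of a record_event.py-style write gate: the only code path
that ever writes to events.db. Flag names are illustrative."""
import argparse
import json
import sqlite3
import sys
from datetime import datetime, timezone

DB_PATH = "events.db"

def record_event(conn, source, type_, actor, payload, timestamp=None):
    # Validate once, here, for every writer in the system.
    json.loads(payload)  # reject malformed JSON before it reaches the table
    ts = timestamp or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO events (timestamp, source, type, actor, payload_json) "
        "VALUES (?, ?, ?, ?, ?)",
        (ts, source, type_, actor, payload),
    )
    conn.commit()

def main():
    p = argparse.ArgumentParser()
    p.add_argument("--source", required=True)
    p.add_argument("--type", required=True)
    p.add_argument("--actor", default="system")
    p.add_argument("--payload", required=True, help="JSON string")
    args = p.parse_args()
    with sqlite3.connect(DB_PATH) as conn:
        record_event(conn, args.source, args.type, args.actor, args.payload)

if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Because every adapter shells out to this as a subprocess, the validation and the INSERT statement exist in exactly one place.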

That constraint is load-bearing. It means the write contract lives in one place. It means adapters are pure functions — pull from a source, produce event rows, exit. It means I can run any ingester on any machine and it cannot corrupt the brain.

The second load-bearing constraint is that views are rebuildable. The Obsidian vault, the Qdrant vector DB, the compiled journals, the wiki — all of them are views. None are canonical. Delete any of them and run rebuild and they come back from events. If Qdrant's disk dies I don't lose vectors; I lose a rebuild overnight.
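The replay logic that makes views rebuildable can be sketched in a few lines, assuming corrections carry a ref_event_id pointing at the row they supersede (my assumption about the schema, not the repo's documented shape):

```python
import json
import sqlite3

def rebuild_view(conn):
    """Rebuild a derived view purely by replaying the events table in order.
    A later event that references a prior event id supersedes it, so the
    view reflects corrections without any UPDATE or DELETE ever running."""
    view = {}         # event_id -> decoded payload
    superseded = set()
    for event_id, ref, payload in conn.execute(
        "SELECT id, ref_event_id, payload_json FROM events ORDER BY id"
    ):
        if ref is not None:
            superseded.add(ref)
        view[event_id] = json.loads(payload)
    return {eid: p for eid, p in view.items() if eid not in superseded}
```

The real views (Obsidian, Qdrant, journals) do more work per event, but they are all variations of this loop: read events in order, emit a derived artifact, never write back.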

The third constraint is that cron is the orchestrator. No queues, no dead letter queues, no workflow engine. Every adapter is idempotent — re-running it does nothing if nothing changed. Cron hits each one every N minutes. If the adapter fails, the next tick retries. health_check.py --record writes its own probe into the events table, so I can query my own uptime history using the same tools I query everything else with.
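One way to get that idempotency — a sketch, not the repo's implementation — is a high-water mark per source: query the newest timestamp already recorded, then only write items newer than it. Re-running the adapter against an unchanged source writes nothing, which is exactly what makes blind cron retries safe.

```python
import sqlite3

def last_seen(conn, source):
    """High-water mark: the newest event timestamp already recorded for
    this source. Returns None if the source has never been ingested."""
    return conn.execute(
        "SELECT MAX(timestamp) FROM events WHERE source = ?", (source,)
    ).fetchone()[0]

def ingest(conn, source, items):
    """items: iterable of (iso_timestamp, payload_json), oldest first.
    Idempotent: a second run over the same items writes zero rows."""
    cutoff = last_seen(conn, source)
    written = 0
    for ts, payload in items:
        if cutoff is not None and ts <= cutoff:
            continue  # already in the brain; skip
        conn.execute(
            "INSERT INTO events (timestamp, source, type, actor, payload_json) "
            "VALUES (?, ?, 'item', 'cron', ?)",
            (ts, source, payload),
        )
        written += 1
    conn.commit()
    return written
```

With that property, the crontab entry is the whole orchestration layer — something like `*/15 * * * * /usr/bin/python3 ingest_source.py` per adapter (schedule illustrative).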

What's in There Right Now

942,068 events. 3 GB on disk. 855,672 distinct actors. 4,762 events ingested today while I was writing this post.

Source breakdown as of 2026-04-21:

gmail                  537,517   (Takeout archive)
facebook                85,900
chatgpt                 54,946
audible                 44,496
twitter                 34,994   (12-year takeout)
google-search           34,757   (search history takeout)
gdrive                  33,715   (941 docs + 32k other files)
claude-laptop           25,012   (Claude Code sessions)
web                     18,886   (120 RSS feeds + extracted pages)
linkedin                15,281
local-dev               12,772
youtube                  7,263
fitbit                   5,550
google-maps              4,159
kai                      3,554   (agent conversations)
chrome                   3,144
code                     3,101   (AST nodes)
amazon                   2,061
git                      1,904   (33 repos)
snapchat                 1,865
google-contacts-full     1,804
google-fit               1,550
google-access            1,160
haro                       917
gcal                       873
google-keep                821
scout                      584
gvoice                     549
openclaw                   525

Most of that is not "coding data." It's life-stream data — Audible listens, Amazon orders, Google Maps history, Snapchat. I include everything because at a million events the storage cost is a rounding error and I don't know in advance which slice will matter to a question. When Kai asks if I've been sleeping badly during a stressful build week, the answer is in the Fitbit slice. When I need to remember a contact I met on a flight three years ago, it's in google-contacts-full.

19 ingest scripts pull from 30+ distinct sources. Every one writes through the same record_event.py write gate. The database has never been corrupted. I've never had to run a migration.

Why Release It Now

Ten days is not "finished." Nobody's running Open Brane but me. The repo has one commit and zero stars as I write this sentence. I'm releasing it now because the pattern is already load-bearing in my daily workflow and waiting to polish wouldn't change what the pattern is.

Specifically — if you're running more than one agent against your own data, you've already half-built this, badly. You have a Notion page that an agent reads. You have a Claude project with some context. You have a local SQLite somewhere. You have Obsidian. You have a Google Doc with your todo list. Every time you ask an agent to do something, you're rebuilding the cross-source query manually by pasting from these into the prompt.

Open Brane is what that looks like when it's formalized.

The repo is intentionally small. One ARCHITECTURE doc, one SCHEMA, about a dozen scripts including three canonical adapters (Drive, Claude Code sessions, git), a stdio MCP server, a systemd unit template, and two docs pages on how to extend it. No framework, no ORM, no workflow engine. Fits on a USB stick.

It's opinionated in exactly the places that matter:

  • Append-only. Not tunable. No UPDATEs, no DELETEs.
  • One write path. Everything goes through record_event.py. Not tunable.
  • Scripts are pure functions. Read, compute, write, exit. No background workers. No state machines.
  • Cron is the orchestrator. Retries are automatic: the next tick re-runs the adapter.
  • Network boundary is auth. Bind MCP to localhost or your tailnet. Don't build an auth layer you'll regret.
  • Views are rebuildable. If it's not rebuildable from events, it's not in the brain.

Everything else is configurable. Which sources you ingest, which embedding model you run, which MCP tools you expose, which vector DB you back it with.

What It Is Not

Open Brane is not a framework. It doesn't decide your ontology for you. It doesn't have an opinion about what counts as an event or how you name sources. You write the adapters. You choose the payload shape. You pick which sources matter.

It's also not a SaaS. It runs on your box. Your data never leaves unless you ship a view somewhere else on purpose. The MCP server binds to localhost by default. The embedding model (nomic-embed-text via Ollama) runs locally. There are no vendor API calls in the critical path — if OpenAI raised prices tomorrow, nothing in the brain would change.

It's also not trying to be general-purpose observability or a data warehouse. The schema is too narrow for either. It's specifically a personal agent memory substrate. If you're building a company data platform, you want something bigger.

What It Unlocks

Three things showed up the moment the brain had enough data in it.

Agents stopped losing Tuesday. Scout, Kai, and Claude Code all query the same events. When I tell any of them I shipped a fix to the butterfly pipeline on Tuesday, every other agent can find it by Wednesday. The model changes; the memory doesn't.

Content pipelines stopped being a grind. Every Claude Code session is a story — problem, attempts, decision, resolution. A nightly script mines sessions for high-signal events and flags ones worth writing about. The post you're reading was seeded by three events: the day I decided to open-source the brain, the day I hit the writes-only-through-scripts rule, and the day I realized cron was doing more orchestration than any workflow tool I'd used.

Definitions stopped drifting. Raw events go in. Derived metrics compute on read. If two agents report different MRR numbers, I diff the queries that produced them, not the numbers themselves.
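Here's what compute-on-read looks like in this setup — a sketch with an assumed payload shape ({"amount_cents": ...}) and assumed type name, since the post doesn't specify either. The point is that the definition of the metric lives in the query, so two disagreeing numbers means two diffable queries.

```python
import json
import sqlite3

def mrr_on_read(conn, as_of):
    """Derive a revenue figure at query time from raw stripe events.
    The payload shape and the type filter below are illustrative
    assumptions; the metric's definition IS this query."""
    total_cents = 0
    for (payload,) in conn.execute(
        "SELECT payload_json FROM events "
        "WHERE source = 'stripe' AND type = 'subscription_active' "
        "AND timestamp <= ?",
        (as_of,),
    ):
        total_cents += json.loads(payload)["amount_cents"]
    return total_cents / 100
```

A competing MRR definition is just a second function over the same rows — nothing stored ever has to be reconciled.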

How to Start

The Quickstart in the README gets you running in about fifteen minutes on Ubuntu. Install Ollama, pull nomic-embed-text, run Qdrant in Docker, clone the repo, initialize the database. First event written, first semantic search works.

From there, the pattern is: pick one source, copy an existing adapter, rewrite the fetch loop, add a cron line. About an hour per new source after you've done one. The canonical adapter to crib from is ingest_gdrive.py — it's the most complex one in the repo, so anything simpler is a strict subset.
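The adapter skeleton is small enough to sketch here. Everything below is illustrative — fetch_items is a stand-in for your source's real fetch loop, and the record_event.py flags are guesses at the gate's interface — but the shape is the one described above: pull, serialize, shell out, exit.

```python
import json
import subprocess

def build_command(source, type_, payload: dict):
    """Assemble the write-gate invocation. Flag names are illustrative;
    match them to the actual record_event.py interface in the repo."""
    return [
        "python3", "record_event.py",
        "--source", source,
        "--type", type_,
        "--payload", json.dumps(payload),
    ]

def emit(source, type_, payload: dict):
    # The adapter never opens events.db itself; the gate owns the write
    # contract, so this script can run on any machine without risk.
    subprocess.run(build_command(source, type_, payload), check=True)

def fetch_items():
    """Replace with your source's fetch loop: an API page, an export
    file, a takeout archive. This stub yields one hypothetical item."""
    yield {"id": "example-1", "text": "hello"}

def main():
    for item in fetch_items():
        emit("my-source", "item", item)
```

Swap out fetch_items, pick a source name, add a cron line, and the new source joins the same event stream as the other 19 adapters.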

The next post in this series walks through the schema, the MCP surface, and the adapter pattern in detail. For now, if you've been stitching agent context together by pasting from half a dozen systems, this is the substrate that replaces the pasting.

The model is a commodity. The memory is the asset. Today there's an open-source version of the memory.

https://github.com/cgallic/open-brane

What are you stitching context from right now? Not the tools you love — the ones you keep copying out of because no agent can see inside them.
