"Personal AI" is a marketing term. The AI you talk to every day isn't personal. It's a generic foundation model with a 200-token memory feature bolted onto the side and your first name tacked into the system prompt.
Claude forgets everything I told it last session. ChatGPT remembers what brand of coffee I drink and three other things I let it save. Gemini has no idea I exist between threads. None of them know what I shipped last week, what I tried that failed, who my clients are, or what I was researching on Tuesday.
That's not personal. That's cosplay.
I built what I think personal AI actually requires. Not a product. An architecture. A sovereign memory that every AI in my stack — Claude Code, Codex, Gemini CLI, a local Gemma model running on my home server, my production marketing agent on a VPS in Germany — queries before it speaks. Same memory. Different models. The AI becomes personal because the data layer is.
It has 156,926 events in it today. Here's what that actually looks like.
Personal AI Is a Data Layer Problem
The debate about which model is smartest has mostly resolved. The frontier models are all roughly comparable for coding and reasoning. Switching from one to the other is not a life-changing event.
The debate that hasn't happened: what does it mean for AI to know you?
The answer most products give is "memory features." ChatGPT lets you save facts. Claude has projects. Custom GPTs accept 8K of context. These are workarounds for a deeper problem. The real context for a person isn't 200 tokens of preferences. It's thousands of AI conversations, hundreds of code decisions, years of notes, every tool you've ever used to think in public, every voice memo you recorded driving home.
None of that is surfaced to the model. All of it is already on your disk.
The engineering problem is making that data queryable, searchable, and available to any model through a consistent interface. That's not a chatbot project. That's a data layer project. Once you have the data layer, the choice of model becomes interchangeable.
The Shape of a Sovereign Memory
I built a thing I call the brain. It lives on an Ubuntu box in my office with an RTX 3090. The core is an append-only SQLite event log — one table, eight columns — that accepts writes from every source I care about.
CREATE TABLE events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ts TEXT NOT NULL,
source TEXT NOT NULL,
type TEXT NOT NULL,
actor TEXT,
payload_json TEXT NOT NULL,
attachment_uri TEXT,
ingested_at TEXT NOT NULL
);
No joins. No foreign keys. No migrations. Corrections are new events that reference old ones. I've never deleted a row.
Every piece of data enters through exactly one 80-line Python script: record_event.py. That's the only write path. 30+ ingestion scripts shell out to it as a subprocess. The LLM never generates SQL. Never touches the database. Never sees credentials.
The rule: deterministic scripts do the work. AI agents decide which scripts to run.
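The whole write path fits in a few dozen lines. A simplified sketch of the idea (the real record_event.py is the author's; argument handling and defaults here are illustrative):

```python
# Simplified sketch of the single write path. Append-only: never
# updates, never deletes. Corrections are new rows.
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  ts TEXT NOT NULL,
  source TEXT NOT NULL,
  type TEXT NOT NULL,
  actor TEXT,
  payload_json TEXT NOT NULL,
  attachment_uri TEXT,
  ingested_at TEXT NOT NULL
);
"""

def record_event(db_path, source, type_, payload,
                 ts=None, actor=None, attachment_uri=None):
    """Append one event row and return its id."""
    now = datetime.now(timezone.utc).isoformat()
    with sqlite3.connect(db_path) as conn:
        conn.execute(SCHEMA)
        cur = conn.execute(
            "INSERT INTO events "
            "(ts, source, type, actor, payload_json, attachment_uri, ingested_at) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (ts or now, source, type_, actor,
             json.dumps(payload), attachment_uri, now),
        )
        return cur.lastrowid
```

Because every ingester shells out to this one script, the insert statement exists in exactly one place, and nothing else in the system ever holds a write handle to the database.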
That rule is one of five architectural decision records committed to git as permanent documents:
2026-04-11-adopt-event-log-architecture.md
2026-04-11-adopt-deterministic-scripts-plus-agent-oversight.md
2026-04-11-adopt-qdrant-semantic-search-over-events.md
2026-04-11-scribe-voice-capture-architecture.md
2026-04-12-adopt-compiled-knowledge-layer.md
When an agent asks why the system works a certain way, it reads the ADR. The intent outlasts the code.
What Counts as "Me"
The source axis of the event log tracks where data came from. The full breakdown from the live database:
| source | events | what it is |
|---|---|---|
| twitter | 34,994 | full takeout archive: 12 years of likes and tweets |
| google-search | 34,758 | search history takeout |
| gdrive | 29,305 | 941 Google Docs + 25k local files |
| local-dev | 12,772 | laptop dev files, notes, work-in-progress |
| claude-laptop | 9,647 | Claude Code sessions (358 distinct) |
| youtube | 7,263 | watch history |
| web | 6,031 | 120 RSS feeds I follow |
| fitbit | 5,550 | sleep, heart rate, calories |
| linkedin | 4,047 | |
| kai | 3,293 | my marketing AI agent's conversations |
| code | 3,038 | AST nodes: the code graph |
| amazon | 2,061 | orders, browsing |
| git | 1,543 | commits across 33 repos |
| haro | 608 | journalist queries I respond to |
| openclaw | 525 | WhatsApp/Discord agent messages |
| scout | 234 | local AI agent conversations |
Most of this is not "coding data." It's life-stream data. Fitbit readings, Amazon orders, a 12-year Twitter archive. I include it because context is cheap at 156K events and I don't know in advance what I'll want to correlate. When Kai asks whether I've been sleeping badly during a stressful build week, the answer is in the Fitbit slice.
The type axis is a different cut — 20+ distinct event types across the log:
| type | events | what it is |
|---|---|---|
| query | 35,258 | Google searches |
| like | 28,491 | Twitter likes |
| document-chunk | 26,838 | Drive doc fragments |
| reply | 8,844 | AI agent replies to me |
| watch | 8,378 | YouTube watches |
| tweet | 6,503 | my own tweets |
| article | 6,064 | RSS + extracted web content |
| calories | 4,958 | Fitbit |
| node | 3,024 | code graph AST |
| commit | 1,543 | |
| sleep-score | 273 | |
| memory | 24 | explicit remember-this entries |
155,348 distinct actors. 274MB of SQLite on disk. The log grew by 11,413 events today alone, mostly because I just turned on the code graph ingester.
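Both breakdowns fall out of the schema for free; each axis is one GROUP BY. A sketch:

```python
# Sketch: the per-source and per-type breakdowns are each a single
# aggregate query against the events table.
import sqlite3

def breakdown(conn, axis):
    """Count events along one axis (source or type), largest first."""
    assert axis in ("source", "type")  # fixed column names, no injection risk
    return conn.execute(
        f"SELECT {axis}, COUNT(*) AS n FROM events GROUP BY {axis} ORDER BY n DESC"
    ).fetchall()
```

One flat table means every new slice of the data is a query away, not a migration away.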
Three Machines, One Log
The brain is sovereign — I own every byte, no vendor API sits in the critical path — but it spans three machines that I actually live on.
My Windows laptop runs most Claude Code sessions. A bash script reads ~/.claude/projects/ and syncs new JSONL files to the agent box over Tailscale SSH. The laptop-specific ingester then parses them. Same pattern for Drive extraction and local dev file ingestion.
The agent box (Ubuntu, RTX 3090, always-on) is the hub. Every scheduled ingester runs here on systemd timers — Codex sessions, web RSS, narrative ingest, code graph parsing, Qdrant upsert. This is where the database lives.
The hermes VPS in Germany runs my production AI agent, exposed over Discord as "Kai." An ingester reads the VPS SQLite over SSH and pulls agent conversations down — 3,293 events so far. Kai also queries the brain. When someone asks Kai what I shipped last week, the agent calls semantic_search over HTTP on port 7778 before answering.
Three machines. One log. No vendor lock-in. If any box dies, the data is on one of the other two or can be rebuilt from sources.
The Compiled Knowledge Layer
Raw events are the substrate. On top of them sits a compiled layer, the kind of thing a raw event log can't give you on its own: structured, human-readable, curated.
The wiki is a markdown tree with 9 regions and 41 pages:
wiki/
├── agents/ (3) — kai, scout, openclaw-snapped
├── clients/ (12) — one page per active client
├── daily-briefs/ (5) — compiled end-of-day summaries
├── decisions/ (1) — ADR index
├── people/ (1)
├── products/ (8) — kaicalls, mdi, clawdflix, meetkai, ...
├── projects/ (3) — brain, cmo-agent-system, marketing-kb
├── systems/ (5)
└── topics/ (3)
Each page is human-editable. compile_wiki.py reconciles it against the event log and surfaces new entities that should probably exist.
The journal is daily markdown auto-compiled from events. compile_journal.py --date 2026-04-14 groups every event from that day by source and outputs a readable brief. A narrative subfolder holds longer threads that span multiple days.
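The compile pass is mostly a group-by over one day's rows. A sketch of the shape (the markdown layout here is illustrative, not the real output format):

```python
# Sketch of the daily journal compile: pull one day's events, group them
# by source, emit a readable markdown brief. Layout is illustrative.
import json
import sqlite3
from collections import defaultdict

def compile_journal(conn, date):
    """Render all events from one ISO date (YYYY-MM-DD) as markdown."""
    rows = conn.execute(
        "SELECT source, type, payload_json FROM events "
        "WHERE ts >= ? AND ts < date(?, '+1 day') ORDER BY ts",
        (date, date),
    ).fetchall()
    by_source = defaultdict(list)
    for source, type_, payload_json in rows:
        summary = json.loads(payload_json).get("summary", type_)
        by_source[source].append(f"- ({type_}) {summary}")
    lines = [f"# Journal {date}"]
    for source in sorted(by_source):
        lines.append(f"\n## {source}")
        lines.extend(by_source[source])
    return "\n".join(lines)
```

Because ISO-8601 timestamps sort lexicographically, the date-range filter is plain string comparison, no date parsing needed.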
Blobs sit outside the database. Voice recordings, images, PDFs — anything too large for a JSON payload — live in blobs/voice/ and similar, referenced by attachment_uri on the event row.
The brain is now a three-layer system: raw events, a curated wiki, and compiled narratives. Each layer is queryable independently. Each one gets embeddings.
Local Embeddings. No API Calls.
embed_events.py runs on cron every 5 minutes. It finds new events, builds a text summary from the payload, sends it to Ollama running nomic-embed-text locally on the RTX 3090, and upserts a 768-dimensional vector into a Qdrant collection.
Zero external API calls for embeddings. The vectors never leave my network. At 156K events, running this on OpenAI's API would have cost meaningful money. Running it locally costs GPU time I'm not using for anything else.
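The loop itself is small. A sketch of the two halves, flattening an event to text and calling Ollama's local embeddings endpoint (the summary construction here is illustrative, not the real script's logic):

```python
# Sketch of the embed loop: build one line of text per event, send it to
# the local Ollama embeddings API. Summary format is illustrative.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port

def build_summary(source, type_, payload, max_len=512):
    """Flatten an event into one embeddable line, capped at a fixed budget."""
    text = payload.get("summary") or json.dumps(payload, sort_keys=True)
    return f"[{source}/{type_}] {text}"[:max_len]

def embed(text, model="nomic-embed-text"):
    """Return the embedding vector for one piece of text (768-dim for this model)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```

The upsert into Qdrant on the other end is keyed by event id, so re-running the script is idempotent: already-embedded events just overwrite themselves.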
semantic_search.py queries Qdrant and joins full event payloads back from SQLite in one pass. The search works across everything:
- "Butterfly pipeline deployment" → top hits are commits on cgallic/snappedai
- "Scout tank diet" → Scout's Discord conversations and the CLI commands that edited its state files
- "Quantitative trading AI" → a video transcript I pasted to Kai, my follow-up research request, and both agents' replies, all in one query
The vector space clusters my life without me tagging anything. That's the payoff for having the data in one place.
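The join back from vector hits to full rows is one IN query. A sketch of that second half, with the Qdrant call elided and the hit ids assumed already in hand:

```python
# Sketch of the SQLite side of semantic_search.py: given event ids from
# the vector store (ranked best-first), pull the full rows in one query.
import sqlite3

def hydrate_hits(conn, event_ids):
    """Fetch full event rows for a ranked list of ids, preserving rank order."""
    if not event_ids:
        return []
    placeholders = ",".join("?" * len(event_ids))
    rows = conn.execute(
        f"SELECT id, ts, source, type, payload_json FROM events "
        f"WHERE id IN ({placeholders})",
        event_ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # SQL doesn't preserve IN-list order, so re-sort by the vector ranking.
    return [by_id[i] for i in event_ids if i in by_id]
```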
How Any Model Becomes Personal
Everything up to this point is storage. The part that makes it personal AI is how models access it.
The brain exposes 18 tools through a Model Context Protocol (MCP) server:
record_event, query_events, semantic_search, get_journal,
compile_journal, list_decisions, get_decision, health_check,
append_narrative, get_wiki_page, list_wiki_pages, update_wiki_page,
compile_wiki, lint_wiki, resume_my_work, build_memory_packet,
get_journal_narrative, query_events
The server runs as stdio for Claude Code on the agent box. An mcp-proxy wrapper exposes the same tools as HTTP/SSE on port 7778 for remote agents. Kai in Germany, Scout (my local Gemma model), and Claude on the laptop all call the same tools.
When Claude Code on my laptop asks "what have I been working on with KaiCalls this week," it calls query_events filtered by repo = cgallic/kai_calls and since = 7d. When Scout helps me plan content, it calls semantic_search for the topic and gets real conversations, real commits, real notes. When Kai needs to answer a question about what I shipped, it calls resume_my_work and gets a briefing assembled from events and wiki pages.
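Conceptually the tool layer is just a dispatch table from name to deterministic function; the model only ever picks a name and arguments. A stdlib-only sketch of that shape (the real server speaks MCP over stdio, and the tool bodies here are toy stand-ins):

```python
# Conceptual sketch of the tool layer: names map to deterministic
# functions; the model never sees SQL or credentials, only tool names.
TOOLS = {}

def tool(fn):
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def health_check():
    return {"status": "ok"}

@tool
def query_events(source=None, since=None):
    # The real version runs a parameterized SELECT against the event log.
    return {"source": source, "since": since, "events": []}

def call_tool(name, **kwargs):
    """The only entry point an agent gets."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

This is the deterministic-scripts-plus-agent-oversight rule in miniature: the agent's whole authority is choosing which registered function runs.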
The model changes. The memory doesn't. That's what makes it personal.
The Unexpected Side Effect
I built this for recall. It turned into a content engine.
Every Claude Code session is a story — problem, attempts, decision, resolution. The Dev.to article I published Monday, *13 of 14 Integrations Were Fake*, was mined directly from a single session event. mine_stories.py runs nightly and flags sessions with high signal — lots of decisions, a concrete outcome, a surprising pivot. I review the output in the morning and pick what to write.
The week I started doing this, my content output doubled. I was already living the stories. I just wasn't capturing them.
The brain doesn't write the content. It surfaces stories I'd forget by Thursday.
What I'd Do Differently
Three mistakes worth naming.
Started with a flat event log, should have started with the wiki. Ingesting first and retrofitting the curated layer after was a week of wasted effort. Structure tells ingestion what to look for.
git_commit.sh auto-commits the brain with subjects like "snapshot 2026-04-12T01:01:29Z." Zero keywords, zero concepts. Those commits are semantically invisible. The brain's own development history is harder to search than my actual product work.
embed_events.py builds vectors exclusively from payload.summary. When narratives returned zero hits for obvious queries, I traced it to a too-aggressive summary length cap. Different content types need different summary budgets. I missed that until it broke.
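The fix that follows from that lesson is a per-type budget table instead of one flat cap. A sketch with hypothetical numbers:

```python
# Hypothetical per-type summary budgets; the single flat cap was the bug.
# Numbers here are illustrative, not the deployed values.
DEFAULT_BUDGET = 512
SUMMARY_BUDGETS = {
    "narrative": 4000,       # multi-day threads need room to stay findable
    "document-chunk": 1500,  # doc fragments carry most of their own context
    "like": 280,             # a like is at most one tweet's worth of text
}

def budget_for(event_type):
    """Summary length cap for a given event type."""
    return SUMMARY_BUDGETS.get(event_type, DEFAULT_BUDGET)
```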
Personal AI Is Already Possible. You Just Have to Build It.
Every piece of this — the event log, the ingestion scripts, the local embeddings, the MCP interface — is a weekend project. None of it requires ML research. None of it requires a cloud bill. The data is already on your disk.
The products being sold as "personal AI" are generic models with opt-in memory features. That's not what personal AI looks like. Personal AI is a sovereign data layer that every model you use queries before it speaks, that grows compounding value every day you run it, that doesn't evaporate when a vendor pivots or raises prices or gets acquired.
The model is a commodity. The memory is the asset.
Your AI isn't personal until you own the layer that makes it know you.
What's in your personal data layer right now? Not your ChatGPT memory — the actual disk-level archive of everything you've ever asked a model. I want to know who else is sitting on it unindexed.