DEV Community: MihaiBuilds

I built the memory, now I'm building the brain

MihaiBuilds — Thu, 28 May 2026 08:10:21 +0000

Originally published on mihaibuilds.com. Cross-posting here because dev.to is where I read a lot of work like this myself.

Three weeks ago I shipped Memory Vault v1.0 — an open-source, self-hosted AI memory layer you run yourself. Postgres + pgvector under the hood, hybrid search on top, an MCP server so Claude can read and write to it directly. The first product in a planned compounding stack.

Today the second product in that stack exists too. It's called The Brain.

I'll get to what it is in a second. First, the honest part: I didn't announce it the day I started. I built the first milestone in private, on my own, with no audience watching. Three days of focused work, ten merged PRs, then a clean stop. Build-in-public is the long-term plan for this project the same way it was for Memory Vault. But the first week was head-down, because the riskiest part of a new product isn't the announcement — it's whether the thing actually works. Now that it does, I can tell you about it without hedging.

What The Brain is

The Brain is a workflow orchestrator, not an AI agent. It runs Python-defined workflows you author, with full visibility into every step. The intelligence is in the workflow you write; The Brain is the runtime that makes it repeatable and observable. It calls LLMs as steps when needed; it doesn't replace them.

Concretely: you write a Python file that describes a sequence of steps. Each step is a shell command, a Memory Vault query, or a local LLM call. The Brain runs them top to bottom, passes output forward between them with named placeholders, and persists every run to Postgres. You inspect runs from the CLI. Successful runs exit 0; failed runs exit 1. It drops straight into cron jobs or CI pipelines.

That's the whole pitch. There's no autonomous decision-making, no agent loop, no self-direction. It runs what you tell it to run, and it records what happened.
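
The execution model described above can be sketched as a plain loop. This is a minimal illustration, not The Brain's actual code: `Step`, `RunRecord`, `EXECUTORS`, `run_workflow`, and the `echo`/`boom` step kinds are all hypothetical names, and a Python list stands in for the Postgres table.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    kind: str      # hypothetical labels; the real step types are classes
    payload: str


@dataclass
class RunRecord:
    workflow: str
    status: str = "running"
    steps: list = field(default_factory=list)


# Registry keyed by step kind: dispatch is a dict lookup, not an isinstance chain.
EXECUTORS: dict[str, Callable[[Step], str]] = {}


def executor(kind: str):
    def register(fn):
        EXECUTORS[kind] = fn
        return fn
    return register


@executor("echo")
def run_echo(step: Step) -> str:
    return step.payload


@executor("boom")
def run_boom(step: Step) -> str:
    raise RuntimeError("executor crashed")


def run_workflow(name: str, steps: list, saved_runs: list) -> int:
    """Run steps top to bottom; return a process-style exit code."""
    record = RunRecord(workflow=name)
    try:
        for step in steps:
            try:
                output = EXECUTORS[step.kind](step)
            except Exception as exc:
                # First failure halts the run; later steps never execute.
                record.steps.append(
                    {"name": step.name, "status": "failed", "error": str(exc)}
                )
                record.status = "failed"
                break
            record.steps.append(
                {"name": step.name, "status": "success", "output": output}
            )
        else:
            record.status = "success"
    finally:
        # Whatever happened, the run is stored with a terminal status.
        if record.status == "running":
            record.status = "failed"
        saved_runs.append(record)  # stand-in for the database write
    return 0 if record.status == "success" else 1
```

There is no decision point inside the loop: the only branch is "did this step raise," which is what keeps the exit code and the stored status in agreement.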

Why this is a workflow orchestrator, not an agent

The orchestration layer is too load-bearing to depend on someone else's framework. When the framework changes, your workflows break — and these frameworks change constantly. LangChain, LangGraph, CrewAI, AutoGen: they're all moving targets, and "agent autonomy" is a moving definition. Owned runtime, owned database, owned LLM client, owned everything. Five years from now this still runs.

The other reason: build-in-public projects have an honesty constraint that pure-agent products don't. If The Brain claimed to "decide" or "reason," I'd have to explain in every blog post what that means, what model it uses, and why the decision quality is what it is. Calling it a workflow orchestrator collapses that ambiguity. The user writes the logic. The Brain runs it. The output is reproducible. The behavior is auditable. The audience this is for (solo developers who use AI seriously and want their tools to be transparent) is allergic to the alternative.

What M1 ships, today

M1 is called "Bare Runner." The name is the honest scope: it's the smallest thing that proves The Brain works end-to-end.

Run Python-defined workflows from the CLI with brain run path/to/workflow.py. A workflow is a plain Python file exposing a module-level workflow = Workflow(...). Loaded with importlib and validated at load time via Pydantic.
Three step types: ShellStep (subprocess + timeout), MemoryVaultStep (Memory Vault REST), LLMStep (OpenAI-compatible HTTP against LM Studio). Each lives in its own executor class; the runner dispatches by step type with no isinstance chains.
Placeholder substitution — steps pass output forward with {step_name} tokens in any string field (prompt, command, query). Strict: a placeholder that names no prior completed step fails THAT step with a clear error. Fail fast; never pass literal braces downstream.
Persistent run history in Postgres — every run, every step, every output, every error. One workflow_runs table; the run's full step-by-step output is stored as a JSONB array (not an object — JSONB doesn't preserve key order, and execution order is part of the data).
CLI introspection — brain history lists past runs with --limit/--workflow/--status filters; brain show <run_id> shows full step-by-step detail for one run. Run IDs match by prefix (Memory Vault's token revoke precedent).
Strict failure semantics — a workflow halts on the first failed step; the run row always lands in Postgres with a terminal status, even if an executor raises unexpectedly. The runner catches every executor exception and persists. A run that started always ends with a terminal DB row; no exception escapes unpersisted.
One-command Docker — docker compose up -d brings up Postgres and The Brain together, migrations run on boot via a hand-rolled migration runner in src/db.py.
46 hermetic tests — pytest with a real Postgres test container, MV and LLM HTTP faked via httpx.MockTransport (built-in, no respx dependency). The suite is fast, deterministic, and runs anywhere with no external services.
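
The strict placeholder rule from the list above might look roughly like this. It is a sketch under assumptions: `substitute`, `PlaceholderError`, and the token pattern are illustrative names and choices, not The Brain's real implementation.

```python
import re


class PlaceholderError(ValueError):
    """Raised when a {token} does not name a prior completed step."""


# Assumption: step names look like Python identifiers.
_TOKEN = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(template: str, outputs: dict[str, str]) -> str:
    """Replace each {step_name} with that step's output, or fail loudly."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in outputs:
            raise PlaceholderError(
                f"placeholder {{{name}}} does not name a prior completed step"
            )
        return outputs[name]

    return _TOKEN.sub(replace, template)


rendered = substitute(
    "The previous step said: {greeting}",
    {"greeting": "Hello from The Brain"},
)
print(rendered)
```

Because `re.sub` does not rescan the text it inserts, a step output that happens to contain braces is passed through untouched rather than expanded a second time.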

A run looks like this:

$ docker compose exec brain brain run examples/hello.py
Running workflow 'hello' (2 steps)
  ✓ greeting
  ✓ echo_it_back
Run c609f5e0 — success

And inspecting it after:

$ docker compose exec brain brain show c609f5e0
Run:      c609f5e0-a8d6-4221-84c0-58c0b5d0460d
Workflow: hello
Status:   success
Started:  2026-05-22 19:54:58
Duration: 0.0s

Steps:
  ✓ greeting
      Hello from The Brain
  ✓ echo_it_back
      The previous step said: Hello from The Brain
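
The lookup above takes the short ID c609f5e0 and shows the full UUID. A minimal sketch of that kind of prefix resolution follows. `resolve_run_id` is a hypothetical helper, and rejecting ambiguous prefixes is my assumption about sensible behavior; the post doesn't specify it. The second run ID in the example is made up.

```python
def resolve_run_id(prefix: str, run_ids: list[str]) -> str:
    """Return the single run ID that starts with prefix, else raise."""
    matches = [rid for rid in run_ids if rid.startswith(prefix)]
    if not matches:
        raise LookupError(f"no run matches prefix {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"prefix {prefix!r} is ambiguous ({len(matches)} runs)")
    return matches[0]


known_runs = [
    "c609f5e0-a8d6-4221-84c0-58c0b5d0460d",  # from the output above
    "c6aa0000-0000-4000-8000-000000000000",  # invented for the example
]
full_id = resolve_run_id("c609f5e0", known_runs)
print(full_id)
```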

Architectural decisions worth naming

Functional/declarative workflow files, not class + decorator. A workflow is a data structure: workflow = Workflow(name=..., steps=[Step(...), Step(...)]). Easiest to introspect, easiest to serialize, easiest to register for cron in the next milestone. Class-with-decorators looks ergonomic at first and gets in the way the moment you try to load workflows dynamically. The declarative form is what every workflow tool I respect converges on for a reason.
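
To make the "workflow as data" point concrete, here is a self-contained sketch of loading such a file dynamically. Everything in it is a stand-in: the dataclasses replace the real Pydantic models, `brain_sketch` is an invented module name so the example file has something to import, and the step commands are guesses modeled on the hello run shown earlier.

```python
import importlib.util
import sys
import tempfile
import types
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ShellStep:
    name: str
    command: str


@dataclass
class Workflow:
    name: str
    steps: list = field(default_factory=list)


# Publish the stand-ins under a hypothetical module name.
_sketch = types.ModuleType("brain_sketch")
_sketch.Workflow = Workflow
_sketch.ShellStep = ShellStep
sys.modules["brain_sketch"] = _sketch

WORKFLOW_SOURCE = '''
from brain_sketch import Workflow, ShellStep

workflow = Workflow(
    name="hello",
    steps=[
        ShellStep(name="greeting", command="echo Hello from The Brain"),
        ShellStep(name="echo_it_back",
                  command="echo The previous step said: {greeting}"),
    ],
)
'''


def load_workflow(path: Path) -> Workflow:
    """Import a workflow file by path and return its module-level object."""
    spec = importlib.util.spec_from_file_location("user_workflow", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    wf = getattr(module, "workflow", None)
    if not isinstance(wf, Workflow):
        raise TypeError(f"{path} must define a module-level 'workflow' object")
    return wf


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "hello.py"
    path.write_text(WORKFLOW_SOURCE)
    loaded = load_workflow(path)

print(loaded.name, [s.name for s in loaded.steps])
```

The loader never needs to know what the file does, only that it leaves a `workflow` object behind. That is the property that makes listing, serializing, or scheduling workflows straightforward.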

Single workflow_runs table for M1, per-step granularity deferred to M2. The whole run's step-by-step output goes in one JSONB column. Yes, a per-step table is the "right" long-term schema. But M2 is where state-between-runs lands, and that's the milestone where it actually pays for itself. Shipping the right table in M1 would be carrying schema complexity for a feature M1 doesn't have. Defer it; revisit when the use case lands.
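
A small sketch of why the column holds an array rather than an object keyed by step name. The record fields are hypothetical, and `sort_keys=True` only imitates a store that does not keep insertion order; it is not Postgres's actual key-ordering rule.

```python
import json

# Hypothetical per-step records, in the order the steps ran.
steps_in_execution_order = [
    {"name": "greeting", "status": "success",
     "output": "Hello from The Brain"},
    {"name": "echo_it_back", "status": "success",
     "output": "The previous step said: Hello from The Brain"},
]

# Array form: position in the list is the execution order.
as_array = json.dumps(steps_in_execution_order)

# Object form keyed by step name, passed through an order-losing store.
as_object = json.dumps(
    {s["name"]: s for s in steps_in_execution_order},
    sort_keys=True,
)

array_order = [s["name"] for s in json.loads(as_array)]
object_order = list(json.loads(as_object))

print(array_order)   # execution order survives the round trip
print(object_order)  # key order now reflects the store, not the run
```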

Thin in-repo Memory Vault REST client (~30 LOC), no shared library. The Brain talks to Memory Vault over HTTP. I could extract a shared mihaibuilds-clients library now. I'd be over-engineering for a future I haven't reached. The right time to extract a client library is when there are three or more callers — not when there's one. Right now the entire client is httpx.post(...). When The Brain plus two or three addons all talk to Memory Vault, the duplication will tell me it's time to extract.

LM Studio only in v1.0, not LM Studio + Ollama. This is the explicit lesson I'm carrying from Memory Vault. Memory Vault's marketing claimed both LM Studio and Ollama support; only LM Studio was end-to-end tested. The Brain ships LM Studio only in v1.0. Ollama probably works through the same OpenAI-compatible client shape, but "probably works" isn't a release guarantee. Only claim providers you've actually tested. This rule survives every product I build.

Owned runtime, not LangChain/LangGraph/CrewAI wrapper. Already covered above — but worth re-stating in the architecture section because it's the decision the rest of the codebase shape derives from. The Brain is ~1,500 lines of Python. A LangChain wrapper would be more code, more dependencies, and a runtime that breaks every time the upstream framework changes its API. Owned runtime is the simpler answer, not the more ambitious one.

What v1.0 won't do, on purpose

No autonomous decision-making. The Brain runs the workflow you defined. It doesn't pick a different step at runtime. If you want branching, you write a workflow that branches. Rich conditional logic is deliberately out of scope for v1.0.

No multi-user / team workflows. Single-tenant by design. Multi-user activation lives behind a PRO tier later.

No managed cloud. Self-hosted, MIT-licensed, runs on your laptop or your VPS. Always.

No visual workflow builder. The workflow file is the source of truth. You read it like Python, you diff it like Python, you grep it like Python. Visual builders are a PRO concern, not a v1.0 concern.

These are deliberate trade-offs. The Brain v1.0 is the smallest correct version, not the most ambitious one.

Who this is for

Developers who run real workflows on their own machines and want LLMs as a step inside those workflows — not as the thing in charge. Solo builders stitching together memory, models, and shell tools who are tired of agent frameworks that change their API every quarter. Anyone who wants every run to be inspectable, every output persisted, and every decision their own to make.

If you've ever written a Python script that calls an LLM, then bolted on a cron entry, then realized you have no record of what it did yesterday — this is for you.

What's next

Milestone 2 is triggers and state — cron schedules, a long-running scheduler daemon, and workflows that read the previous run's output. M2 is the milestone where The Brain becomes worth running unattended.

The full roadmap and milestone progress table live in the repo's README. Each milestone gets a dev-log post here as it ships — one of four dev.to posts across the build period.

Try it

git clone https://github.com/MihaiBuilds/the-brain
cd the-brain
docker compose up -d
docker compose exec brain brain run examples/hello.py

The repo has the full quickstart with configuration, Memory Vault wiring, and the real-world digest example (recent memories → local LLM summary → markdown file, all in one Python file).

Follow along

Twitter / X: @mihaibuilds
Blog: mihaibuilds.com
GitHub: github.com/MihaiBuilds/the-brain

Memory Vault v1.0 — building open-source AI memory the boring way

MihaiBuilds — Sat, 09 May 2026 13:15:56 +0000

Originally published on mihaibuilds.com. Cross-posting here because dev.to is where I find a lot of this kind of work myself.

For the past year I kept hitting the same wall. I'd have a real conversation with Claude — work through a database design, debug something gnarly, agree on a convention I wanted to keep — and the next morning it was gone. Not summarized. Not searchable. Just gone. ChatGPT was the same. Every assistant I used had the long-term memory of a goldfish, and the workaround the industry settled on was "paste the relevant context back in every time." That's not memory. That's me being the memory.

So I built one. Memory Vault is an open-source, self-hosted AI memory system you run yourself: Postgres with pgvector underneath, hybrid search on top, an MCP server so Claude can read and write to it directly, a knowledge graph that extracts entities without an LLM bill, a local LLM chat with retrieved-source citations, and a one-command Docker setup. Two days ago it crossed the line from "build-in-public project" to "v1.0 stable release." (v1.0.2 yesterday closed two security findings I caught after enabling branch protection — path-traversal + info-exposure on an internal stream handler.)

What Memory Vault is

A long-term memory layer for AI assistants and the apps you build on top of them. You ingest text — markdown notes, conversation logs, anything plain — and it gets chunked, embedded, full-text indexed, and stored in a single Postgres database. Hybrid search (vector similarity + keyword tsvector + Reciprocal Rank Fusion) returns the right chunks back when you query. An MCP server exposes four tools (recall, remember, forget, status) that Claude Desktop or Claude Code can call directly, which means Claude can read and write to your memory inside any conversation without you copy-pasting context. A REST API exposes the same operations for any app you build. A dashboard gives you a Search, Browse, Graph, Ingest, Stats, and Chat page. A local LLM chat (LM Studio in v1.0) lets you talk to your memories with full source citations — every response shows which chunks it pulled from, clickable.

It runs entirely on your machine. No API keys. No cloud. No telemetry. Postgres on port 5432; the API and the dashboard share port 8000. docker compose up and it's running.

What v1.0 actually does

Hybrid search — pgvector HNSW for semantic + tsvector GIN for keyword + Reciprocal Rank Fusion to merge them. Vector-only search misses exact terms; keyword-only misses paraphrases. RRF gets both.
MCP server — four tools (recall, remember, forget, status) callable from Claude Desktop, Claude Code, or any MCP client. Claude reads and writes your memory in-conversation.
Knowledge graph — spaCy NER plus co-occurrence extracts entities (Person, Project, Tool, Concept) and related_to relationships from every ingested chunk. No LLM, no per-token cost, rendered as an interactive Cytoscape force-directed graph.
Memory spaces — namespacing for different contexts (work, personal, projects). Per-space dedup; cross-space isolation by default.
Local LLM chat — LM Studio native API with sources panel showing retrieved chunks for every answer. Every response is grounded and the grounding is visible.
REST API — bearer-auth-protected, OpenAPI-documented at /docs, every operation the dashboard does is also a documented endpoint.
One-command Docker — docker compose up. Postgres, the app, and the spaCy model bundled into a single image at build time, no first-run download.
Self-hosted, MIT-licensed — your data stays on your machine. The whole thing is yours.
170 tests passing — pytest with a real Postgres + pgvector service container, no mocks of the database.

Architectural decisions worth naming

Postgres + pgvector instead of a dedicated vector database. I run one database, not two. Operationally this matters more than the marginal performance of a purpose-built vector store at small scale. You already know how to back up Postgres. You already know how to monitor it. HNSW indexes plus tuned maintenance_work_mem and ef_search get you to "fast enough for hundreds of thousands of chunks on a laptop." When that stops being true, the migration path is sane. Until then, one database is the right answer for a self-hosted personal-memory tool.

Hybrid search instead of vector-only. Pure vector search is great at paraphrase and concept. It's bad at exact terms: model names, error codes, file paths, anything where the literal string is the signal. Memory Vault stores both an embedding and a tsvector for every chunk and merges the two ranked result sets with Reciprocal Rank Fusion. RRF needs almost no tuning (its only parameter is a rank-smoothing constant, conventionally 60), doesn't require score normalization because it works on ranks rather than raw scores, and consistently beats either approach alone on the kind of mixed queries real users actually type.
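
For readers who haven't met Reciprocal Rank Fusion, a minimal sketch: each result list contributes 1 / (k + rank) per item, and the sums decide the final order. The function name, the chunk IDs, and k = 60 (the conventional default) are assumptions for illustration; the post doesn't state Memory Vault's own value or code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; an item's score is the sum of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["c3", "c1", "c7"]    # hypothetical semantic results
keyword_hits = ["c1", "c9", "c3"]   # hypothetical full-text results

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Chunks that appear in both lists (c1 and c3 here) rise above chunks that appear in only one, without the two search backends ever needing comparable scores.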

spaCy + co-occurrence for the knowledge graph, not an LLM. The default move in this space is to feed every chunk through an LLM and ask it for entities and relationships. It works. It also costs money on every ingest, couples your graph quality to whichever model you happened to pick, and requires API keys for a tool whose entire pitch is no API keys. spaCy's en_core_web_sm model plus a co-occurrence rule (two entities in the same chunk = a related_to edge, weighted by frequency) gets you a useful graph for zero per-ingest cost. The honest limits — English only, context-dependent NER, no fuzzy matching — are documented up front rather than masked.
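
The co-occurrence rule is simple enough to sketch without spaCy. Assume the NER step has already produced a list of entity names per chunk. `cooccurrence_edges` is a hypothetical helper, and counting one occurrence per chunk is my reading of "weighted by frequency."

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(chunks: list[list[str]]) -> Counter:
    """Count how many chunks each unordered entity pair shares."""
    edges: Counter = Counter()
    for entities in chunks:
        # set() so a repeated mention inside one chunk counts once;
        # sorted() so (a, b) and (b, a) land on the same key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical NER output, one entity list per ingested chunk.
chunks = [
    ["Postgres", "pgvector", "Memory Vault"],
    ["Postgres", "pgvector"],
    ["Claude", "Memory Vault"],
]

edges = cooccurrence_edges(chunks)
for (a, b), weight in edges.most_common():
    print(f"{a} related_to {b} (weight {weight})")
```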

MCP-first, not REST-first. Memory Vault was designed around the assumption that the primary user of this database is going to be Claude, not me. The MCP server isn't a wrapper around a REST API — it's a direct path into the same code that the REST API uses. Both are first-class. But the design starting point was "what does Claude need to call to make memory feel native," and then the REST API was the same operations exposed for human-driven apps. That ordering changes which tradeoffs are interesting.

The PoolClosed story

About a week before tag day, I added a CLI command called memory-vault diagnose. It bundles app logs, database logs, status output, OS info, and redacted environment into a zip file users can attach to bug reports. Foundation work. Paid for once. The kind of thing that makes every future bug report ten times higher signal-to-noise.

I shipped it. Then I ran the test suite. 163 passed, 52 errored. Every error was psycopg_pool.PoolClosed.

First instinct: probably an httpx lifespan thing. Modern httpx has changed how it handles ASGI lifespan events between minor versions. The test suite uses httpx.ASGITransport to drive the FastAPI app in-process, sharing a session-wide connection pool fixture. If the transport was firing shutdown events between tests, the pool would close mid-suite. I assumed there was a kwarg for this, so I added lifespan="off" to the transport. TypeError: ASGITransport.__init__() got an unexpected keyword argument 'lifespan'. The kwarg doesn't exist in 0.28.x. Reverted.

Second instinct: walk the call graph. memory-vault diagnose calls into the CLI's _run_status helper to capture status output for the bundle. _run_status was implemented as asyncio.run(_cmd_status()) — directly calling the CLI's status function in-process. _cmd_status initializes a connection pool at the top of the function and closes it via a finally block at the end. Which is correct behavior for the CLI. It's also exactly what you don't want when something else in the same process — like a session-wide test fixture — already owns a pool that's mid-flight.

The fix was four lines. Replace the in-process asyncio.run with subprocess.run(["memory-vault", "status"]). The subprocess gets its own pool, lives its own lifecycle, exits cleanly, and the parent process's pool is never touched. 163 passed, 0 errored.
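
The shape of that fix, sketched. `capture_status` is a hypothetical name for the helper, and the `capture_output`, `text`, and `timeout` arguments are my additions; the post only says the call became subprocess.run(["memory-vault", "status"]). A stand-in command is used so the example runs without Memory Vault installed.

```python
import subprocess
import sys


def capture_status(cmd=("memory-vault", "status"), timeout=30):
    """Run the status command in a child process and return (code, stdout).

    Anything the child opens, such as a connection pool, lives and dies
    with that process and cannot close state owned by the caller.
    """
    result = subprocess.run(
        list(cmd), capture_output=True, text=True, timeout=timeout
    )
    return result.returncode, result.stdout


# Stand-in for the real CLI, for illustration only.
stand_in = (sys.executable, "-c", "print('status: ok')")
code, out = capture_status(stand_in)
print(code, out.strip())
```

The cost is a process spawn per call, which is negligible for a one-off diagnostics bundle.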

The lesson isn't about pools or fixtures specifically. It's that "obvious" fixes (changing the test transport config) and root causes (one function quietly tearing down state owned by a different function) live in different parts of the code. A transport-level workaround like lifespan="off", had it worked, would have masked the symptom in the tests and left the actual bug in the CLI, where users would have hit it. Almost the entire week's gap between "all my sub-steps look done" and "v1.0 is actually shippable" was the discipline of not bypassing this kind of thing when bypassing was easy.

What v1.0 doesn't do, on purpose

English-only NER. The bundled spaCy model is en_core_web_sm. Non-English content gets little to no useful entity extraction. Multilingual models exist; they're heavier and slower; they're a question driven by real user demand, not a v1.0 must-have.

No fuzzy entity matching. "PostgreSQL" and "Postgres" are separate entities in the graph. No alias merging in v1.0.

No re-extraction on edit. If you re-ingest a corrected version of a chunk, the new entities are added but the old ones aren't cleaned up.

Single-user. v1.0 has bearer auth and one user behind it. The schema has owner_id and access_level columns from day one, but multi-user activation is part of the PRO tier.

LM Studio only for chat. Ollama and llama.cpp use the same OpenAI-compatible client architecture under the hood, but the only end-to-end-tested path in v1.0 is LM Studio. Ollama support is not in v1.0.

No multi-conversation history in chat. Single-thread chat. Driven by whether real users ask for it.

These are deliberate trade-offs. Honest gaps documented up front build more trust than feature bullets that fall apart when someone actually tries them.

The open-core model

Memory Vault is and will always be MIT-licensed. The whole thing — search, MCP, graph, REST API, dashboard, local LLM chat, ingestion pipeline, the database schema, the Docker setup. You can run it on your machine. You can fork it. You can use it inside a commercial product. The free tier is genuinely useful — not a crippled demo of the paid tier.

A paid PRO tier is on the roadmap for teams: dedup with importance decay, conflict resolution and supersede chains, multi-user activation, additional adapters (PDF, web pages), automated encrypted backups, and a fuller dashboard with analytics. The PRO tier is genuinely paid features — operational tools that solo users on a laptop don't strictly need, and teams running shared knowledge bases really do. The split is honest by design.

What this took to build

Seven weeks of evenings and weekends across nine locked milestones, scope frozen on March 27. M1 was the announcement. M2 the core hybrid search. M3 the one-command Docker. M4 the MCP server. M5 the REST API. M6 the dashboard. M7 the knowledge graph. M8 — this one — was local LLM chat plus the polish, CI/CD, security review, and release engineering that turn a build-in-public project into something other people can actually use.

Two of those weeks were the kind of work nobody sees: structured JSON logging with request ID propagation, a diagnostic CLI that produces a redacted bundle for bug reports, GitHub Actions for lint and test and multi-arch Docker release, security audit (bandit, npm audit, Dependabot, CodeQL, plus a 15-test pentest pass with curl), Contributor Covenant Code of Conduct, threat model in SECURITY.md, branch protection rules, and the discipline to fix the actual root cause of a test failure instead of bypassing it. Unglamorous. Also the difference between v0.7 and v1.0.

What's next

Beyond. Memory Vault is the first product in a planned compounding stack — The Brain is the next layer, building agents on top of this memory infrastructure. The memory layer is the one that has to be solid first. Today it is.

Try it

git clone https://github.com/MihaiBuilds/memory-vault
cd memory-vault
cp .env.example .env
docker compose up -d

Open http://localhost:8000 and you're running.

GitHub — latest release
README and quick start
MCP setup for Claude Desktop / Claude Code
Questions and bug reports: GitHub Issues
General discussion: GitHub Discussions

Credits

Three Postgres tuning tips landed during M6 and M7 that materially improved Memory Vault: @rivestack on maintenance_work_mem, ef_search as a runtime knob, and post-deploy cache warmup for HNSW indexes. The first ships in v1.0; I'll use the others when I get to them. Public credit, fair credit. Build-in-public works because builders with deeper expertise see what you're shipping and tell you what's wrong before production does.

Beta tester Inevitable-Way-3916 ran the dashboard early, asked the architecture questions that forced the ARCHITECTURE.md doc to exist, and put bulk ingest on the list. Thanks.

Follow along

Twitter / X: @mihaibuilds
Blog: mihaibuilds.com
GitHub: github.com/MihaiBuilds/memory-vault