MihaiBuilds

Posted on May 28 • Originally published at mihaibuilds.com

I built the memory, now I'm building the brain

#opensource #python #postgres #showdev

Originally published on mihaibuilds.com. Cross-posting here because dev.to is where I read a lot of work like this myself.

Three weeks ago I shipped Memory Vault v1.0 — an open-source, self-hosted AI memory layer you run yourself. Postgres + pgvector under the hood, hybrid search on top, an MCP server so Claude can read and write to it directly. The first product in a planned compounding stack.

Today the second product in that stack exists too. It's called The Brain.

I'll get to what it is in a second. First, the honest part: I didn't announce it the day I started. I built the first milestone in private, on my own, with no audience watching. Three days of focused work, ten merged PRs, then a clean stop. Build-in-public is the long-term plan for this project the same way it was for Memory Vault. But the first week was head-down, because the riskiest part of a new product isn't the announcement — it's whether the thing actually works. Now that it does, I can tell you about it without hedging.

What The Brain is

The Brain is a workflow orchestrator, not an AI agent. It runs Python-defined workflows you author, with full visibility into every step. The intelligence is in the workflow you write; The Brain is the runtime that makes it repeatable and observable. It calls LLMs as steps when needed; it doesn't replace them.

Concretely: you write a Python file that describes a sequence of steps. Each step is a shell command, a Memory Vault query, or a local LLM call. The Brain runs them top to bottom, passes output forward between them with named placeholders, and persists every run to Postgres. You inspect runs from the CLI. Successful runs exit 0; failed runs exit 1. It drops straight into cron jobs or CI pipelines.

That's the whole pitch. There's no autonomous decision-making, no agent loop, no self-direction. It runs what you tell it to run, and it records what happened.

Why this is a workflow orchestrator, not an agent

The orchestration layer is too load-bearing to depend on someone else's framework. When the framework changes, your workflows break — and these frameworks change constantly. LangChain, LangGraph, CrewAI, AutoGen: they're all moving targets, and "agent autonomy" is a moving definition. Owned runtime, owned database, owned LLM client, owned everything. Five years from now this still runs.

The other reason: build-in-public projects have an honesty constraint that pure-agent products don't. If The Brain claims to "decide" or "reason," I'd have to explain in every blog post what that means, what model it uses, and why the decision quality is what it is. Calling it a workflow orchestrator collapses that ambiguity. The user writes the logic. The Brain runs it. The output is reproducible. The behavior is auditable. The audience this is for — solo developers who use AI seriously and want their tools to be transparent — is allergic to the alternative.

What M1 ships, today

M1 is called "Bare Runner." The name is the honest scope: it's the smallest thing that proves The Brain works end-to-end.

Run Python-defined workflows from the CLI with brain run path/to/workflow.py. A workflow is a plain Python file exposing a module-level workflow = Workflow(...). Loaded with importlib and validated at load time via Pydantic.
Three step types: ShellStep (subprocess + timeout), MemoryVaultStep (Memory Vault REST), LLMStep (OpenAI-compatible HTTP against LM Studio). Each lives in its own executor class; the runner dispatches by step type with no isinstance chains.
Placeholder substitution — steps pass output forward with {step_name} tokens in any string field (prompt, command, query). Strict: a placeholder that names no prior completed step fails THAT step with a clear error. Fail fast; never pass literal braces downstream.
Persistent run history in Postgres — every run, every step, every output, every error. One workflow_runs table; the run's full step-by-step output is stored as a JSONB array (not an object — JSONB doesn't preserve key order, and execution order is part of the data).
CLI introspection — brain history lists past runs with --limit/--workflow/--status filters; brain show <run_id> shows full step-by-step detail for one run. Run IDs match by prefix (Memory Vault's token revoke precedent).
Strict failure semantics — a workflow halts on the first failed step; the run row always lands in Postgres with a terminal status, even if an executor raises unexpectedly. The runner catches every executor exception and persists. A run that started always ends with a terminal DB row; no exception escapes unpersisted.
One-command Docker — docker compose up -d brings up Postgres and The Brain together, migrations run on boot via a hand-rolled migration runner in src/db.py.
46 hermetic tests — pytest with a real Postgres test container, MV and LLM HTTP faked via httpx.MockTransport (built-in, no respx dependency). The suite is fast, deterministic, and runs anywhere with no external services.

A run looks like this:

$ docker compose exec brain brain run examples/hello.py
Running workflow 'hello' (2 steps)
  ✓ greeting
  ✓ echo_it_back
Run c609f5e0 — success

And inspecting it after:

$ docker compose exec brain brain show c609f5e0
Run:      c609f5e0-a8d6-4221-84c0-58c0b5d0460d
Workflow: hello
Status:   success
Started:  2026-05-22 19:54:58
Duration: 0.0s

Steps:
  ✓ greeting
      Hello from The Brain
  ✓ echo_it_back
      The previous step said: Hello from The Brain

Architectural decisions worth naming

Functional/declarative workflow files, not class + decorator. A workflow is a data structure: workflow = Workflow(name=..., steps=[Step(...), Step(...)]). Easiest to introspect, easiest to serialize, easiest to register for cron in the next milestone. Class-with-decorators looks ergonomic at first and gets in the way the moment you try to load workflows dynamically. The declarative form is what every workflow tool I respect converges on for a reason.

Single workflow_runs table for M1, per-step granularity deferred to M2. The whole run's step-by-step output goes in one JSONB column. Yes, a per-step table is the "right" long-term schema. But M2 is where state-between-runs lands, and that's the milestone where it actually pays for itself. Shipping the right table in M1 would be carrying schema complexity for a feature M1 doesn't have. Defer it; revisit when the use case lands.

Thin in-repo Memory Vault REST client (~30 LOC), no shared library. The Brain talks to Memory Vault over HTTP. I could extract a shared mihaibuilds-clients library now. I'd be over-engineering for a future I haven't reached. The right time to extract a client library is when there are three or more callers — not when there's one. Right now the entire client is httpx.post(...). When The Brain plus two or three addons all talk to Memory Vault, the duplication will tell me it's time to extract.

LM Studio only in v1.0, not LM Studio + Ollama. This is the explicit lesson I'm carrying from Memory Vault. Memory Vault's marketing claimed both LM Studio and Ollama support; only LM Studio was end-to-end tested. The Brain ships LM Studio only in v1.0. Ollama probably works through the same OpenAI-compatible client shape, but "probably works" isn't a release guarantee. Only claim providers you've actually tested. This rule survives every product I build.

Owned runtime, not LangChain/LangGraph/CrewAI wrapper. Already covered above — but worth re-stating in the architecture section because it's the decision the rest of the codebase shape derives from. The Brain is ~1,500 lines of Python. A LangChain wrapper would be more code, more dependencies, and a runtime that breaks every time the upstream framework changes its API. Owned runtime is the simpler answer, not the more ambitious one.

What v1.0 won't do, on purpose

No autonomous decision-making. The Brain runs the workflow you defined. It doesn't pick a different step at runtime. If you want branching, you write a workflow that branches. Rich conditional logic is in the v1.0-out section deliberately.

No multi-user / team workflows. Single-tenant by design. Multi-user activation lives behind a PRO tier later.

No managed cloud. Self-hosted, MIT-licensed, runs on your laptop or your VPS. Always.

No visual workflow builder. The workflow file is the source of truth. You read it like Python, you diff it like Python, you grep it like Python. Visual builders are a PRO concern, not a v1.0 concern.

These are deliberate trade-offs. The Brain v1.0 is the smallest correct version, not the most ambitious one.

Who this is for

Developers who run real workflows on their own machines and want LLMs as a step inside those workflows — not as the thing in charge. Solo builders stitching together memory, models, and shell tools who are tired of agent frameworks that change their API every quarter. Anyone who wants every run to be inspectable, every output persisted, and every decision their own to make.

If you've ever written a Python script that calls an LLM, then bolted on a cron entry, then realized you have no record of what it did yesterday — this is for you.

What's next

Milestone 2 is triggers and state — cron schedules, a long-running scheduler daemon, and workflows that read the previous run's output. M2 is the milestone where The Brain becomes worth running unattended.

The full roadmap and milestone progress table live in the repo's README. Each milestone gets a dev-log post here as it ships — one of four dev.to posts across the build period.

Try it

git clone https://github.com/MihaiBuilds/the-brain
cd the-brain
docker compose up -d
docker compose exec brain brain run examples/hello.py

The repo has the full quickstart with configuration, Memory Vault wiring, and the real-world digest example (recent memories → local LLM summary → markdown file, all in one Python file).

Follow along

Twitter / X: @mihaibuilds
Blog: mihaibuilds.com
GitHub: github.com/MihaiBuilds/the-brain

Top comments (3)

Harjot Singh • May 31

The memory-then-brain sequencing is the right order, and it's worth being precise about what "the brain" means so it doesn't become a vague goal. Memory is storage + retrieval; the "brain" is the policy layer on top - when to recall vs ignore, how to decide the next action, when to stop, and crucially when to NOT act because it's unsure. That decision/control layer is where agents actually get hard, because a perfect memory feeding a reckless decision policy is still a reckless agent. The interesting work isn't more recall, it's better judgment about what to do with what you recalled.

The piece I'd bake into the brain from day one: a verify/abstain step, so the decision layer can say "I don't have enough to act" instead of confidently doing the wrong thing. That's the spine of how I build Moonshift, the thing I work on - a multi-agent pipeline that takes a prompt to a deployed SaaS, where the "brain" (orchestration + decisions) is gated by a verify layer rather than trusted to always be right. Memory + judgment + a gate is the trio that makes it safe. Multi-model routing keeps a build ~$3 flat, first run free no card. Love the framing and the build-in-public arc. What's the brain's first real decision - routing/next-action selection, or knowing when to stop? Knowing-when-to-stop is the one I'd build earliest; runaway agents are the expensive failure.

MihaiBuilds • May 31

Appreciate this — and the precision is worth pushing on, because The Brain isn't actually an agent in the policy-layer sense you're describing.

A workflow on The Brain is a python file with steps in a declared order. The decisions (when to recall, when to stop, what to do with output) are author-written if statements in that file, not an LLM choosing the next action. Closer to Airflow or Prefect than to an agent runtime. The LLM is a tool you call from a step, not the thing driving control flow.

That said — verify/abstain absolutely applies inside a step. An LLMStep that confidently returns garbage is still the expensive failure mode you're describing, just one layer down. M2's adding state passing between steps (which is what lets a workflow read a previous step's output and branch on it), so the natural place for a "did this output meet the bar to continue" gate is in that path. Banking the idea.

Moonshift's gating-the-orchestration-layer angle is the harder problem of the two — curious how you handle the abstain signal. Is it model-confidence-based, a second model judging the first, or rule-based on the output shape?

Harjot Singh • May 31

Makes sense, if control flow is author-written if-statements then it's orchestration-as-code (Prefect-like) and the verify gate just lives inside the LLMStep. State-passing between steps is exactly the unlock for a "did this clear the bar to continue" gate.

On the abstain signal in Moonshift: it's deliberately not model-confidence. Self-reported confidence is basically noise, models are confidently wrong. It's mostly rule-based on output shape plus an independent check: does it parse/compile, does it satisfy the schema/contract the step promised, do the tests or a smoke run pass. Where shape-checking isn't enough (judgment calls), a second model judges the first against the spec, never the same model grading its own work. So: cheap deterministic checks first, LLM-judge only as the fallback. The deterministic layer catches most of it for a fraction of the cost. Banking your state-passing framing too, that's the clean place to hang it.