Originally published on mihaibuilds.com. Cross-posting here because dev.to is where I read a lot of work like this myself.
A few days ago I shipped the third milestone of The Brain — webhook triggers with HMAC auth, file watchers in their own container, the {trigger.X} placeholder family for inbound payloads. That was M3. The Brain had the four classical trigger types: manual, scheduled, webhook, file.
Today M4 is done. The Brain now talks to other tools — natively, over MCP — and the LLM step picks its own model per call.
Why this matters
M1 was the runner. M2 made the runner work unattended. M3 made the runner reactive. M4 makes the runner ecosystem-aware.
Before M4, The Brain was a workflow orchestrator that knew how to do three things on its own: run shell commands, call a local LLM through a fixed configured endpoint, and call Memory Vault over its REST API. Useful, but every integration with anything new required writing a custom adapter.
After M4, The Brain can call any MCP server as a workflow step. Memory Vault's MCP server, GitHub's, Sentry's, your own. The stdio transport is the v1.0 commitment; the workflow file says "spawn this MCP server, call this tool, here are the arguments" and The Brain handles the lifecycle.
The LLM step also got per-step overrides. Before M4, every workflow used one configured model server at one URL. Now each step can name its own provider URL, its own model, its own API key, its own timeout, its own max tokens. Mix a fast local model and a slow careful one in the same workflow.
What M4 ships
Per-step LLM overrides. Each LLMStep can override the global LLM_BASE_URL / LLM_API_KEY / LLM_MODEL env vars per call:
# Fast local model: only the model and budgets are overridden;
# URL and API key fall back to the global env vars.
LLMStep(
    name="fast_summary",
    prompt="Two sentences: {previous.recall}",
    model="mistralai/ministral-3-3b",
    timeout_seconds=60,
    max_tokens=400,
)

# Slower, more careful model on a different provider.
LLMStep(
    name="careful_analysis",
    prompt="Detailed breakdown of: {fast_summary}",
    provider_url="http://other-host:1234/v1",
    api_key="sk-...",
    model="anthropic/claude-3-5-sonnet",
    timeout_seconds=600,
    max_tokens=4000,
)
Each field falls back to the corresponding env var when set to None. Tested against LM Studio only — other OpenAI-compatible providers (Ollama, vLLM, llama.cpp server, OpenAI proper) may work via the same wire format but are not promised in v1.0.
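The fallback rule is easy to picture in code. Here is a minimal sketch, assuming the step resolves its settings once per call. The function name resolve_llm_config and the ResolvedLLMConfig type are hypothetical, made up for illustration; only the three env var names come from The Brain itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResolvedLLMConfig:
    """Hypothetical container for the settings one LLM call ends up using."""
    base_url: Optional[str]
    api_key: Optional[str]
    model: Optional[str]


def resolve_llm_config(
    provider_url: Optional[str] = None,
    api_key: Optional[str] = None,
    model: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> ResolvedLLMConfig:
    """Per-step value wins; None falls back to the matching global env var."""
    return ResolvedLLMConfig(
        base_url=provider_url if provider_url is not None else env.get("LLM_BASE_URL"),
        api_key=api_key if api_key is not None else env.get("LLM_API_KEY"),
        model=model if model is not None else env.get("LLM_MODEL"),
    )
```

Passing env explicitly keeps the sketch testable without touching the real process environment.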
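MCP tool calling as a step type. A new McpToolStep sits alongside the existing step types:
# The server is spawned fresh for this step. Placeholders are substituted
# in server_command and in string values of args, never in tool or in keys.
McpToolStep(
    name="recall",
    server_command="python -m memory_vault.mcp",
    tool="recall",
    args={"query": "{previous.search_term}", "limit": 10},
    timeout_seconds=30,
)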
The server_command and string values in args accept {previous.X} and {trigger.X} placeholders the same way ShellStep.command does. The tool name and args keys are never substituted — protocol-level identifiers, not user data. Non-string args values (ints, bools, nested dicts) pass through unchanged.
stdio transport only in v1.0. initialize + tools/call only — no tools/list, no resources, no prompts, no server-initiated notifications. Each step spawns the MCP server fresh, runs the handshake, calls one tool, and tears the subprocess down. No shared state. No pooling.
The derive-your-own-image pattern. The stock mihaibuilds/the-brain image bundles zero MCP servers. The Brain is a workflow orchestrator; MCP servers are independent products. Coupling them would force users into installing things they don't need.
If your workflow calls an MCP server, install that server in a derived image:
FROM mihaibuilds/the-brain:latest
RUN <install-command-per-the-mcp-server-s-readme>
examples/brain-with-mv-mcp/ ships a complete worked composition with Memory Vault — Dockerfile, docker-compose.yml, a verify workflow, and a runbook README.
Architectural decisions worth naming
Per-step spawn lifecycle. Every McpToolStep spawns its MCP server subprocess at step start, runs the MCP initialize handshake, calls one tools/call, and kills the subprocess at step end. No shared client. No connection pool. Cold start cost per step is ~200-500ms for a server like MV's that loads sentence-transformers + spaCy + a pgvector connection on every spawn. The trade-off: isolation per call. A crashed MCP server kills only one step. A leaked file descriptor in the MCP server is cleaned up by the OS when we kill it. The next step gets a fresh subprocess. Per-run pooling is a future consideration if real latency complaints surface; v1.0 takes the isolation.
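To make the lifecycle concrete, here is a rough sketch of spawn, handshake, one tools/call, and teardown using only asyncio. It is not The Brain's actual client. FAKE_SERVER is a hypothetical stand-in that answers every request with a canned result so the sketch can run anywhere, and call_tool_once is an illustrative name. The protocol version string is one published MCP revision, picked as an assumption.

```python
import asyncio
import json
import sys

# Hypothetical stand-in for an MCP server, used only so the sketch runs.
# It answers every JSON-RPC request line with a canned result.
FAKE_SERVER = r'''
import json, os, sys
for line in sys.stdin:
    req = json.loads(line)
    if "id" not in req:
        continue  # notifications get no response
    if req["method"] == "initialize":
        result = {"protocolVersion": req["params"]["protocolVersion"],
                  "capabilities": {}, "serverInfo": {"name": "fake", "version": "0"}}
    else:
        result = {"content": [{"type": "text", "text": "pid=%d" % os.getpid()}],
                  "isError": False}
    print(json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result}), flush=True)
'''
FAKE_ARGV = [sys.executable, "-c", FAKE_SERVER]


async def _send(proc, message):
    # One JSON message per line, newline-terminated.
    proc.stdin.write((json.dumps(message) + "\n").encode())
    await proc.stdin.drain()


async def _request(proc, msg_id, method, params):
    await _send(proc, {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params})
    return json.loads(await proc.stdout.readline())


async def call_tool_once(argv, tool, arguments, timeout_seconds):
    """Spawn, handshake, call one tool, tear down.

    The timeout covers the handshake plus the tool call.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv, stdin=asyncio.subprocess.PIPE, stdout=asyncio.subprocess.PIPE
    )

    async def session():
        await _request(proc, 1, "initialize", {
            "protocolVersion": "2024-11-05",  # assumed revision for this sketch
            "capabilities": {},
            "clientInfo": {"name": "sketch", "version": "0"},
        })
        await _send(proc, {"jsonrpc": "2.0", "method": "notifications/initialized"})
        response = await _request(
            proc, 2, "tools/call", {"name": tool, "arguments": arguments}
        )
        return response["result"]

    try:
        return await asyncio.wait_for(session(), timeout_seconds)
    finally:
        if proc.returncode is None:
            proc.kill()  # the subprocess never outlives the step
        await proc.wait()
```

Running asyncio.run(call_tool_once(FAKE_ARGV, "recall", {"query": "x"}, 30)) twice starts two separate child processes, which is the isolation property the per-step lifecycle is buying.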
stdio transport, newline-delimited JSON, no Content-Length framing. The MCP spec defines stdio framing as newline-delimited JSON: one JSON-RPC message per line, terminated by \n on both stdin and stdout, with no raw newlines inside a message. Content-Length header framing is what LSP uses over stdio; MCP's stdio transport does not use it. MCP's other standard transport, streamable HTTP, is a separate protocol surface with its own auth concerns (Bearer / mTLS / OAuth). For v1.0, stdio is the more universal transport: Memory Vault's MCP server uses it, Claude Desktop uses it, and the reference MCP SDKs support it. HTTP transport may come in a future version.
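Newline-delimited framing is small enough to show whole. The sketch below is an illustration, not The Brain's code; encode_message and split_messages are hypothetical helper names. It relies on the fact that json.dumps escapes newlines inside strings, so an encoded message never contains a raw newline byte.

```python
import json
from typing import List, Tuple


def encode_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes "\n" inside strings, so the payload stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def split_messages(buffer: bytes) -> Tuple[List[dict], bytes]:
    """Decode every complete line in buffer; return (messages, leftover bytes)."""
    *lines, rest = buffer.split(b"\n")
    return [json.loads(line) for line in lines if line.strip()], rest
```

The leftover bytes are whatever arrived after the last newline, which a reader would keep and prepend to the next chunk it receives.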
Single-flight via asyncio.Lock. A single StdioMcpClient instance serializes call_tool invocations internally. The per-step-spawn lifecycle means concurrent calls per client never happen in normal use, but the lock removes a real foot-gun if someone hand-shares a client. Cheap insurance.
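A sketch of the single-flight idea, with a hypothetical SingleFlightClient standing in for StdioMcpClient. Only the locking is illustrated; the sleep is a placeholder for writing a request and reading its response, and the counters exist purely to observe the behavior.

```python
import asyncio


class SingleFlightClient:
    """Hypothetical stand-in for a stdio client; shows only the lock."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self.in_flight = 0
        self.max_in_flight = 0

    async def call_tool(self, name, args):
        # Only one request/response exchange may use the pipes at a time.
        async with self._lock:
            self.in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self.in_flight)
            await asyncio.sleep(0.01)  # placeholder for the real pipe I/O
            self.in_flight -= 1
            return {"tool": name, "args": args}


async def demo():
    client = SingleFlightClient()
    results = await asyncio.gather(
        *(client.call_tool("recall", {"n": i}) for i in range(5))
    )
    return client.max_in_flight, results
```

Even with five calls launched at once through asyncio.gather, the lock means no two exchanges ever interleave on the same stdin/stdout pair.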
Eager handshake on connect. The MCP initialize handshake runs in __aenter__ / connect, not lazily on first call_tool. The per-call timeout covers handshake + tool call together from the caller's POV. If initialize hasn't run yet when call_tool fires, the caller's 30-second budget would silently include some unknown amount of handshake time. Eager handshake makes the budget actually mean what it says.
Background stderr reader for pipe-fill resilience. A continuous background task drains the subprocess's stderr pipe to a rolling ~1 KB tail. Without it, a chatty MCP server writing lots of stderr (say, a debug-build that logs everything) would fill the OS pipe buffer (~64 KB on macOS) and the subprocess would block waiting for someone to read stderr. Meanwhile The Brain would be waiting for stdout, deadlocking the whole call. The background reader prevents that. The captured tail is exposed via the stderr_tail property for debug logging at step boundary — and never returned in StepResult.output. Workflow data and debug data are different surfaces. A workflow author querying {previous.recall} must never see stderr noise mixed into their workflow values.
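Here is a small sketch of the drain-to-rolling-tail pattern, assuming an asyncio subprocess. CHATTY_CHILD is a hypothetical child that writes 256 KB to stderr before producing its one line of stdout, which is more than a typical pipe buffer holds. The names drain_stderr and run_with_stderr_drain are illustrative.

```python
import asyncio
import sys

TAIL_BYTES = 1024  # rolling tail size, matching the "~1 KB" described above

# Hypothetical chatty child: 256 KB of stderr, then one line on stdout.
CHATTY_CHILD = r'''
import sys
sys.stderr.write("x" * 262144)
sys.stderr.flush()
sys.stdout.write("done\n")
sys.stdout.flush()
'''
CHATTY_ARGV = [sys.executable, "-c", CHATTY_CHILD]


async def drain_stderr(stream, tail: bytearray):
    """Keep reading stderr so the child never blocks; retain only the tail."""
    while True:
        chunk = await stream.read(4096)
        if not chunk:
            return  # EOF: the child closed stderr
        tail.extend(chunk)
        del tail[:-TAIL_BYTES]  # drop everything except the last TAIL_BYTES


async def run_with_stderr_drain(argv):
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    tail = bytearray()
    drainer = asyncio.create_task(drain_stderr(proc.stderr, tail))
    line = await asyncio.wait_for(proc.stdout.readline(), timeout=30)
    await proc.wait()
    await drainer
    # stdout is workflow data; the stderr tail is debug data, returned separately.
    return line.decode().strip(), bytes(tail)
```

The two return values stay separate on purpose, mirroring the rule that the stderr tail never ends up in a step's output.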
Substitution boundaries are sharp. The runner's _resolve_step function gains a new branch for McpToolStep.args (a dict). It iterates dict values, substitutes string-typed values via {previous.X} + {trigger.X} resolvers, leaves non-strings and keys untouched. The tool name is never substituted. Nested-dict args (args={"filter": {"query": "{previous.X}"}}) are not recursively substituted — consistent with the {trigger.body.foo} no-nesting rule from M3. Pinned by five separate substitution-boundary tests plus cross-PR pins in the audit-pass test file.
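The boundary rules can be sketched in a few lines. This is an illustration under assumptions, not the runner's _resolve_step: substitute_args is a hypothetical name, the placeholder regex is a guess at the syntax, and raising KeyError on an unknown name is a choice made for the sketch.

```python
import re
from typing import Any, Dict, Mapping

# Assumed placeholder syntax: {previous.NAME} or {trigger.NAME}, one level deep.
_PLACEHOLDER = re.compile(r"\{(previous|trigger)\.([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_args(
    args: Mapping[str, Any],
    previous: Mapping[str, Any],
    trigger: Mapping[str, Any],
) -> Dict[str, Any]:
    """Substitute placeholders in top-level string values only."""
    sources = {"previous": previous, "trigger": trigger}

    def fill(text: str) -> str:
        # An unknown name raises KeyError here (an assumption of this sketch).
        return _PLACEHOLDER.sub(
            lambda m: str(sources[m.group(1)][m.group(2)]), text
        )

    # Keys are copied as-is; non-string values (ints, bools, nested dicts)
    # pass through untouched, so nested strings are NOT substituted.
    return {
        key: fill(value) if isinstance(value, str) else value
        for key, value in args.items()
    }
```

With previous = {"search_term": "pgvector"}, the args {"query": "{previous.search_term}", "limit": 10} resolve to {"query": "pgvector", "limit": 10}, while a nested {"filter": {"query": "{previous.search_term}"}} comes back unchanged.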
isError: true becomes step failure. When an MCP server returns a successful JSON-RPC response containing isError: true, The Brain treats it as step failure — same shape as a non-zero shell exit code. The first text content block in the response becomes the step's error message. MCP-side tool errors flow through the same workflow-halt semantics as every other failure path, so workflow authors don't have to check isError in every downstream step.
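A minimal sketch of that mapping, assuming a JSON-RPC response already parsed into a dict. StepOutcome is a hypothetical stand-in for the real StepResult, and joining all text blocks into the success output is an assumption of the sketch; only the "first text block becomes the error message" rule comes from the behavior described above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class StepOutcome:
    """Hypothetical stand-in for the runner's step result type."""
    ok: bool
    output: Optional[str] = None
    error: Optional[str] = None


def outcome_from_response(response: Mapping[str, Any]) -> StepOutcome:
    # Protocol-level JSON-RPC error: the call itself failed.
    if "error" in response:
        message = response["error"].get("message", "JSON-RPC error")
        return StepOutcome(ok=False, error=str(message))

    result = response.get("result", {})
    texts = [
        block.get("text", "")
        for block in result.get("content", [])
        if block.get("type") == "text"
    ]

    # Tool-level error: a successful response that carries isError: true.
    if result.get("isError"):
        first = texts[0] if texts else "tool reported isError with no text content"
        return StepOutcome(ok=False, error=first)

    # Assumption: the success output is all text blocks joined by newlines.
    return StepOutcome(ok=True, output="\n".join(texts))
```

Downstream steps then only ever see ok outcomes, because the runner halts the workflow on the first failed one.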
MemoryVaultStep ↔ McpToolStep coexistence. Both ship in v1.0. Neither is deprecated. MemoryVaultStep calls MV over its REST API with no extra setup — easy default for "I just want hybrid search from MV." McpToolStep is the generic any-MCP-server mechanism — works for MV's MCP server (via the derive-pattern), GitHub's, Sentry's, your own. The deprecation question was considered and rejected — forcing users into the harder setup path right at v1.0 is the wrong direction.
The moment for the ecosystem
I want to call this out separately because it matters more than either feature individually.
Memory Vault went live two months ago. The Brain has been under construction since May. I've been calling them "the ecosystem" the whole time, but they were two completely separate projects living in two completely separate repositories. They had never actually worked together end-to-end.
For M4's verify pass, I built a derived image with both projects installed, separate Postgres instances (Brain's tables + MV's pgvector tables), three containers in one Docker network. The verify workflow asks Memory Vault — over MCP — for memories matching a query. Memory Vault searches its pgvector index and returns chunks with similarity scores. The Brain pipes the chunks into a local LLM step. The LLM writes a digest. A shell step saves it.
It worked. Real database, real hybrid search, real LLM call, real file written.
I ran it twice. Once with Ministral-3B-Instruct loaded in LM Studio — about 4 seconds end-to-end. Once with Qwen3.5-9B, a reasoning-style model — about 2 minutes 13 seconds. Same workflow file. The only difference was three fields on the LLM step: model, timeout_seconds, max_tokens.
Both summaries were real. The fast model wrote a tight two-sentence digest. The reasoning model produced a longer, more comprehensive summary that captured more of the original context, at roughly thirty times the wall-clock cost (about 133 seconds versus about 4). The same per-step override mechanism made the swap trivial.
This is the first time The Brain and Memory Vault have actually composed in production shape. The moment where "the ecosystem" stops being a roadmap word and starts being a system that exists.
What v1.0 won't do, on purpose
The LLM step does not drive tool calling. LLMStep is chat-completion only: it produces text. If a workflow wants "LLM picks an MCP tool to call," it wires that explicitly. The LLMStep produces a tool name, and {previous.X} substitution puts that name into the next step's args. The tool field itself is never substituted, so the workflow author either passes the choice through args or uses separate branches. The workflow file is the orchestrator. The LLM transforms text. It does not decide. This is by design.
No tools/list discovery. Workflow authors know the tool name and the args shape in advance, the same way they know what shell commands they're calling. If you want introspection, build it in a separate workflow step.
MCP HTTP transport is not in v1.0. Stdio only. HTTP transport (the streamable-HTTP MCP variant) brings its own auth surface. For v1.0, stdio is the deeper transport.
The stock image bundles zero MCP servers. Per the ecosystem rule. Derive-your-own-image is the documented path.
Per-run MCP server pooling is not implemented. Per-step spawn is the v1.0 lifecycle. Two McpToolStep calls to the same server in one workflow run produce two distinct subprocess PIDs. The cold-start cost is real; v1.0 takes the isolation guarantee.
No custom LLM auth schemes. Bearer-only when an api_key is set, no header when it isn't. If your provider needs something else, bake it into your derived image.
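The auth rule fits in one function. This is a sketch only; build_auth_headers is a hypothetical name, and the Content-Type header is an assumption based on the OpenAI-compatible wire format being JSON over HTTP.

```python
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Bearer token when a key is set; no Authorization header otherwise."""
    headers = {"Content-Type": "application/json"}  # assumed for a JSON API
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```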
No bundled MCP servers. Stock image stays lean. Each MCP server is a separate install in your derived Dockerfile.
No Docker-socket-mount for The Brain container. Considered and rejected. A leaked webhook secret plus a malicious payload substituted into server_command would become a host escape. The derive-your-own-image pattern is the secure alternative: you decide what goes into the image at build time, instead of handing the container a runtime Docker socket.
Reasoning models need bigger budgets. Reasoning-style LLMs (Qwen 3.x+, o1-style, R1-style, QwQ) consume token budget on internal reasoning before producing visible content. If you point a per-step LLM call at a reasoning model with default budgets, you may get empty visible output. The fix is bigger budgets: timeout_seconds=600 and max_tokens=8000 or more are a reasonable starting point for a 9B reasoning model. Instruct models (Ministral, Mistral Instruct, Llama Instruct) don't have this behavior.
These are deliberate trade-offs. M4 is the smallest correct ecosystem-aware surface, not the most ambitious one.
Who this is for
Same audience as M1 + M2 + M3, with one addition: anyone building self-hosted workflow automation that needs to reach multiple specialized tools without writing a custom adapter for each one. The MCP ecosystem in 2026 has dozens of servers — for memory (Memory Vault), for code review (GitHub MCP), for observability (Sentry MCP), for filesystems, for databases, for browser control. M4 makes any of them callable from a Brain workflow step with the same shape.
If you've ever wanted to wire an LLM workflow into multiple specialized backends without committing to LangChain — this is for you.
What's next
Milestone 5 is the v1.0 launch milestone. It adds no new features; it covers continuous integration, a security audit, full docs, public README polish, and the launch ritual. After M5 ships, The Brain is publicly v1.0: open-source, MIT, single-tenant, self-hosted, the same shape Memory Vault took at its own v1.0.
There will be no M5 dev-log post in this dev.to series. The next post will be the v1.0 launch post itself.
Try it
git clone https://github.com/MihaiBuilds/the-brain
cd the-brain
THE_BRAIN_API_TOKEN=any-value docker compose up -d
# call any MCP server from a workflow (build your own derived image first
# with the MCP server installed — see examples/brain-with-mv-mcp/)
docker compose exec brain brain run examples/mcp_recall_memory.py
# or use per-step LLM overrides without any MCP setup
docker compose exec brain brain run examples/daily_digest.py
The repo has the full README, the derive-pattern example with a complete runbook for composing The Brain with Memory Vault, and reference workflows for both LLMStep and McpToolStep.
- GitHub — The Brain
- Memory Vault — the layer underneath
- M3 dev-log post
- M2 dev-log post
- M1 debut post
Follow along
- Twitter / X: @mihaibuilds
- Blog: mihaibuilds.com
- GitHub: github.com/MihaiBuilds/the-brain