<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MihaiBuilds</title>
    <description>The latest articles on DEV Community by MihaiBuilds (@mihaibuildsdev).</description>
    <link>https://dev.to/mihaibuildsdev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904583%2F67fee610-d560-45d8-a994-78991737033d.jpeg</url>
      <title>DEV Community: MihaiBuilds</title>
      <link>https://dev.to/mihaibuildsdev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mihaibuildsdev"/>
    <language>en</language>
    <item>
      <title>Memory Vault v1.0 — building open-source AI memory the boring way</title>
      <dc:creator>MihaiBuilds</dc:creator>
      <pubDate>Sat, 09 May 2026 13:15:56 +0000</pubDate>
      <link>https://dev.to/mihaibuildsdev/memory-vault-v10-building-open-source-ai-memory-the-boring-way-33ej</link>
      <guid>https://dev.to/mihaibuildsdev/memory-vault-v10-building-open-source-ai-memory-the-boring-way-33ej</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published on &lt;a href="https://mihaibuilds.com/blog/memory-vault-v1-0-released.html" rel="noopener noreferrer"&gt;mihaibuilds.com&lt;/a&gt;. Cross-posting here because dev.to is where I find a lot of this kind of work myself.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For the past year I kept hitting the same wall. I'd have a real conversation with Claude — work through a database design, debug something gnarly, agree on a convention I wanted to keep — and the next morning it was gone. Not summarized. Not searchable. Just gone. ChatGPT was the same. Every assistant I used had the long-term memory of a goldfish, and the workaround the industry settled on was "paste the relevant context back in every time." That's not memory. That's me being the memory.&lt;/p&gt;

&lt;p&gt;So I built one. Memory Vault is an open-source, self-hosted AI memory system you run yourself: Postgres with pgvector underneath, hybrid search on top, an MCP server so Claude can read and write to it directly, a knowledge graph that extracts entities without an LLM bill, a local LLM chat with retrieved-source citations, and a one-command Docker setup. Two days ago it crossed the line from "build-in-public project" to "v1.0 stable release." (v1.0.2 yesterday closed two security findings I caught after enabling branch protection — path-traversal + info-exposure on an internal stream handler.)&lt;/p&gt;

&lt;h2&gt;What Memory Vault is&lt;/h2&gt;

&lt;p&gt;A long-term memory layer for AI assistants and the apps you build on top of them. You ingest text — markdown notes, conversation logs, anything plain — and it gets chunked, embedded, full-text indexed, and stored in a single Postgres database. Hybrid search (vector similarity + keyword tsvector + Reciprocal Rank Fusion) returns the right chunks back when you query. An MCP server exposes four tools (&lt;code&gt;recall&lt;/code&gt;, &lt;code&gt;remember&lt;/code&gt;, &lt;code&gt;forget&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;) that Claude Desktop or Claude Code can call directly, which means Claude can read and write to your memory inside any conversation without you copy-pasting context. A REST API exposes the same operations for any app you build. A dashboard gives you Search, Browse, Graph, Ingest, Stats, and Chat pages. A local LLM chat (LM Studio in v1.0) lets you talk to your memories with full source citations — every response shows which chunks it pulled from, each clickable.&lt;/p&gt;
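&lt;p&gt;The ingest side of that pipeline (chunk, then embed, then index) starts with a windowing step. A minimal sketch, with the caveat that the window and overlap sizes here are illustrative values, not Memory Vault's actual defaults:&lt;/p&gt;

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows before embedding.

    Overlapping windows preserve context that would otherwise be cut
    at chunk boundaries. size and overlap are illustrative, not the
    project's real defaults.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

&lt;p&gt;Each chunk then gets both an embedding and a tsvector, which is what makes the hybrid search described below possible.&lt;/p&gt;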

&lt;p&gt;It runs entirely on your machine. No API keys. No cloud. No telemetry. Postgres on port 5432, the API on port 8000, dashboard on the same port. &lt;code&gt;docker compose up&lt;/code&gt; and it's running.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls3i6vlk1ep80b08npnf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls3i6vlk1ep80b08npnf.png" alt="Memory Vault dashboard Chat page answering a question with the sources panel expanded showing retrieved chunks" width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What v1.0 actually does&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid search&lt;/strong&gt; — pgvector HNSW for semantic + tsvector GIN for keyword + Reciprocal Rank Fusion to merge them. Vector-only search misses exact terms; keyword-only misses paraphrases. RRF gets both.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt; — four tools (&lt;code&gt;recall&lt;/code&gt;, &lt;code&gt;remember&lt;/code&gt;, &lt;code&gt;forget&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;) callable from Claude Desktop, Claude Code, or any MCP client. Claude reads and writes your memory in-conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge graph&lt;/strong&gt; — spaCy NER plus co-occurrence extracts entities (Person, Project, Tool, Concept) and &lt;code&gt;related_to&lt;/code&gt; relationships from every ingested chunk. No LLM, no per-token cost, rendered as an interactive Cytoscape force-directed graph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory spaces&lt;/strong&gt; — namespacing for different contexts (work, personal, projects). Per-space dedup; cross-space isolation by default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local LLM chat&lt;/strong&gt; — LM Studio native API with sources panel showing retrieved chunks for every answer. Every response is grounded and the grounding is visible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REST API&lt;/strong&gt; — bearer-auth-protected, OpenAPI-documented at &lt;code&gt;/docs&lt;/code&gt;, every operation the dashboard does is also a documented endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-command Docker&lt;/strong&gt; — &lt;code&gt;docker compose up&lt;/code&gt;. Postgres, the app, and the spaCy model bundled into a single image at build time, no first-run download.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted, MIT-licensed&lt;/strong&gt; — your data stays on your machine. The whole thing is yours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;170 tests passing&lt;/strong&gt; — pytest with a real Postgres + pgvector service container, no mocks of the database.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Architectural decisions worth naming&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Postgres + pgvector instead of a dedicated vector database.&lt;/strong&gt; I run one database, not two. Operationally this matters more than the marginal performance of a purpose-built vector store at small scale. You already know how to back up Postgres. You already know how to monitor it. HNSW indexes plus tuned &lt;code&gt;maintenance_work_mem&lt;/code&gt; and &lt;code&gt;ef_search&lt;/code&gt; get you to "fast enough for hundreds of thousands of chunks on a laptop." When that stops being true, the migration path is sane. Until then, one database is the right answer for a self-hosted personal-memory tool.&lt;/p&gt;
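&lt;p&gt;For context, the knobs named above look like this in SQL. The table and column names are hypothetical; the settings themselves are standard Postgres and pgvector:&lt;/p&gt;

```sql
-- Bigger maintenance_work_mem speeds up the HNSW index build.
SET maintenance_work_mem = '1GB';

-- Approximate-nearest-neighbor index for the embedding column.
CREATE INDEX chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);

-- Per-session query knob: higher ef_search trades latency for recall.
SET hnsw.ef_search = 80;
```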

&lt;p&gt;&lt;strong&gt;Hybrid search instead of vector-only.&lt;/strong&gt; Pure vector search is great at paraphrase and concept. It's bad at exact terms — model names, error codes, file paths, anything where the literal string is the signal. Memory Vault stores both an embedding and a tsvector for every chunk and merges the two ranked result sets with Reciprocal Rank Fusion. RRF is parameter-free, doesn't require score normalization, and consistently beats either approach alone on the kind of mixed queries real users actually type.&lt;/p&gt;
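&lt;p&gt;Reciprocal Rank Fusion is small enough to show whole. This is a sketch of the technique rather than Memory Vault's actual code; &lt;code&gt;k=60&lt;/code&gt; is the constant from the original RRF paper, not necessarily what the project uses:&lt;/p&gt;

```python
def rrf_merge(vector_ids: list[str], keyword_ids: list[str], k: int = 60) -> list[str]:
    """Merge two ranked result lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it
    appears in. No score normalization needed: only ranks matter.
    """
    scores: dict[str, float] = {}
    for ranking in (vector_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

&lt;p&gt;A document that shows up in both lists accumulates score from both, which is why results matching semantically &lt;em&gt;and&lt;/em&gt; literally float to the top.&lt;/p&gt;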

&lt;p&gt;&lt;strong&gt;spaCy + co-occurrence for the knowledge graph, not an LLM.&lt;/strong&gt; The default move in this space is to feed every chunk through an LLM and ask it for entities and relationships. It works. It also costs money on every ingest, couples your graph quality to whichever model you happened to pick, and requires API keys for a tool whose entire pitch is no API keys. spaCy's &lt;code&gt;en_core_web_sm&lt;/code&gt; model plus a co-occurrence rule (two entities in the same chunk = a &lt;code&gt;related_to&lt;/code&gt; edge, weighted by frequency) gets you a useful graph for zero per-ingest cost. The honest limits — English only, context-dependent NER, no fuzzy matching — are documented up front rather than masked.&lt;/p&gt;
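&lt;p&gt;The co-occurrence rule itself fits in a few lines. In the real system the per-chunk entity lists come out of spaCy's NER; here they are passed in directly so the sketch stands alone:&lt;/p&gt;

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(chunk_entities: list[list[str]]) -> dict[tuple[str, str], int]:
    """Build weighted related_to edges from per-chunk entity lists.

    Two entities in the same chunk produce one edge; the weight is
    how many chunks they co-occur in. Edge keys are sorted pairs so
    (a, b) and (b, a) count as the same edge.
    """
    weights: Counter = Counter()
    for entities in chunk_entities:
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

&lt;p&gt;Frequency-weighted edges like these are what the Cytoscape view renders; nothing in the loop costs a token.&lt;/p&gt;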

&lt;p&gt;&lt;strong&gt;MCP-first, not REST-first.&lt;/strong&gt; Memory Vault was designed around the assumption that the primary user of this database is going to be Claude, not me. The MCP server isn't a wrapper around a REST API — it's a direct path into the same code that the REST API uses. Both are first-class. But the design starting point was "what does Claude need to call to make memory feel native," and then the REST API was the same operations exposed for human-driven apps. That ordering changes which tradeoffs are interesting.&lt;/p&gt;

&lt;h2&gt;The PoolClosed story&lt;/h2&gt;

&lt;p&gt;About a week before tag day, I added a CLI command called &lt;code&gt;memory-vault diagnose&lt;/code&gt;. It bundles app logs, database logs, status output, OS info, and redacted environment into a zip file users can attach to bug reports. Foundation work. Paid for once. The kind of thing that makes every future bug report ten times higher signal-to-noise.&lt;/p&gt;

&lt;p&gt;I shipped it. Then I ran the test suite. 163 passed, 52 errored. Every error was &lt;code&gt;psycopg_pool.PoolClosed&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First instinct: probably an &lt;code&gt;httpx&lt;/code&gt; lifespan thing. Modern &lt;code&gt;httpx&lt;/code&gt; has changed how it handles ASGI lifespan events between minor versions. The test suite uses &lt;code&gt;httpx.ASGITransport&lt;/code&gt; to drive the FastAPI app in-process, sharing a session-wide connection pool fixture. If the transport was firing shutdown events between tests, the pool would close mid-suite. There's a kwarg for this. I added &lt;code&gt;lifespan="off"&lt;/code&gt; to the transport. &lt;code&gt;TypeError: ASGITransport.__init__() got an unexpected keyword argument 'lifespan'&lt;/code&gt;. The kwarg doesn't exist in 0.28.x. Reverted.&lt;/p&gt;

&lt;p&gt;Second instinct: walk the call graph. &lt;code&gt;memory-vault diagnose&lt;/code&gt; calls into the CLI's &lt;code&gt;_run_status&lt;/code&gt; helper to capture status output for the bundle. &lt;code&gt;_run_status&lt;/code&gt; was implemented as &lt;code&gt;asyncio.run(_cmd_status())&lt;/code&gt; — directly calling the CLI's status function in-process. &lt;code&gt;_cmd_status&lt;/code&gt; initializes a connection pool at the top of the function and closes it via a &lt;code&gt;finally&lt;/code&gt; block at the end. Which is correct behavior for the CLI. It's also exactly what you don't want when something else in the same process — like a session-wide test fixture — already owns a pool that's mid-flight.&lt;/p&gt;

&lt;p&gt;The fix was four lines. Replace the in-process &lt;code&gt;asyncio.run&lt;/code&gt; with &lt;code&gt;subprocess.run(["memory-vault", "status"])&lt;/code&gt;. The subprocess gets its own pool, lives its own lifecycle, exits cleanly, and the parent process's pool is never touched. 163 passed, 0 errored.&lt;/p&gt;
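&lt;p&gt;A minimal reconstruction of the shape of that fix, with the command parameterized so the sketch runs anywhere; the real call is &lt;code&gt;["memory-vault", "status"]&lt;/code&gt;:&lt;/p&gt;

```python
import subprocess
import sys

def run_status(command: list[str]) -> str:
    """Capture a status command's output from a child process.

    The child opens and closes its own connection pool, so whatever
    pool the parent process owns is never touched.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

&lt;p&gt;The cost is one process spawn per diagnostic bundle, which is nothing; the benefit is that two lifecycles that were silently entangled now can't be.&lt;/p&gt;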

&lt;p&gt;The lesson isn't about pools or fixtures specifically. It's that "obvious" fixes (changing the test transport config) and root causes (one function quietly tearing down state owned by a different function) live in different parts of the code. The &lt;code&gt;lifespan="off"&lt;/code&gt; move would have masked the symptom in the tests and left the actual bug in the CLI, where users would have hit it. Almost the entire week's gap between "all my sub-steps look done" and "v1.0 is actually shippable" was the discipline of not bypassing this kind of thing when bypassing was easy.&lt;/p&gt;

&lt;h2&gt;What v1.0 doesn't do, on purpose&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;English-only NER.&lt;/strong&gt; The bundled spaCy model is &lt;code&gt;en_core_web_sm&lt;/code&gt;. Non-English content gets little to no useful entity extraction. Multilingual models exist; they're heavier and slower; they're a question driven by real user demand, not a v1.0 must-have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No fuzzy entity matching.&lt;/strong&gt; "PostgreSQL" and "Postgres" are separate entities in the graph. No alias merging in v1.0.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No re-extraction on edit.&lt;/strong&gt; If you re-ingest a corrected version of a chunk, the new entities are added but the old ones aren't cleaned up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single-user.&lt;/strong&gt; v1.0 has bearer auth and one user behind it. The schema has &lt;code&gt;owner_id&lt;/code&gt; and &lt;code&gt;access_level&lt;/code&gt; columns from day one, but multi-user activation is part of the PRO tier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LM Studio only for chat.&lt;/strong&gt; Ollama and llama.cpp use the same OpenAI-compatible client architecture under the hood, but the only end-to-end-tested path in v1.0 is LM Studio. Ollama support is not in v1.0.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No multi-conversation history in chat.&lt;/strong&gt; Single-thread chat. Driven by whether real users ask for it.&lt;/p&gt;

&lt;p&gt;These are deliberate trade-offs. Honest gaps documented up front build more trust than feature bullets that fall apart when someone actually tries them.&lt;/p&gt;

&lt;h2&gt;The open-core model&lt;/h2&gt;

&lt;p&gt;Memory Vault is and will always be MIT-licensed. The whole thing — search, MCP, graph, REST API, dashboard, local LLM chat, ingestion pipeline, the database schema, the Docker setup. You can run it on your machine. You can fork it. You can use it inside a commercial product. The free tier is genuinely useful — not a crippled demo of the paid tier.&lt;/p&gt;

&lt;p&gt;A paid PRO tier is on the roadmap for teams: dedup with importance decay, conflict resolution and supersede chains, multi-user activation, additional adapters (PDF, web pages), automated encrypted backups, and a fuller dashboard with analytics. What goes into PRO is genuinely worth paying for — operational tools that solo users on a laptop don't strictly need and that teams running shared knowledge bases really do. The split is honest by design.&lt;/p&gt;

&lt;h2&gt;What this took to build&lt;/h2&gt;

&lt;p&gt;Seven weeks of evenings and weekends across nine locked milestones, scope frozen on March 27. M1 was the announcement. M2 the core hybrid search. M3 the one-command Docker. M4 the MCP server. M5 the REST API. M6 the dashboard. M7 the knowledge graph. M8 — this one — was local LLM chat plus the polish, CI/CD, security review, and release engineering that turn a build-in-public project into something other people can actually use.&lt;/p&gt;

&lt;p&gt;Two of those weeks were the kind of work nobody sees: structured JSON logging with request ID propagation, a diagnostic CLI that produces a redacted bundle for bug reports, GitHub Actions for lint and test and multi-arch Docker release, security audit (bandit, npm audit, Dependabot, CodeQL, plus a 15-test pentest pass with curl), Contributor Covenant Code of Conduct, threat model in SECURITY.md, branch protection rules, and the discipline to fix the actual root cause of a test failure instead of bypassing it. Unglamorous. Also the difference between v0.7 and v1.0.&lt;/p&gt;

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Beyond.&lt;/strong&gt; Memory Vault is the first product in a planned compounding stack — The Brain is the next layer, building agents on top of this memory infrastructure. The memory layer is the one that has to be solid first. Today it is.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/MihaiBuilds/memory-vault
&lt;span class="nb"&gt;cd &lt;/span&gt;memory-vault
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8000&lt;/code&gt; and you're running.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/MihaiBuilds/memory-vault/releases/latest" rel="noopener noreferrer"&gt;GitHub — latest release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/MihaiBuilds/memory-vault#readme" rel="noopener noreferrer"&gt;README and quick start&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/MihaiBuilds/memory-vault#mcp-integration" rel="noopener noreferrer"&gt;MCP setup for Claude Desktop / Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Questions and bug reports: &lt;a href="https://github.com/MihaiBuilds/memory-vault/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;General discussion: &lt;a href="https://github.com/MihaiBuilds/memory-vault/discussions" rel="noopener noreferrer"&gt;GitHub Discussions&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Credits&lt;/h2&gt;

&lt;p&gt;Three Postgres tuning tips landed during M6 and M7 that materially improved Memory Vault: &lt;a href="https://x.com/rivestack" rel="noopener noreferrer"&gt;@rivestack&lt;/a&gt; on &lt;code&gt;maintenance_work_mem&lt;/code&gt;, &lt;code&gt;ef_search&lt;/code&gt; as a runtime knob, and post-deploy cache warmup for HNSW indexes. The first ships in v1.0; we'll use the others when we get to them. Public credit, fair credit. Build-in-public works because builders with deeper expertise see what you're shipping and tell you what's wrong before production does.&lt;/p&gt;

&lt;p&gt;Beta tester Inevitable-Way-3916 ran the dashboard early, asked the architecture questions that forced the ARCHITECTURE.md doc to exist, and put bulk ingest on the list. Thanks.&lt;/p&gt;

&lt;h2&gt;Follow along&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Twitter / X: &lt;a href="https://x.com/mihaibuilds" rel="noopener noreferrer"&gt;@mihaibuilds&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Blog: &lt;a href="https://mihaibuilds.com" rel="noopener noreferrer"&gt;mihaibuilds.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/MihaiBuilds/memory-vault" rel="noopener noreferrer"&gt;github.com/MihaiBuilds/memory-vault&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>postgres</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
