Every serious project I've worked on with LLM agents hits the same wall eventually.
The agent is smart. It reasons well. It follows instructions. But every new session it starts from zero, with no memory of what happened before, no context from previous runs, no knowledge it built over time.
The naive fix is to stuff everything into the system prompt. It works until it doesn't — context windows fill up, costs spike, and you're manually curating what to include every time. The slightly less naive fix is RAG: retrieve relevant chunks before each call. Better, but now you have a retrieval problem on top of your agent problem, and a single shared vector store that every agent reads from indiscriminately.
I built CtxVault because I wanted something different. Not just retrieval — infrastructure.
The vault abstraction
The core idea is simple: a vault is an isolated, self-contained memory unit. Its own directory, its own vector index, its own history. You can have one per agent, one per project, or share one across multiple agents as a coordination layer. You decide the topology.
ctxvault init personal
ctxvault init work
ctxvault init shared-knowledge
Each vault is just a folder on your machine. Drop documents in, index them, and they become semantically queryable. The agent can write new content at runtime, and that content is immediately searchable — no reindexing, no manual step.
This matters more than it seems. When you have a single shared store with metadata filtering to separate "agent A's memory" from "agent B's memory", you're one misconfigured filter away from cross-contamination. Vaults make isolation structural, not configurational.
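To make the contrast concrete, here is a toy sketch (illustration only, not CtxVault's internals) of the two isolation models. In the shared-store model, separation depends on a filter argument being passed correctly every time; in the vault model, the other agent's data is simply never in scope.

```python
# Illustration only -- not CtxVault's implementation.

# Shared store: one index, separation relies on a metadata filter.
shared = [
    {"agent": "a", "text": "A's secret plan"},
    {"agent": "b", "text": "B's grocery list"},
]

def shared_query(agent_filter=None):
    # A forgotten or wrong filter silently returns other agents' entries.
    return [e["text"] for e in shared
            if agent_filter is None or e["agent"] == agent_filter]

# Vaults: separation is structural. Agent B's store never enters scope.
vault_a = ["A's secret plan"]
vault_b = ["B's grocery list"]

def vault_query(vault):
    return list(vault)

print(shared_query())        # no filter passed: both agents' data leaks
print(vault_query(vault_a))  # only A's data, by construction
```

The failure mode is the point: `shared_query()` with a missing filter is a one-line bug, while `vault_query(vault_a)` cannot leak `vault_b` no matter how it is called.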
Three ways to talk to it
I wanted CtxVault to work at every level of the stack, so I built three integration modes.
CLI for humans:
ctxvault query personal "what am I learning to cook?"
ctxvault list work
HTTP API for agent pipelines: start the server, then initialize a vault once (via CLI or API); after that, it persists across restarts.
Now your LangChain or LangGraph agents can write and query via REST:
import requests

API = "http://127.0.0.1:8000/ctxvault"

# One-time vault initialization; the vault persists afterwards
requests.post(f"{API}/init", json={"vault_name": "agent-memory"})

# Write a memory entry; it is immediately searchable
requests.post(f"{API}/write", json={
    "vault_name": "agent-memory",
    "filename": "session.md",
    "content": "User is optimizing a FastAPI service. Main bottleneck is DB connection pooling."
})

# Semantic query over everything in the vault
results = requests.post(f"{API}/query", json={
    "vault_name": "agent-memory",
    "query": "what was the performance issue we discussed?"
}).json()["results"]
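A common next step in a pipeline is folding those results into the prompt. Here is a hypothetical helper for that (the `"content"` key is an assumption about the result shape; adapt it to what your `/query` endpoint actually returns):

```python
def to_context_block(results, max_chars=2000):
    """Fold query results into a prompt-ready context block.

    Assumes each result is a dict with a "content" key; this is an
    illustration, not part of CtxVault's API.
    """
    lines, used = [], 0
    for r in results:
        text = r["content"].strip()
        if used + len(text) > max_chars:
            break  # keep the block within a rough character budget
        lines.append(f"- {text}")
        used += len(text)
    return "Relevant memory:\n" + "\n".join(lines)

results = [{"content": "Main bottleneck is DB connection pooling."}]
print(to_context_block(results))
```

The character budget is a crude stand-in for token counting; in a real pipeline you would cap by tokens and prepend the block to the system prompt.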
MCP for no-code agent autonomy: add one entry to your mcp.json, and any MCP-compatible client (Claude Desktop, Cursor, and so on) gets direct vault access with no integration code:
{
  "mcpServers": {
    "ctxvault": {
      "command": "ctxvault-mcp"
    }
  }
}
The agent decides autonomously when to write, when to query, when to recall. You stay in control because every vault is a directory you can inspect and edit at any time.
Why local-first
Every alternative I evaluated either requires a cloud account, sends data to an external service, or both. For personal projects, side projects, and anything involving sensitive documents, that's a non-starter.
CtxVault runs entirely on your machine. ChromaDB for vector storage, sentence-transformers for embeddings, FastAPI for the HTTP layer. No API keys, no telemetry, no vendor dependency. Install it and it works offline.
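For readers unfamiliar with how a local embedding search works, here is the retrieval loop in miniature (pure stdlib toy: the vectors are hand-made stand-ins, whereas CtxVault uses real sentence-transformers embeddings and ChromaDB's HNSW index):

```python
import math

# Toy local retrieval loop. Vectors are fake; in practice an embedding
# model maps each document and query to a real vector.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

index = {
    "pasta.md":   [0.9, 0.1, 0.0],
    "fastapi.md": [0.1, 0.8, 0.3],
}

def query(vec, k=1):
    # Rank every indexed document by similarity to the query vector
    ranked = sorted(index, key=lambda doc: cosine(vec, index[doc]), reverse=True)
    return ranked[:k]

print(query([0.85, 0.2, 0.1]))  # -> ['pasta.md']
```

Everything here runs offline, which is the whole point: the embedding model, the index, and the query loop never leave your machine.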
What it actually feels like
(Demo: persistent memory across sessions, shown with Claude Desktop; works with any MCP-compatible client.)
The moment that made me realize the abstraction was right: I told Claude Desktop that I was learning to make fresh pasta and struggling with the sfoglia tearing when rolled thin. Closed the chat. Opened a new one. Asked "how's my pasta going?" — it knew exactly where I left off, because it had written the context to a vault and queried it when I came back.
That's the thing about memory as infrastructure: when it works, it disappears. The agent just knows. You stop re-explaining. You stop copy-pasting context between sessions. The conversation has history even when the chat window doesn't.
What's next
The current version handles retrieval well. What it doesn't handle yet is lifecycle — knowing what's worth keeping versus noise, merging stale chunks, archival. That's the next real problem and it's harder than retrieval. If you've thought about this I'd genuinely like to hear your approach.
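One naive starting point for the "worth keeping" question (purely a sketch, not implemented in CtxVault) is a retention score that weighs how often an entry is recalled against how long ago it was last touched:

```python
import math
import time

def retention_score(last_access_ts, access_count, now=None, half_life_days=30.0):
    """Keep-or-archive score: access frequency damped by recency decay.

    Illustrative only. Entries below some threshold would be candidates
    for archival or merging.
    """
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_access_ts) / 86400.0)
    decay = 0.5 ** (age_days / half_life_days)  # exponential half-life
    return math.log1p(access_count) * decay

now = 1_700_000_000
fresh = retention_score(now - 86400, access_count=5, now=now)       # 1 day old
stale = retention_score(now - 90 * 86400, access_count=5, now=now)  # 90 days old
print(fresh > stale)  # -> True
```

This only scores staleness; it says nothing about merging near-duplicate chunks, which probably needs similarity clustering on top. That is exactly the open part of the problem.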
CtxVault is open source, MIT licensed, available on PyPI.
pip install ctxvault

Top comments (2)
The vault isolation approach is clever — keeping vector indexes per-project prevents the cross-contamination problem that hits most shared RAG setups. Does the indexing stay fast once a vault has thousands of entries?
Good question — under the hood each vault is a separate ChromaDB collection with HNSW indexing, so scaling behavior is exactly what you'd expect from a standard ChromaDB + sentence-transformers RAG setup. The vault-per-domain architecture actually helps here: each index stays focused and small compared to a single shared store with thousands of mixed entries, and the multi-vault split adds no overhead of its own. That said, I haven't stress-tested a single vault with thousands of entries yet — if you push it that far, let me know how it goes!