Every serious project I've worked on with LLM agents hits the same wall eventually.
The agent is smart. It reasons well. It follows instructions. But every new session it starts from zero, with no memory of what happened before, no context from previous runs, no knowledge it built over time.
The naive fix is to stuff everything into the system prompt. It works until it doesn't — context windows fill up, costs spike, and you're manually curating what to include every time. The slightly less naive fix is RAG: retrieve relevant chunks before each call. Better, but now you have a retrieval problem on top of your agent problem, and a single shared vector store that every agent reads from indiscriminately.
I built CtxVault because I wanted something different. Not just retrieval — infrastructure.
The vault abstraction
The core idea is simple: a vault is an isolated, self-contained memory unit. Its own directory, its own vector index, its own history. You can have one per agent, one per project, or share one across multiple agents as a coordination layer. You decide the topology.
ctxvault init personal
ctxvault init work
ctxvault init shared-knowledge
Each vault is just a folder on your machine. Drop documents in, index them, and they become semantically queryable. The agent can write new content at runtime, and that content is immediately searchable — no reindexing, no manual step.
This matters more than it seems. When you have a single shared store with metadata filtering to separate "agent A's memory" from "agent B's memory", you're one misconfigured filter away from cross-contamination. Vaults make isolation structural, not configurational.
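To make the isolation point concrete, here is a toy model of the vault idea in plain Python. This is an illustrative sketch, not CtxVault's actual internals: each vault owns its own index (a dict standing in for a vector index), so a query against one vault cannot touch another's entries by construction. There is no shared store and no metadata filter to misconfigure.

```python
class ToyVault:
    """Toy stand-in for a vault: its own name, its own private index."""

    def __init__(self, name):
        self.name = name
        self._docs = {}  # filename -> content, scoped to this vault only

    def write(self, filename, content):
        self._docs[filename] = content

    def query(self, term):
        # Real vaults use vector search; substring match stands in here.
        return [f for f, c in self._docs.items() if term in c]


personal = ToyVault("personal")
work = ToyVault("work")
personal.write("pasta.md", "learning fresh pasta, sfoglia tears when thin")
work.write("api.md", "FastAPI service, DB connection pooling bottleneck")

print(personal.query("pasta"))  # ['pasta.md']
print(work.query("pasta"))      # [] -- isolation is structural, not a filter
```

A misconfigured filter can leak; a missing reference cannot.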
Three ways to talk to it
I wanted CtxVault to work at every level of the stack, so I built three integration modes.
CLI for humans:
ctxvault query personal "what am I learning to cook?"
ctxvault list work
HTTP API for agent pipelines: start the server, then initialize a vault. Initialization happens once, via CLI or API, and the vault persists across runs.
Now your LangChain or LangGraph agents can write and query via REST:
import requests

API = "http://127.0.0.1:8000/ctxvault"

# Create the vault once; it persists across runs.
requests.post(f"{API}/init", json={"vault_name": "agent-memory"})

# Write runtime context; it is immediately searchable, no reindexing step.
requests.post(f"{API}/write", json={
    "vault_name": "agent-memory",
    "filename": "session.md",
    "content": "User is optimizing a FastAPI service. Main bottleneck is DB connection pooling."
})

# Semantic query against the same vault.
results = requests.post(f"{API}/query", json={
    "vault_name": "agent-memory",
    "query": "what was the performance issue we discussed?"
}).json()["results"]
MCP for no-code agent autonomy: add a short entry to your mcp.json and any MCP-compatible client, like Claude Desktop or Cursor, gets direct vault access with no integration code:
{
"mcpServers": {
"ctxvault": {
"command": "ctxvault-mcp"
}
}
}
The agent decides autonomously when to write, when to query, when to recall. You stay in control because every vault is a directory you can inspect and edit at any time.
Why local-first
Every alternative I evaluated either requires a cloud account, sends data to an external service, or both. For personal projects, side projects, and anything involving sensitive documents, that's a non-starter.
CtxVault runs entirely on your machine. ChromaDB for vector storage, sentence-transformers for embeddings, FastAPI for the HTTP layer. No API keys, no telemetry, no vendor dependency. Install it and it works offline.
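For readers who want to see what "semantically queryable" means mechanically, here is a minimal sketch of embed-and-rank with cosine similarity. CtxVault uses sentence-transformers embeddings and ChromaDB's index; the toy embedding below (normalized character frequencies) is purely an assumption-free stand-in to show the ranking mechanics, not a real semantic model.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: unit-normalized
    # character-frequency vector.
    counts = Counter(text.lower())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {ch: v / norm for ch, v in counts.items()}


def cosine(a, b):
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(v * b.get(ch, 0.0) for ch, v in a.items())


docs = {
    "session.md": "FastAPI service bottleneck is DB connection pooling",
    "pasta.md": "fresh pasta sfoglia tearing when rolled thin",
}

q = embed("what was the performance issue?")
ranked = sorted(docs, key=lambda f: cosine(q, embed(docs[f])), reverse=True)
```

Swap the toy `embed` for a sentence-transformers model and the dict for an HNSW index, and this is the shape of the retrieval path.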
What it actually feels like
Persistent memory across sessions — shown with Claude Desktop, works with any MCP-compatible client.
The moment that made me realize the abstraction was right: I told Claude Desktop that I was learning to make fresh pasta and struggling with the sfoglia tearing when rolled thin. Closed the chat. Opened a new one. Asked "how's my pasta going?" — it knew exactly where I left off, because it had written the context to a vault and queried it when I came back.
That's the thing about memory as infrastructure: when it works, it disappears. The agent just knows. You stop re-explaining. You stop copy-pasting context between sessions. The conversation has history even when the chat window doesn't.
What's next
The current version handles retrieval well. What it doesn't handle yet is lifecycle — knowing what's worth keeping versus noise, merging stale chunks, archival. That's the next real problem and it's harder than retrieval. If you've thought about this I'd genuinely like to hear your approach.
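One direction I have been sketching for the lifecycle problem, explicitly not implemented in CtxVault today: score each memory by recency and usage, and archive anything below a threshold. The function name and scoring formula here are my own assumptions, just to make the shape of the problem concrete.

```python
import math
import time


def keep_score(last_used_ts, recall_count, now=None, half_life_days=30.0):
    """Exponential recency decay, boosted by how often the memory is recalled.

    Hypothetical scoring rule, not part of CtxVault.
    """
    now = now if now is not None else time.time()
    age_days = (now - last_used_ts) / 86400
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return recency * (1 + math.log1p(recall_count))


now = time.time()
# A memory recalled recently and often scores high...
fresh = keep_score(now - 1 * 86400, recall_count=5, now=now)
# ...while a stale, never-recalled one decays toward zero.
stale = keep_score(now - 90 * 86400, recall_count=0, now=now)
```

The hard part is not the formula, it is choosing the signals: recalls are observable, but "this memory quietly prevented a mistake" is not.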
CtxVault is open source, MIT licensed, available on PyPI.
pip install ctxvault

Top comments (8)
The vault concept is genuinely interesting! Scoping memory by context domain is something most memory systems still get wrong. We've been exploring a different dimension of this problem with Neocortex: instead of organizing where memories live, we focus on which memories deserve to survive, using interaction signals (views, recalls, usage frequency) to let low-value memories decay naturally. Your vault design and our decay model could actually be complementary. Let's connect!
Thanks! The decay angle is interesting, that's exactly the lifecycle problem I've deliberately left open in CtxVault for now. I focused on the isolation and access control layer first, but knowing which memories deserve to survive is the harder long-term problem. The two approaches do seem complementary, vault isolation handles the 'who sees what' question, decay handles the 'what's still worth keeping' question. Happy to connect and dig into this more.
Exactly how I'd frame it too, isolation and decay are orthogonal problems. You solved the first one cleanly with vaults, we've been deep in the second one!
If you want to poke at what we've built on the decay side, the repo is at github.com/tinyhumansai/neocortex. I'd genuinely value your perspective on it given where you've landed with CtxVault. And if it resonates, a star would mean a lot to an early open source project 🙏
Starred! Early OSS solidarity. Would genuinely love the same if CtxVault resonates: github.com/Filippo-Venturini/ctxvault. Looking forward to digging into the decay model!
Done! It's looking pretty cool!
The vault isolation approach is clever — keeping vector indexes per-project prevents the cross-contamination problem that hits most shared RAG setups. Does the indexing stay fast once a vault has thousands of entries?
Good question — ChromaDB uses HNSW indexing which scales well, but the vault-per-domain architecture actually helps here: each vault index stays focused and small compared to a single shared store with thousands of mixed entries.
Under the hood it’s standard ChromaDB + sentence-transformers — each vault is just a separate Chroma collection, so scaling behavior is exactly what you’d expect from a classic RAG setup. The multi-vault architecture doesn’t add overhead, it just keeps indexes focused.
That said I haven’t stress-tested with thousands of entries in a single vault yet — if you push it that far let me know how it goes!