Every serious project I've worked on with LLM agents hits the same wall eventually.
The agent is smart. It reasons well. It follows instructions. But every new session it starts from zero, with no memory of what happened before, no context from previous runs, no knowledge it built over time.
The naive fix is to stuff everything into the system prompt. It works until it doesn't — context windows fill up, costs spike, and you're manually curating what to include every time. The slightly less naive fix is RAG: retrieve relevant chunks before each call. Better, but now you have a retrieval problem on top of your agent problem, and a single shared vector store that every agent reads from indiscriminately.
I built CtxVault because I wanted something different. Not just retrieval — infrastructure.
The vault abstraction
The core idea is simple: a vault is an isolated, self-contained memory unit. Its own directory, its own vector index, its own history. You can have one per agent, one per project, or share one across multiple agents as a coordination layer. You decide the topology.
ctxvault init personal
ctxvault init work
ctxvault init shared-knowledge
Each vault is just a folder on your machine. Drop documents in, index them, and they become semantically queryable. The agent can write new content at runtime, and that content is immediately searchable — no reindexing, no manual step.
This matters more than it seems. When you have a single shared store with metadata filtering to separate "agent A's memory" from "agent B's memory", you're one misconfigured filter away from cross-contamination. Vaults make isolation structural, not configurational.
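To make the contrast concrete, here is a toy sketch (illustration only, not CtxVault's internals) of the two isolation models. In the shared-store model, separation depends on a filter argument being passed correctly every time; in the vault model, the other agent's data is simply never in scope.

```python
# Illustration only -- not CtxVault's implementation.

# Shared store: one index, separation relies on a metadata filter.
shared = [
    {"agent": "a", "text": "A's secret plan"},
    {"agent": "b", "text": "B's grocery list"},
]

def shared_query(agent_filter=None):
    # A forgotten or wrong filter silently returns other agents' entries.
    return [e["text"] for e in shared
            if agent_filter is None or e["agent"] == agent_filter]

# Vaults: separation is structural. Agent B's store never enters scope.
vault_a = ["A's secret plan"]
vault_b = ["B's grocery list"]

def vault_query(vault):
    return list(vault)

print(shared_query())        # no filter passed: both agents' data leaks
print(vault_query(vault_a))  # only A's data, by construction
```

The failure mode is the point: `shared_query()` with a missing filter is a one-line bug, while `vault_query(vault_a)` cannot leak `vault_b` no matter how it is called.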
Three ways to talk to it
I wanted CtxVault to work at every level of the stack, so I built three integration modes.
CLI for humans:
ctxvault query personal "what am I learning to cook?"
ctxvault list work
HTTP API for agent pipelines: start the server, then initialize a vault once (via CLI or API); after that, it persists across restarts.
Now your LangChain or LangGraph agents can write and query via REST:
import requests

API = "http://127.0.0.1:8000/ctxvault"

# One-time vault initialization; the vault persists afterwards
requests.post(f"{API}/init", json={"vault_name": "agent-memory"})

# Write a memory entry; it is immediately searchable
requests.post(f"{API}/write", json={
    "vault_name": "agent-memory",
    "filename": "session.md",
    "content": "User is optimizing a FastAPI service. Main bottleneck is DB connection pooling."
})

# Semantic query over everything in the vault
results = requests.post(f"{API}/query", json={
    "vault_name": "agent-memory",
    "query": "what was the performance issue we discussed?"
}).json()["results"]
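A common next step in a pipeline is folding those results into the prompt. Here is a hypothetical helper for that (the `"content"` key is an assumption about the result shape; adapt it to what your `/query` endpoint actually returns):

```python
def to_context_block(results, max_chars=2000):
    """Fold query results into a prompt-ready context block.

    Assumes each result is a dict with a "content" key; this is an
    illustration, not part of CtxVault's API.
    """
    lines, used = [], 0
    for r in results:
        text = r["content"].strip()
        if used + len(text) > max_chars:
            break  # keep the block within a rough character budget
        lines.append(f"- {text}")
        used += len(text)
    return "Relevant memory:\n" + "\n".join(lines)

results = [{"content": "Main bottleneck is DB connection pooling."}]
print(to_context_block(results))
```

The character budget is a crude stand-in for token counting; in a real pipeline you would cap by tokens and prepend the block to the system prompt.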
MCP for no-code agent autonomy: add one entry to your mcp.json, and any MCP-compatible client (Claude Desktop, Cursor, and so on) gets direct vault access with no integration code:
{
  "mcpServers": {
    "ctxvault": {
      "command": "ctxvault-mcp"
    }
  }
}
The agent decides autonomously when to write, when to query, when to recall. You stay in control because every vault is a directory you can inspect and edit at any time.
Why local-first
Every alternative I evaluated either requires a cloud account, sends data to an external service, or both. For personal projects, side projects, and anything involving sensitive documents, that's a non-starter.
CtxVault runs entirely on your machine. ChromaDB for vector storage, sentence-transformers for embeddings, FastAPI for the HTTP layer. No API keys, no telemetry, no vendor dependency. Install it and it works offline.
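For readers unfamiliar with how a local embedding search works, here is the retrieval loop in miniature (pure stdlib toy: the vectors are hand-made stand-ins, whereas CtxVault uses real sentence-transformers embeddings and ChromaDB's HNSW index):

```python
import math

# Toy local retrieval loop. Vectors are fake; in practice an embedding
# model maps each document and query to a real vector.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

index = {
    "pasta.md":   [0.9, 0.1, 0.0],
    "fastapi.md": [0.1, 0.8, 0.3],
}

def query(vec, k=1):
    # Rank every indexed document by similarity to the query vector
    ranked = sorted(index, key=lambda doc: cosine(vec, index[doc]), reverse=True)
    return ranked[:k]

print(query([0.85, 0.2, 0.1]))  # -> ['pasta.md']
```

Everything here runs offline, which is the whole point: the embedding model, the index, and the query loop never leave your machine.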
What it actually feels like
(Demo: persistent memory across sessions, shown with Claude Desktop; works with any MCP-compatible client.)
The moment that made me realize the abstraction was right: I told Claude Desktop that I was learning to make fresh pasta and struggling with the sfoglia tearing when rolled thin. Closed the chat. Opened a new one. Asked "how's my pasta going?" — it knew exactly where I left off, because it had written the context to a vault and queried it when I came back.
That's the thing about memory as infrastructure: when it works, it disappears. The agent just knows. You stop re-explaining. You stop copy-pasting context between sessions. The conversation has history even when the chat window doesn't.
What's next
The current version handles retrieval well. What it doesn't handle yet is lifecycle — knowing what's worth keeping versus noise, merging stale chunks, archival. That's the next real problem and it's harder than retrieval. If you've thought about this I'd genuinely like to hear your approach.
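One naive starting point for the "worth keeping" question (purely a sketch, not implemented in CtxVault) is a retention score that weighs how often an entry is recalled against how long ago it was last touched:

```python
import math
import time

def retention_score(last_access_ts, access_count, now=None, half_life_days=30.0):
    """Keep-or-archive score: access frequency damped by recency decay.

    Illustrative only. Entries below some threshold would be candidates
    for archival or merging.
    """
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_access_ts) / 86400.0)
    decay = 0.5 ** (age_days / half_life_days)  # exponential half-life
    return math.log1p(access_count) * decay

now = 1_700_000_000
fresh = retention_score(now - 86400, access_count=5, now=now)       # 1 day old
stale = retention_score(now - 90 * 86400, access_count=5, now=now)  # 90 days old
print(fresh > stale)  # -> True
```

This only scores staleness; it says nothing about merging near-duplicate chunks, which probably needs similarity clustering on top. That is exactly the open part of the problem.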
CtxVault is open source, MIT licensed, available on PyPI.
pip install ctxvault

Top comments (2)
The vault isolation approach is clever — keeping vector indexes per-project prevents the cross-contamination problem that hits most shared RAG setups. Does the indexing stay fast once a vault has thousands of entries?
Good question — under the hood each vault is a separate ChromaDB collection with HNSW indexing, so scaling behavior is exactly what you'd expect from a standard ChromaDB + sentence-transformers RAG setup. The vault-per-domain architecture actually helps here: each index stays focused and small compared to a single shared store with thousands of mixed entries, and the multi-vault split adds no overhead of its own. That said, I haven't stress-tested a single vault with thousands of entries yet — if you push it that far, let me know how it goes!