The problem
- LLM coding agents on large repos burn tokens scanning files. Claude Code on an 829-file codebase consumed 45K tokens just finding the right code. By turn 3 of a conversation, context is gone.
- Token cost compounds. 20 questions in a session at 45K each is 900K tokens -- nearly the entire 1M window. The agent degrades before your work is done.
What Mnemosyne does
- Sits between your codebase and your LLM. Indexes your code into SQLite, scores every chunk with six retrieval signals (BM25, TF-IDF, symbol search, usage frequency, predictive prefetch, optional dense embeddings), compresses with AST awareness, and delivers exactly within your token budget.
- Zero runtime dependencies: `pip install mnemosyne-engine`. No API keys, no cloud, no Docker. Works offline.
- Drop-in integration: add 3 lines to your CLAUDE.md or .cursorrules and the agent queries Mnemosyne before reading files.
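The "delivers exactly within your token budget" step is, at its core, a greedy fill over ranked chunks. A minimal sketch of that idea (function and field names here are illustrative, not Mnemosyne's actual API):

```python
def pack_to_budget(ranked_chunks, budget_tokens):
    """Take scored chunks best-first until the token budget is spent.

    ranked_chunks: iterable of (chunk_text, token_count), best first.
    Skips chunks that don't fit rather than stopping outright, so
    smaller later chunks can still fill the remaining budget.
    """
    selected, used = [], 0
    for text, tokens in ranked_chunks:
        if used + tokens <= budget_tokens:
            selected.append(text)
            used += tokens
    return selected

# Hypothetical ranked chunks with pre-computed token counts
chunks = [("def login(...): ...", 3000),
          ("class Session: ...", 4000),
          ("def helper(): ...", 2000)]
print(pack_to_budget(chunks, 8000))  # first two fit; the third would overflow
```

The skip-don't-stop choice matters for a `--budget 8000` style cap: one oversized chunk near the top of the ranking shouldn't waste the rest of the window.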
The benchmarks
- Claude Opus 4.6 with a 1M context window, tested against HCCValidatorAI (829 files, production codebase) and httpx (100 files, open source).
- The baseline (Claude reading source directly) produces slightly more detailed answers. The optimal workflow combines both: Mnemosyne finds the files, Claude reads them.
Quick start
```shell
pip install mnemosyne-engine
cd your-project
mnemosyne init && mnemosyne ingest
mnemosyne query "How does auth work?" --budget 8000
```
How it retrieves
- Six signals fused via Reciprocal Rank Fusion; see ALGORITHMS.md for the deep dive.
- AST-aware chunking for Python, Go, Rust, C#, Java, Kotlin, JS/TS.
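Reciprocal Rank Fusion itself is only a few lines: each signal contributes 1/(k + rank) for every chunk it ranks, and the sums decide the final order. A sketch under my own naming (the signal lists and chunk IDs are made up; k=60 is the conventional RRF smoothing constant, not a confirmed Mnemosyne setting):

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk IDs via Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first by one signal
    (e.g. BM25, TF-IDF, symbol search). A chunk absent from a list
    simply contributes nothing from that signal.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-signal rankings over chunk IDs
bm25   = ["auth.py:login", "db.py:connect", "auth.py:verify"]
tfidf  = ["auth.py:verify", "auth.py:login", "http.py:client"]
symbol = ["auth.py:login", "auth.py:verify"]

fused = rrf_fuse([bm25, tfidf, symbol])
print(fused[0])  # auth.py:login -- ranked near the top by all three signals
```

The appeal of RRF is that it needs no score normalization across signals: BM25 scores and embedding distances live on different scales, but ranks are always comparable.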
Integration examples
- Claude Code (CLAUDE.md):

```markdown
## Context Retrieval — MANDATORY

Before answering any question about this codebase, ALWAYS query the Mnemosyne index first:

! mnemosyne query "<your question>" --budget 8000

Use the returned chunks as your primary context for answering. Only use Read, Grep, or Glob if the Mnemosyne chunks do not fully answer the question. Always cite which files and functions you found the answer in.
```
- Cursor: add the same instructions to .cursorrules.
- Any agent that can shell out can run the same command.
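For agents that shell out, integration is just building and running the CLI call shown above. A minimal Python wrapper as a sketch (only the `mnemosyne query ... --budget` invocation comes from this post; the helper names and the PATH fallback are mine):

```python
import shutil
import subprocess

def build_query_cmd(question, budget=8000):
    """Build the CLI invocation the agent should run."""
    return ["mnemosyne", "query", question, "--budget", str(budget)]

def query_index(question, budget=8000):
    """Run the query and return the retrieved chunks as text.

    Returns None if mnemosyne isn't on PATH, so the caller can
    fall back to reading files directly.
    """
    if shutil.which("mnemosyne") is None:
        return None
    result = subprocess.run(build_query_cmd(question, budget),
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(query_index("How does auth work?"))
```

Returning None instead of raising mirrors the recommended workflow: use Mnemosyne's chunks first, and fall back to direct file reads only when retrieval can't answer.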
What it does NOT do
- Not a replacement for reading source files -- it is a file finder that saves you from grep-scanning
- Dense embeddings are optional (requires onnxruntime) -- sparse retrieval handles most queries
- Query latency is sub-500ms cold, sub-200ms warm -- not instant
- Small repos (under ~50 files) see marginal savings; the value scales with codebase size
Conclusion
Mnemosyne will not replace reading source, but as a local, zero-dependency retrieval layer it cuts the token cost of pointing an agent at a large repo, and the savings grow with codebase size.