This past week, MemPalace went viral on GitHub — an open-source AI memory system fronted by actress Milla Jovovich, claiming 100% on LongMemEval and 100% on LoCoMo. I was evaluating it for a production agentic AI pipeline and decided to dig into the actual code and community audits before integrating anything. Here's what I found.
To be fair, the core idea is solid. MemPalace stores your LLM conversation history locally using ChromaDB, organized into a spatial hierarchy:
- **Wings** — people or projects
- **Halls** — memory types
- **Rooms** — conversation threads
- **Tunnels** — cross-connections between memories
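The hierarchy maps naturally onto a tree with cross-links. A minimal sketch of that shape (class and field names are mine for illustration, not MemPalace's actual API):

```python
from dataclasses import dataclass, field

# Hypothetical model of the Wing > Hall > Room hierarchy with Tunnel
# cross-links. Names are illustrative, not MemPalace's real classes.

@dataclass
class Room:
    name: str
    memories: list[str] = field(default_factory=list)

@dataclass
class Hall:
    name: str                                   # memory type, e.g. "decisions"
    rooms: dict[str, Room] = field(default_factory=dict)

@dataclass
class Wing:
    name: str                                   # person or project
    halls: dict[str, Hall] = field(default_factory=dict)

@dataclass
class Palace:
    wings: dict[str, Wing] = field(default_factory=dict)
    # Tunnels as (room_path, room_path) pairs linking across the tree
    tunnels: list[tuple[str, str]] = field(default_factory=list)

palace = Palace()
palace.wings["project-x"] = Wing(
    "project-x",
    {"decisions": Hall("decisions", {"auth": Room("auth", ["Chose JWT over sessions"])})},
)
palace.tunnels.append(("project-x/decisions/auth", "project-y/decisions/auth"))
```

The point of the structure is that retrieval can be scoped to a wing or hall before any vector search runs, narrowing the candidate set up front.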
Instead of dumping your entire memory store to the LLM (the naive approach), it sends only the top 15 semantically relevant memories (~800 tokens). That's a claimed 250x token reduction vs. brute-force context stuffing. Fully offline, MIT-licensed, costs ~$0.70/year to run.
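The top-k step itself is standard vector search. A self-contained sketch with toy embeddings standing in for ChromaDB (the embedding vectors and the k=15 cutoff are assumptions for illustration; the real system delegates both embedding and nearest-neighbor search to ChromaDB):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, memories, k=15):
    """Return the k memory texts most similar to the query vector.

    `memories` is a list of (text, embedding) pairs. Only these k texts
    go into the prompt, instead of the entire store.
    """
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 2-D embeddings for demonstration
store = [
    ("likes hiking",   [0.9, 0.1]),
    ("prefers Python", [0.1, 0.9]),
    ("owns a dog",     [0.8, 0.2]),
]
print(top_k([1.0, 0.0], store, k=2))   # → ['likes hiking', 'owns a dog']
```

Sending ~800 tokens of top-k hits instead of a full history is where the claimed 250x reduction comes from; the savings are real, but they're a property of vector retrieval generally, not of the palace structure.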
The spatial retrieval does measurably outperform flat ChromaDB search. The privacy-first architecture is real. This part is genuinely good work.
## The Benchmark Problem
Here's where it breaks down.
### LongMemEval: 100% → 96.6%
The team identified exactly which questions were failing, engineered fixes targeting those specific questions, then retested on the same dataset. Classic overfitting to a benchmark. After GitHub Issue #29 surfaced this publicly, they revised the score to 96.6% without announcement. The community caught it via commit history.
### LoCoMo: 100% (trivially gamed)
They ran the evaluation with top_k=50 on a dataset containing only 19–32 items. When your retrieval window exceeds the entire dataset, you retrieve everything by default. That isn't a test of memory — it's a retrieval window that swallows the entire test set.
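The flaw is pure arithmetic: when the window is at least as large as the corpus, recall is 100% by construction. A quick sketch:

```python
def recall_at_k(relevant_ids, retrieved_ids):
    """Fraction of relevant items that appear in the retrieved set."""
    hits = sum(1 for r in relevant_ids if r in retrieved_ids)
    return hits / len(relevant_ids)

dataset = list(range(25))        # a LoCoMo-sized corpus: 19-32 items
top_k = 50                       # retrieval window used in the evaluation

# Any ranking at all "retrieves" the whole corpus when top_k >= len(dataset):
# slicing past the end simply returns everything.
retrieved = dataset[:top_k]
print(recall_at_k(dataset, retrieved))   # → 1.0, regardless of ranking quality
```

A meaningful configuration would keep top_k well below the corpus size so the ranking actually has to discriminate.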
## Real-World Performance
One developer ran manual end-to-end tests by actually asking questions through an LLM connected to MemPalace. Correct answer rate: approximately 17%. Three independent audits reached the same conclusion: solid ChromaDB wrapper, broken marketing claims.
## README vs. Codebase
| README Claim | Code Reality |
|---|---|
| Contradiction detection | `knowledge_graph.py` has zero contradiction logic |
| Palace structure drives benchmark scores | LongMemEval scores are ChromaDB's default embedding performance; palace routing sits above this |
| MCP Claude Desktop integration | stdout bug corrupts JSON stream, breaks Claude Desktop on first use |
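The stdout bug is a classic MCP failure mode: the stdio transport treats stdout as a pure JSON-RPC channel, so any stray `print()` interleaves with the protocol frames and the client fails to parse them. A minimal sketch of the bug pattern and the usual fix (diagnostics go to stderr); this is my reconstruction of the failure class, not MemPalace's actual code:

```python
import json
import sys

def handle_request(request: dict) -> str:
    """Serialize a JSON-RPC response for an MCP stdio transport."""
    # BUG pattern: debug output on stdout corrupts the JSON-RPC stream.
    # print("handling request...")              # <-- breaks the client
    # FIX: diagnostics belong on stderr, which the transport ignores.
    print("handling request...", file=sys.stderr)
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": "ok"})

frame = handle_request({"id": 1})
json.loads(frame)   # parses cleanly because stdout carried only JSON
```

Anything that writes to stdout inside an MCP server — a library's progress bar, a leftover debug print — produces exactly the "breaks on first use" symptom reported here.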
## The Crypto Context
The primary author is Ben Sigman, a crypto CEO. Milla Jovovich had 7 commits across 2 days at launch. A memecoin spawned within days of the GitHub release. Celebrity face + inflated benchmarks + viral launch + token = a pattern the community rightly recognizes. The MIT license means no software rug-pull, but the marketing playbook is straight from crypto launch culture.
## How It Compares to Obsidian / Logseq
Worth noting for anyone using PKM tools: these aren't competitors; they solve different problems.
| | MemPalace | Obsidian | Logseq |
|---|---|---|---|
| Storage format | ChromaDB binary vectors | Plain Markdown | Plain Markdown |
| Human readable | No | Yes | Yes |
| Portability | Low (Python API only) | Very high | Moderate |
| Best for | LLM agent memory | Human PKM | Journaling/outlining |
The practical hybrid: use Obsidian/Logseq as your human knowledge layer, feed structured data into a vector store only for agent retrieval. Don't get locked into a binary format.
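That hybrid is easy to wire up: keep Markdown as the source of truth and rebuild the vector index from it on demand, so the binary store is always disposable. A sketch (paths, chunk size, and function names are my assumptions; the output pairs can be handed to ChromaDB or any vector store):

```python
from pathlib import Path

def chunk_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Split a note on blank lines, packing paragraphs into ~max_chars chunks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def index_vault(vault_dir: str) -> list[tuple[str, str]]:
    """Collect (source_file, chunk) pairs ready to embed into a vector store.

    The Markdown files stay the canonical copy; the vector index is a
    throwaway derivative that can be regenerated from them at any time.
    """
    pairs = []
    for path in sorted(Path(vault_dir).rglob("*.md")):
        for chunk in chunk_markdown(path.read_text(encoding="utf-8")):
            pairs.append((str(path), chunk))
    return pairs
```

Because every chunk carries its source path, the agent's answers can always be traced back to a human-readable note — the opposite of being locked into an opaque binary format.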
## Verdict
MemPalace has a genuinely interesting spatial memory architecture. The local-first, privacy-respecting design is real. But benchmarks were manipulated, multiple advertised features don't exist in the codebase, and the launch was engineered around a celebrity and a memecoin.
It's a version-0.1 ChromaDB wrapper with good ideas and dishonest marketing. Revisit in 3–6 months once independent benchmark reproductions exist and the known bugs are fixed.