I Read Every Line of Milla Jovovich's MemPalace — 7,600 Lines of Python, 30K Stars

#ai #python #machinelearning #opensource

Milla Jovovich -- yes, the Resident Evil actress -- released an AI memory system called MemPalace. Four days later it had 30,000 GitHub stars. I read the entire source code.

Repo: milla-jovovich/mempalace

The Numbers

The core mempalace/ directory is 22 Python files, 7,625 lines. For a 30k-star project, that's compact. For comparison, OpenHands (another AI memory project) has 287,000 lines of Python.

This isn't criticism -- small codebases can be good. But the star-to-code ratio here is the highest I've encountered in the AI agent space.

What's Actually Good: The 4-Layer Memory Stack

layers.py (515 lines) is the real contribution:

Layer 0: Identity (~100 tokens) -- always loaded. Who am I, who do I work with.
Layer 1: Essential Story (~500-800 tokens) -- always loaded. Top moments from all conversations.
Layer 2: On-Demand (~200-500 tokens per topic) -- loaded when a specific project comes up.
Layer 3: Deep Search (unlimited) -- ChromaDB semantic search on demand.

Wake-up cost is ~600 tokens. Six months of daily AI use produces ~19.5M tokens of conversations. MemPalace loads 170 tokens to start. That's a good design pattern.

What's Oversold: The AAAK Dialect

AAAK (dialect.py, 952 lines) is the single largest module and the biggest controversy.

The original README claimed "30x lossless compression." The community debunked this within 48 hours:

Real tokenizer shows AAAK uses MORE tokens than raw text (73 vs 66)
On LongMemEval benchmark, AAAK mode scores 84.2% vs raw mode's 96.6% -- a 12.4-point regression

The creators published an honest correction acknowledging these issues. Credit where it's due -- that correction is the most credible thing in the repo.

Security Issue Nobody Mentioned

The hooks system has a shell injection path. In the precompact hook, SESSION_ID gets passed through shell expansion before sanitization. A crafted session ID could execute arbitrary commands.

Not critical for local-only use, but for a project that markets itself as "secure local-only," this needs fixing (tracked as issue #110).

How Simple Is the Search?

searcher.py is 152 lines. It calls ChromaDB.query() with optional metadata filters (wing and room). The "+34% retrieval boost" from the README is standard ChromaDB where clause filtering -- any ChromaDB user gets this for free.

The Verdict

MemPalace isn't a scam. The 4-layer memory stack is a useful design pattern. ChromaDB-based local storage works. Zero external API dependencies is a real advantage.

But the README oversells the code. AAAK compression doesn't compress. The benchmark numbers need asterisks. The "+34% boost" is a standard database feature.

If you only read one file, read layers.py. Those 515 lines are worth more than the marketing claims.