I built a local-first AI memory layer for LLMs in Rust (no cloud, no API keys)

#ai #opensource #llm #rust

Every LLM app has the same problem: the model forgets everything between
conversations. Cloud solutions like Mem0 exist, but they send your data
to their servers. I built mnemo to solve this locally.

What it does

mnemo runs as a sidecar process next to your app. You POST text to it;
it extracts named entities and relationships using a local LLM (Ollama),
builds a persistent knowledge graph, and injects relevant context back
into your prompts automatically.
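
To make the flow above concrete, here is a minimal Python sketch of the graph and context-injection steps. It is not mnemo's code: the `KnowledgeGraph` class, `facts_near`, and `build_prompt` are hypothetical names, the LLM extraction step is replaced by hand-written triples, and entity matching is a naive substring check.

```python
from collections import deque


class KnowledgeGraph:
    """Toy in-memory graph of (subject, relation, object) triples (hypothetical)."""

    def __init__(self):
        self.adjacent = {}  # entity -> set of neighboring entities
        self.facts = {}     # frozenset({a, b}) -> list of fact strings

    def add_triple(self, subject, relation, obj):
        # Edges are stored undirected so traversal can start from either end.
        self.adjacent.setdefault(subject, set()).add(obj)
        self.adjacent.setdefault(obj, set()).add(subject)
        self.facts.setdefault(frozenset((subject, obj)), []).append(
            f"{subject} {relation} {obj}"
        )

    def facts_near(self, start, max_hops=2):
        """Breadth-first search from `start`, collecting facts on edges
        that are reachable within `max_hops` hops."""
        if start not in self.adjacent:
            return []
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue  # do not expand past the hop limit
            for neighbor in sorted(self.adjacent[node]):
                for fact in self.facts[frozenset((node, neighbor))]:
                    if fact not in found:
                        found.append(fact)
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
        return found


def build_prompt(graph, question, max_hops=2):
    """Prepend facts about every entity whose name appears in the question."""
    lowered = question.lower()
    mentioned = [e for e in sorted(graph.adjacent) if e.lower() in lowered]
    context = []
    for entity in mentioned:
        for fact in graph.facts_near(entity, max_hops):
            if fact not in context:
                context.append(fact)
    header = "Known facts:\n" + "\n".join(f"- {fact}" for fact in context)
    return f"{header}\n\nQuestion: {question}"


if __name__ == "__main__":
    graph = KnowledgeGraph()
    # In a real pipeline these triples would come from the extraction LLM.
    graph.add_triple("user", "is building", "vecdb")
    graph.add_triple("vecdb", "is written in", "Rust")
    print(build_prompt(graph, "What language is vecdb written in?"))
```

The hop limit is the main knob in a sketch like this: a larger `max_hops` pulls in more loosely related facts at the cost of a longer prompt.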

The stack

Rust — core engine, 4 crates (mnemo-core, mnemo-api, mnemo-cli, mnemo-bench)
SQLite + WAL mode — persistent storage, survives restarts
petgraph — in-memory knowledge graph with BFS traversal
Axum — REST API sidecar any app can call
Ollama — fully local LLM, zero API costs
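
For readers unfamiliar with the SQLite + WAL piece of the stack, here is a small sketch using Python's standard `sqlite3` module. The `triples` table and the helper names are assumptions for illustration and say nothing about mnemo's actual schema.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a database file and switch it to WAL journaling."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode now in effect; WAL is persistent,
    # so it stays enabled for this file across later connections.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        "subject TEXT NOT NULL, relation TEXT NOT NULL, object TEXT NOT NULL)"
    )
    conn.commit()
    return conn, mode


def save_triple(conn, subject, relation, obj):
    # `with conn` commits on success and rolls back on an exception.
    with conn:
        conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, relation, obj))


def load_triples(conn):
    return conn.execute(
        "SELECT subject, relation, object FROM triples ORDER BY rowid"
    ).fetchall()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sketch.db")
    conn, mode = open_store(path)
    print("journal mode:", mode)
    save_triple(conn, "user", "is building", "vecdb")
    conn.close()

    # Reopening the file simulates a process restart.
    conn, _ = open_store(path)
    print(load_triples(conn))
    conn.close()
```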

Fully free by default

docker compose up -d
docker exec mnemo-ollama ollama pull llama3
curl http://localhost:8080/health

Works with OpenAI or Anthropic too if you bring your own key.

Python SDK

from mnemo import MnemoClient

client = MnemoClient()
client.ingest("I'm building a Rust vector database called vecdb")
print(client.get_context("what am I working on?"))

Numbers

122 Rust tests, 21 Python SDK tests
Sub-millisecond entity lookup
~4ms full retrieval pipeline (debug build)

Links

GitHub: https://github.com/zaydmulani09/mnemo

Would love feedback, especially on the retrieval scoring and graph
traversal approach.

Top comments (3)

mote • Jul 5

This is right in the space we're building in with moteDB — local-first, Rust, and the memory problem for LLMs.

Two things I'm curious about:

petgraph is an excellent choice for in-memory graph traversal, but it's fully in-memory — how do you handle a session history that grows beyond what fits comfortably in RAM? We ended up building a custom B-tree layer with mmap-backed pages specifically because petgraph's memory model broke down for us at around 100k nodes.

Also — you mention entity extraction and relationship building, but the article focuses on the graph side. How are you handling vector similarity search for semantic recall? That's where most local-first memory layers hit a wall — the graph traversal works great for structured relationships, but 'find the 5 memories most similar to this query' requires a different indexing strategy.

Curious whether you considered a hybrid approach from day one or layered vector search on top later.

Zayd Mulani • Jul 13

Both questions are the real ones.
On the petgraph scaling problem: you're right that it breaks down at scale. The current approach is to cap the in-memory graph at a configurable node limit (default 10k) with LRU eviction back to SQLite, so the hot subgraph stays in petgraph and cold nodes get persisted. It's not elegant, but it works for the typical use case of a single developer's session history. At 100k nodes you'd need exactly what you described: a proper mmap-backed page layer. That's not something I've tackled yet and I won't pretend otherwise.
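
Roughly, the cap-and-evict idea looks like the Python sketch below. This is an illustration only, not the Rust implementation: `HotNodeCache` is a hypothetical name, a plain dict stands in for SQLite, and edges are ignored to keep it short.

```python
from collections import OrderedDict


class HotNodeCache:
    """Keeps at most `capacity` nodes in memory; evicts the least recently
    used node to a cold store (a dict standing in for SQLite here)."""

    def __init__(self, capacity, cold_store):
        self.capacity = capacity
        self.cold = cold_store
        self.hot = OrderedDict()  # oldest entry first, newest last

    def put(self, node_id, data):
        self.hot[node_id] = data
        self.hot.move_to_end(node_id)  # mark as most recently used
        while len(self.hot) > self.capacity:
            old_id, old_data = self.hot.popitem(last=False)
            self.cold[old_id] = old_data  # persist the evicted node

    def get(self, node_id):
        if node_id in self.hot:
            self.hot.move_to_end(node_id)
            return self.hot[node_id]
        # Miss: load from the cold store (raises KeyError if unknown)
        # and promote it, which may evict another node.
        data = self.cold[node_id]
        self.put(node_id, data)
        return data


if __name__ == "__main__":
    cold = {}
    cache = HotNodeCache(capacity=2, cold_store=cold)
    for name in ("a", "b", "c"):
        cache.put(name, {"label": name})
    print(list(cache.hot), sorted(cold))
    cache.get("a")  # promoted from cold; the oldest hot node is evicted
    print(list(cache.hot), sorted(cold))
```
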
On vector similarity vs graph traversal: mnemo does both but they're not equally mature. Entity lookup goes through the graph (fast, structured). "Find 5 semantically similar memories" goes through a cosine similarity scan over embeddings stored in SQLite, generated by Ollama's embedding endpoint. It works, but it's a full table scan with no HNSW index and no ANN. For small corpora (under ~5k memories) it's fast enough. Beyond that it falls apart. vecdb (another project I built) has HNSW + BM25 fusion search and was always the intended backend for mnemo at scale, but I haven't wired the two together yet. That's the honest answer.
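
For anyone following along, a brute-force scan of that kind can be sketched in Python as below. The `memories` table, the JSON encoding of embeddings, and the tiny hand-written vectors are all assumptions for illustration; real embeddings would come from the embedding model.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def add_memory(conn, text, embedding):
    with conn:
        conn.execute(
            "INSERT INTO memories (text, embedding) VALUES (?, ?)",
            (text, json.dumps(embedding)),
        )


def top_k(conn, query_vec, k=5):
    """Full table scan: score every stored embedding, keep the best k.
    Cost grows linearly with the number of rows; there is no index."""
    scored = []
    for text, blob in conn.execute("SELECT text, embedding FROM memories"):
        scored.append((cosine(query_vec, json.loads(blob)), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def new_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memories (text TEXT NOT NULL, embedding TEXT NOT NULL)")
    return conn


if __name__ == "__main__":
    conn = new_store()
    add_memory(conn, "building vecdb in Rust", [1.0, 0.0])
    add_memory(conn, "likes hiking", [0.0, 1.0])
    add_memory(conn, "benchmarked vecdb on a hike", [1.0, 1.0])
    for score, text in top_k(conn, [1.0, 0.1], k=2):
        print(round(score, 3), text)
```
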
Curious about your B-tree + mmap approach: what's the access pattern that made mmap worth the complexity over just SQLite with WAL?

mote • Jul 19

Appreciate the honest answer on mnemo's current limits. The 10k LRU cap makes sense for single-developer session history -- at that scale the overhead of a proper graph store isn't worth it. And I respect the "I haven't wired vecdb to mnemo yet" candor. We've all got those projects.

On mmap vs SQLite+WAL: the decision was driven by moteDB's use case, which might not apply to mnemo. We're targeting embedded/robotics where the database runs on the same device as the application, often with no filesystem abstraction. SQLite's WAL journal doubles your I/O and the checkpointing logic assumes a POSIX filesystem. mmap gives us direct page access with the OS page cache doing the heavy lifting -- dirty pages get flushed lazily, clean pages get evicted under memory pressure, no additional logging layer.

The tradeoff is that mmap is fragile. SIGBUS on truncated files, no built-in crash recovery. We built that ourselves (write-ahead log at the page level, not the SQL level). For mnemo running on a developer laptop, I'd honestly stick with SQLite+WAL. The complexity isn't worth it until you're on bare metal with a RAM budget measured in megabytes.
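
As a rough illustration of direct page access through mmap, here is a short sketch using Python's standard `mmap` module. The 4096-byte page size and the helper names are assumptions, and it has no crash recovery at all, which is exactly the gap described above.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size for this sketch


def create_file(path, num_pages):
    """Pre-size the file so every page offset is backed by real file bytes."""
    with open(path, "wb") as f:
        f.truncate(num_pages * PAGE_SIZE)


def write_page(mm, page_no, payload):
    """Write `payload` at the start of page `page_no` through the mapping."""
    if len(payload) > PAGE_SIZE:
        raise ValueError("payload does not fit in one page")
    start = page_no * PAGE_SIZE
    mm[start:start + len(payload)] = payload


def read_page(mm, page_no):
    """Return the full contents of page `page_no` as bytes."""
    start = page_no * PAGE_SIZE
    return mm[start:start + PAGE_SIZE]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "pages.bin")
    create_file(path, num_pages=4)
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
        write_page(mm, 2, b"hello")
        mm.flush()  # ask the OS to write dirty pages back to the file
        print(read_page(mm, 2)[:5])
        mm.close()
```
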

Your vecdb HNSW+BM25 fusion is interesting. We went with DiskANN for moteDB -- smaller index, faster build times, and the graph construction is parallel. HNSW's build phase is single-threaded and RAM-bound. For offline/embedded use that matters more than recall@1 differences. What made you pick HNSW over DiskANN?