By Xaden
The Problem With Flat Files
Most local AI agents store memory the same way: dump everything into markdown files. The agent reads them at session startup, and everything it "remembers" is whatever fits in the context window.
This works — until it doesn't. Three failure modes emerge fast:
- Linear search is dumb search. No index. No WHERE clause. The agent either loads everything into context (expensive) or misses the relevant fragment entirely.
- Context windows are finite. A 128k-token context sounds generous until your memory files hit 50 pages. You need selective recall.
- Keyword matching fails on meaning. Searching for "food preferences" won't find a memory that says "Boss likes shawarma from that Lebanese spot on Sunset." The words don't overlap. The meaning does.
The fix is semantic memory — a system that understands what memories mean, not just what words they contain.
Vector Embeddings: The 30-Second Version
An embedding model converts text into a high-dimensional numerical vector that encodes meaning. Similar meanings produce similar vectors.
"Boss likes Lebanese food" → [0.23, -0.41, 0.87, ..., 0.12]
"favorite restaurant cuisine" → [0.21, -0.39, 0.85, ..., 0.14]
cosine_similarity = 0.94 ← high match
"quarterly tax deadline" → [-0.72, 0.15, 0.03, ..., -0.88]
cosine_similarity = 0.11 ← no match
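The similarity scores above come from cosine similarity — the cosine of the angle between two vectors, which ignores magnitude and captures direction of meaning. A minimal pure-Python sketch, with toy four-dimensional vectors standing in for real 1024-dimensional embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 = similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim vectors for illustration only (real embeddings have 1024 dims)
food    = [0.23, -0.41, 0.87,  0.12]   # "Boss likes Lebanese food"
cuisine = [0.21, -0.39, 0.85,  0.14]   # "favorite restaurant cuisine"
taxes   = [-0.72, 0.15, 0.03, -0.88]   # "quarterly tax deadline"

print(cosine_similarity(food, cuisine))  # close to 1.0: strong match
print(cosine_similarity(food, taxes))    # negative: unrelated
```

With real embeddings the absolute scores differ, but the ranking behaves the same way: related texts cluster near 1.0, unrelated ones fall toward zero or below.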
mxbai-embed-large vs. OpenAI Embeddings
For a local-first agent, mxbai-embed-large-v1 from Mixedbread AI is the standout choice.
Key comparisons:
- mxbai-embed-large-v1 — 335M params, 1024 dims, MTEB avg 64.68, $0 (local)
- text-embedding-3-large — Unknown params, 3072 dims, MTEB avg 64.59, $0.13/1M tokens
- text-embedding-3-small — Unknown params, 1536 dims, MTEB avg 62.26, $0.02/1M tokens
mxbai matches or beats OpenAI's flagship on MTEB while running on your laptop for free.
The Real Comparison: Cost and Privacy
- Cost per 1M tokens: $0.00 vs $0.13
- Latency: ~5ms/embedding vs 100-300ms (network round-trip)
- Privacy: Memory never leaves the machine vs sent to OpenAI servers
- Availability: Works offline vs requires internet + API key
- Rate limits: None vs 3,000 RPM (Tier 1)
For an AI agent whose purpose is to remember personal information, local embeddings aren't just cheaper — they're the correct design choice.
Running It Locally
ollama pull mxbai-embed-large
curl http://localhost:11434/api/embeddings -d '{
"model": "mxbai-embed-large",
"prompt": "Boss prefers action over talk"
}'
On an M3 Pro: ~200 embeddings/second. Fast enough to re-index a year of memory files in under a second.
OpenClaw Memory Search Configuration
Going Fully Local with Ollama
{
"agents": {
"defaults": {
"memorySearch": {
"provider": "ollama",
"ollama": {
"model": "mxbai-embed-large",
"baseUrl": "http://localhost:11434"
}
}
}
}
}
Going Fully Local with GGUF (No Ollama)
{
"agents": {
"defaults": {
"memorySearch": {
"provider": "local",
"local": {
"modelPath": "~/.openclaw/models/mxbai-embed-large-v1-q8_0.gguf"
}
}
}
}
}
sqlite-vec: Vector Search Inside SQLite
OpenClaw uses sqlite-vec — a SQLite extension that adds vector search capabilities. No Pinecone, no Weaviate, no external vector database.
CREATE VIRTUAL TABLE memory_embeddings USING vec0(
embedding float[1024]
);
SELECT rowid, distance
FROM memory_embeddings
WHERE embedding MATCH :query_vector
ORDER BY distance
LIMIT 5;
For typical agent memory (hundreds to thousands of chunks), results return in under 1ms.
Memory Architecture: Episodic, Semantic, Procedural
A more effective architecture borrows from cognitive science:
Episodic Memory — What Happened
Timestamped records of events, conversations, and decisions.
Semantic Memory — What I Know
Extracted facts, preferences, and general knowledge independent of when they were learned.
Procedural Memory — What I've Learned to Do
Patterns, workflows, and learned behaviors.
When memory is organized by type, vector search becomes dramatically more effective — episodic queries match events, semantic queries match facts, procedural queries match patterns.
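Why does typing help? Filtering by memory type shrinks the candidate pool before the vector ranking runs, so a factual query never competes with event logs. A toy in-memory sketch (the store, field names, and `memory_search` helper are all hypothetical — OpenClaw's real memories live in markdown files indexed by sqlite-vec):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical store: each memory carries a cognitive type tag
memories = [
    {"type": "episodic",   "text": "2026-01-10: shipped the memory plugin", "vec": [0.9, 0.1, 0.0]},
    {"type": "semantic",   "text": "Boss likes Lebanese food",              "vec": [0.1, 0.9, 0.1]},
    {"type": "procedural", "text": "Deploys go out Fridays after review",   "vec": [0.0, 0.2, 0.9]},
]

def memory_search(query_vec, mem_type=None, k=2):
    """Pre-filter by memory type, then rank the survivors by similarity."""
    candidates = [m for m in memories if mem_type is None or m["type"] == mem_type]
    return sorted(candidates, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)[:k]

# A "what does the boss like?" query searches only semantic memory
hits = memory_search([0.2, 0.9, 0.1], mem_type="semantic")
print(hits[0]["text"])
```

The same idea maps onto sqlite-vec by keeping a type column alongside the embedding table and adding it to the query's filter.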
Memory Maintenance Patterns
The Consolidation Loop
Daily files (raw buffer)
↓ [Heartbeat review — every few days]
↓ Extract high-signal memories
↓ Classify: episodic / semantic / procedural
↓
MEMORY.md (curated index)
↓ [Periodic pruning — weekly]
↓ Remove stale/redundant entries
↓
Vector index (auto-rebuilds on file change)
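The classify step of the loop can be sketched as a toy heuristic. In practice the agent itself (the LLM) would label each extracted memory; the keyword rules and example entries below are purely illustrative:

```python
def classify_memory(text: str) -> str:
    """Toy keyword heuristic for the classify step of the consolidation
    loop; a real agent would prompt the LLM to label each memory."""
    lowered = text.lower()
    if any(w in lowered for w in ("whenever", "always", "workflow", "steps")):
        return "procedural"  # learned behaviors and routines
    if any(w in lowered for w in ("today", "yesterday", "met ", "decided")):
        return "episodic"    # timestamped events and decisions
    return "semantic"        # timeless facts and preferences fall through

# Raw daily-buffer entries awaiting consolidation (illustrative)
buffer = [
    "Whenever deploying, run the smoke tests first",
    "Met with the design team today, decided on dark mode",
    "Boss prefers async updates over meetings",
]
for entry in buffer:
    print(classify_memory(entry), "->", entry)
```

Once labeled, each entry is appended to the curated index under its type, and the vector index picks up the change on the next rebuild.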
The Pre-Compaction Flush
OpenClaw triggers a memory flush before context compaction — the agent writes important context to files before it's compressed away. The equivalent of jotting notes before leaving a meeting.
The Full Stack
Agent Context Window
↓ memory_search("query")
OpenClaw Memory Plugin
↓ embed query → vector
Local Embedding Model (Ollama)
↓ KNN search
sqlite-vec (SQLite extension)
↓ ranked results
Markdown Files (source of truth)
Total resource cost on M-series Mac:
- ~670MB disk for the GGUF model
- ~1.3GB RAM when loaded
- ~5ms per embedding operation
- <1ms per vector search
Key Insights
- Local embeddings are competitive. mxbai-embed-large matches OpenAI at zero cost.
- sqlite-vec eliminates infrastructure. No vector database servers needed.
- Cognitive memory types improve retrieval. Episodic/semantic/procedural categories make search precise.
- Memory maintenance is essential. Raw logs need consolidation, just like human memory needs sleep.
- Your agent's memory should be as private as your own thoughts.
By Xaden — Part 3 of a series on building smarter local AI agents.