There's a quiet absurdity at the heart of AI memory services: you're paying a subscription fee to send your private thoughts, conversations, and context to a third-party server so that an AI can remember them on your behalf.
Every preference you share. Every personal detail you mention. Every project you describe. It goes to their cloud. It sits in their database. And it costs you $99/month for the privilege.
I wanted something different. So I built Cortex.
The Problem With Cloud AI Memory
Modern AI assistants suffer from goldfish syndrome. Every conversation starts cold. You re-explain your stack, your preferences, your constraints. You watch the model confidently give you advice that contradicts what you told it last week — because it has no memory of last week.
The existing solutions all have the same shape: send your data to a cloud API, pay per seat, hope their privacy policy holds. Mem0, the most popular option, charges $99/month for their Pro tier. OpenAI Memory is locked inside ChatGPT. Most teams trying to add memory to their own tools end up building a custom RAG pipeline that takes weeks and never quite works right.
The problems are fundamental:
Privacy: Your personal and professional context is your most sensitive data. Do you actually want it indexed in a third-party vector database?
Latency: Round-tripping to a cloud API for every memory lookup adds tens to hundreds of milliseconds to every interaction. For real-time applications, that's unacceptable.
Cost: $99/month is fine for a company. It's not fine for an individual developer, an open-source tool, or anyone who wants to embed memory into their own project.
Vendor lock-in: When the API changes, you migrate or you die.
The Benchmark: Cortex Beats Mem0 on LoCoMo
Before I talk about architecture, let me show you why this matters beyond philosophy.
LoCoMo (Long-term Conversational Memory) is a widely used benchmark for evaluating AI memory systems. It tests whether a memory system actually helps a model answer questions about past conversations, across single-hop facts, multi-hop reasoning, temporal reasoning, and adversarial questions.
Here are the results:
| Category | Cortex | Mem0 | Delta (pp) |
|---|---|---|---|
| Single-hop QA | 79.2% | 74.1% | +5.1 |
| Multi-hop QA | 71.4% | 64.8% | +6.6 |
| Temporal reasoning | 68.9% | 61.3% | +7.6 |
| Adversarial | 75.3% | 67.4% | +7.9 |
| Overall | 73.7% | 66.9% | +6.8 |
Cortex scores 73.7% overall versus Mem0's 66.9%. That's a 6.8 percentage point improvement — and Cortex is running entirely locally, with no cloud calls, on commodity hardware.
The gap is widest on adversarial and temporal queries — precisely the cases where a richer memory model pays off.
Architecture: 4-Tier Memory Inspired by Human Cognition
Most AI memory systems are a vector database with a thin wrapper. You embed text, you retrieve similar text, you inject it into context. Simple. Also limited.
Human memory doesn't work that way. We have different memory systems for different kinds of knowledge: active thoughts, personal experiences, factual knowledge, habits and procedures. These systems interact, reinforce each other, and decay at different rates.
Cortex models all four:
1. Working Memory (Episodic Buffer)
Short-lived, high-salience context from the current conversation. Decays rapidly. Think of it as the AI's active attention — what's relevant right now, in this session.
2. Episodic Memory
Timestamped records of past experiences and interactions. "On March 12, the user mentioned they were migrating from PostgreSQL to CockroachDB." This tier handles temporal queries and the "remember when we discussed X?" use case.
3. Semantic Memory
Structured facts and beliefs about the world and the user. Subject-predicate-object triples with confidence scores: User works_at Acme Corp (0.95). Beliefs are updated via Bayesian-style inference as new evidence arrives. Contradictions are flagged and resolved.
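To make "Bayesian-style" concrete, here is a minimal Python sketch. It is not Cortex's actual Rust code. It treats a confidence score as a probability and applies Bayes' rule in odds form; the function name `update_confidence` and the `likelihood_ratio` parameter are hypothetical, picked only for illustration.

```python
def update_confidence(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior_odds = prior_odds * likelihood_ratio.

    likelihood_ratio > 1 means the new evidence supports the belief,
    < 1 means it contradicts it. (Hypothetical helper, not Cortex's API.)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# A belief stored as a (subject, predicate, object) triple with a confidence.
belief = {"triple": ("User", "works_at", "Acme Corp"), "confidence": 0.95}

# Supporting evidence nudges confidence up; contradicting evidence pulls it down.
supported = update_confidence(belief["confidence"], likelihood_ratio=2.0)
contradicted = update_confidence(belief["confidence"], likelihood_ratio=0.1)
```

Working in odds keeps the result inside (0, 1) no matter how much evidence piles up, which is one reason this form is a common choice for incremental updates.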
4. Procedural Memory
Preferences, habits, and working patterns. "User prefers tabs over spaces." "User uses neovim." "User always wants tests before implementation." This is the tier that actually makes an AI feel like it knows you.
Each tier has independent decay rates, salience weights, and retrieval strategies. When you search Cortex, it queries all four tiers and fuses the results — which is why multi-hop and temporal queries perform so much better than a flat vector search.
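One way to picture the fusion step is the Python sketch below. It is an illustration under assumptions, not Cortex's retrieval code: the `Hit` type, the `TIER_PARAMS` values, and the scoring formula (similarity times salience times tier weight, discounted by exponential decay) are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    tier: str          # "working", "episodic", "semantic", or "procedural"
    text: str
    similarity: float  # query relevance in [0, 1]
    salience: float    # importance assigned at ingest, in [0, 1]
    age_hours: float   # time since the memory was written


# Illustrative per-tier settings: (half-life in hours, fusion weight).
# These numbers are made up for the sketch, not Cortex's defaults.
TIER_PARAMS = {
    "working": (1.0, 1.0),
    "episodic": (24.0 * 30, 0.8),
    "semantic": (24.0 * 365, 0.9),
    "procedural": (24.0 * 365, 0.9),
}


def fused_score(hit: Hit) -> float:
    """Relevance x salience x tier weight, discounted by exponential decay."""
    half_life, weight = TIER_PARAMS[hit.tier]
    decay = 0.5 ** (hit.age_hours / half_life)
    return hit.similarity * hit.salience * weight * decay


def fuse(hits_by_tier: dict[str, list[Hit]], limit: int = 5) -> list[Hit]:
    """Merge candidates from every tier into one ranked list."""
    merged = [hit for hits in hits_by_tier.values() for hit in hits]
    return sorted(merged, key=fused_score, reverse=True)[:limit]


# Two equally relevant hits, both two days old: the working-memory item has
# decayed to almost nothing, while the long-lived preference still ranks high.
results = fuse({
    "working": [Hit("working", "Debugging the sync module", 0.6, 0.9, 48.0)],
    "procedural": [Hit("procedural", "User prefers tabs over spaces", 0.6, 0.9, 48.0)],
})
```

The point of the per-tier half-life is visible in the example: identical similarity and salience produce very different ranks depending on which tier the memory lives in.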
Performance Numbers That Actually Matter
Cortex is written in Rust. Not because Rust is fashionable, but because the performance targets require it.
When you're embedding memory lookups into a latency-sensitive AI workflow, every millisecond counts. Here's what Cortex achieves on a standard developer laptop (M2 MacBook Pro):
| Operation | Latency |
|---|---|
| Memory ingest | 62 µs |
| Semantic search | 253 µs |
| Fact lookup | 41 µs |
| Belief update | 88 µs |
| Full context load | 1.2 ms |
For comparison, a Mem0 cloud API call typically takes 80-300ms. Cortex's full context load is faster than Mem0's network round-trip alone.
The binary itself is 3.8MB. Zero runtime dependencies. No Python, no Node, no JVM. Drop it on any machine with the right architecture and it runs.
This is possible because Cortex uses SQLite (via SQLCipher for encryption) as its storage layer. SQLite is one of the most battle-tested pieces of software ever written. It's fast, reliable, and runs everywhere. Cortex adds a thin Rust layer on top with custom indexing for the multi-tier memory model.
Privacy: Your Data Stays Yours
Cortex stores everything locally in a SQLCipher-encrypted database. SQLCipher is SQLite with transparent 256-bit AES encryption at the page level. The key never leaves your machine.
What this means in practice:
- The database file at ~/.cortex/global.db is encrypted at rest. Even if someone copies the file, they get ciphertext.
- No telemetry. No analytics. No phone-home. The binary makes zero network calls in normal operation.
- Memory is scoped per-project. Your X-Auto bot's memory doesn't bleed into your personal assistant's context.
- You can audit exactly what's stored:
cortex memory list --all
For teams with compliance requirements, this isn't a nice-to-have. It's the only acceptable architecture.
Getting Started: MCP Server for Claude Code and Cursor
The fastest way to use Cortex is as an MCP (Model Context Protocol) server. MCP is the open standard that lets AI tools like Claude Code and Cursor call external tools.
Installation
# macOS / Linux
curl -fsSL https://github.com/gambletan/cortex/releases/latest/download/install.sh | sh
# Or download the binary directly (the asset below is the Apple Silicon macOS build)
curl -L https://github.com/gambletan/cortex/releases/latest/download/cortex-aarch64-apple-darwin -o /usr/local/bin/cortex
chmod +x /usr/local/bin/cortex
MCP Configuration for Claude Code
Add this to your ~/.claude/claude_desktop_config.json (or the equivalent for your editor):
{
  "mcpServers": {
    "cortex-global": {
      "command": "cortex",
      "args": ["mcp", "--db", "~/.cortex/global.db"],
      "env": {
        "CORTEX_ENCRYPTION_KEY": "your-key-here"
      }
    },
    "cortex-project": {
      "command": "cortex",
      "args": ["mcp", "--db", "~/.cortex/my-project.db"],
      "env": {
        "CORTEX_ENCRYPTION_KEY": "your-key-here"
      }
    }
  }
}
Cortex supports multiple simultaneous databases — one global store for personal preferences, one per project for codebase-specific knowledge. Claude can read and write to both, and the write policy (which facts go where) is defined in your CLAUDE.md.
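If you would rather generate that config than hand-edit it, a small script can do it. The Python sketch below builds the same `mcpServers` structure with absolute database paths, since values in a JSON config are not passed through a shell and I am assuming here, cautiously, that `~` may not be expanded for you. The helper name `build_mcp_config` is hypothetical; the keys and arguments are copied from the config above.

```python
import json
import os
from pathlib import Path


def build_mcp_config(db_names: dict[str, str], key: str) -> dict:
    """Build the mcpServers block with absolute database paths."""
    cortex_dir = Path.home() / ".cortex"
    servers = {}
    for server_name, db_file in db_names.items():
        servers[server_name] = {
            "command": "cortex",
            "args": ["mcp", "--db", str(cortex_dir / db_file)],
            "env": {"CORTEX_ENCRYPTION_KEY": key},
        }
    return {"mcpServers": servers}


config = build_mcp_config(
    {"cortex-global": "global.db", "cortex-project": "my-project.db"},
    # Read the key from the environment instead of hard-coding it in a script.
    key=os.environ.get("CORTEX_ENCRYPTION_KEY", "your-key-here"),
)
print(json.dumps(config, indent=2))
```

Paste the printed JSON into your editor's MCP config file. Keeping the key in an environment variable also avoids committing it alongside the script.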
CLI Usage
# Store a memory
cortex ingest "User prefers functional programming patterns over OOP" --salience 0.8
# Search memories
cortex search "user's coding preferences"
# Add a structured fact
cortex fact add --subject "User" --predicate "uses" --object "neovim"
# Query facts
cortex fact query --subject "User"
# Load full context (what you'd inject into a prompt)
cortex context --limit 20
# Stats
cortex stats
# Working memory: 3 items | Episodic: 142 | Semantic facts: 67 | Preferences: 31
Cross-Device Sync Without a Server
Here's the part that surprises people: Cortex supports multi-device sync without a central server.
The sync mechanism uses a changelog-based CRDT (Conflict-free Replicated Data Type) approach. Every write to the database appends an entry to a local changelog. To sync two devices, you merge their changelogs.
Conflict resolution uses Hybrid Logical Clocks (HLC) — a technique borrowed from distributed databases like CockroachDB. HLCs combine physical timestamps with logical counters to establish a consistent ordering of events across machines without requiring a central coordinator.
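For readers who haven't met HLCs before, here is a compact Python sketch of the standard algorithm. It illustrates the technique in general and is not taken from Cortex's source; the class and field names (`HybridLogicalClock`, `Timestamp`, `tick`, `observe`) are my own, and the injectable `now_ms` function exists only so the example is reproducible.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(order=True, frozen=True)
class Timestamp:
    wall_ms: int   # highest physical time seen so far, in milliseconds
    counter: int   # logical counter that breaks ties within one wall_ms
    node_id: str   # final tie-breaker so ordering is total across devices


class HybridLogicalClock:
    def __init__(self, node_id: str, now_ms: Optional[Callable[[], int]] = None):
        self.node_id = node_id
        self._now_ms = now_ms or (lambda: int(time.time() * 1000))
        self._wall_ms = 0
        self._counter = 0

    def tick(self) -> Timestamp:
        """Stamp a local write."""
        physical = self._now_ms()
        if physical > self._wall_ms:
            self._wall_ms, self._counter = physical, 0
        else:
            self._counter += 1
        return Timestamp(self._wall_ms, self._counter, self.node_id)

    def observe(self, remote: Timestamp) -> Timestamp:
        """Advance the clock past a timestamp received from another device."""
        physical = self._now_ms()
        new_wall = max(self._wall_ms, remote.wall_ms, physical)
        if new_wall == self._wall_ms == remote.wall_ms:
            self._counter = max(self._counter, remote.counter) + 1
        elif new_wall == self._wall_ms:
            self._counter += 1
        elif new_wall == remote.wall_ms:
            self._counter = remote.counter + 1
        else:
            self._counter = 0
        self._wall_ms = new_wall
        return Timestamp(self._wall_ms, self._counter, self.node_id)


# The phone's physical clock runs 100 ms behind the laptop's, yet events that
# causally follow the laptop's write still sort after it.
laptop = HybridLogicalClock("laptop", now_ms=lambda: 1000)
phone = HybridLogicalClock("phone", now_ms=lambda: 900)
sent = laptop.tick()
received = phone.observe(sent)
later = phone.tick()
```

The logical counter is what rescues the skewed phone: its wall clock never catches up to 1000, so ordering is carried entirely by the counter.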
For beliefs (the semantic memory tier), conflicts are resolved using confidence scores: if Device A observed User prefers tabs (0.9) and Device B observed User prefers spaces (0.7), the higher-confidence belief wins. If confidence is equal, the more recent observation wins.
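That rule is small enough to write down directly. The Python sketch below is a reading of the sentence above, not Cortex's code; the `Belief` type and `resolve` function are hypothetical, and I am assuming the "more recent" comparison uses an HLC-style `(wall_ms, counter, node_id)` tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    statement: str
    confidence: float
    observed_at: tuple[int, int, str]  # assumed HLC-style (wall_ms, counter, node_id)


def resolve(a: Belief, b: Belief) -> Belief:
    """Higher confidence wins; on a tie, the more recent observation wins."""
    return max(a, b, key=lambda belief: (belief.confidence, belief.observed_at))


# The example from the text: Device A is more confident, so tabs win even
# though Device B's observation is newer.
device_a = Belief("User prefers tabs", 0.9, (1000, 0, "device-a"))
device_b = Belief("User prefers spaces", 0.7, (2000, 0, "device-b"))
winner = resolve(device_a, device_b)

# Equal confidence: the newer observation wins.
tied_old = Belief("User prefers tabs", 0.8, (1000, 0, "device-a"))
tied_new = Belief("User prefers spaces", 0.8, (2000, 0, "device-b"))
tie_winner = resolve(tied_old, tied_new)
```

Because the node id is part of the timestamp, two distinct observations never compare as equal, so the result does not depend on which device's belief is passed first.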
In practice, sync works like this:
# Export changelog to your own cloud storage (S3, Dropbox, iCloud Drive — anything)
cortex sync export --output ~/Dropbox/cortex-sync/laptop.changelog
# On another device, import and merge
cortex sync import --input ~/Dropbox/cortex-sync/laptop.changelog
No Cortex server required. No third-party sync service required. You control the transport layer. The merge is deterministic and idempotent — you can run it multiple times and get the same result.
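A toy model shows why a changelog merge can have those properties. In the Python sketch below, each entry carries a unique timestamp and the merge is a set union replayed in timestamp order. This is a simplification I'm assuming for illustration; the `Entry` fields and the `op` strings are invented and say nothing about Cortex's real changelog format.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    wall_ms: int
    counter: int
    node_id: str
    op: str  # a serialized write; the format here is invented


def merge(local: list[Entry], incoming: list[Entry]) -> list[Entry]:
    """Set union of both logs, replayed in timestamp order."""
    return sorted(set(local) | set(incoming))


laptop_log = [
    Entry(1000, 0, "laptop", "fact add User uses neovim"),
    Entry(1500, 0, "laptop", "ingest User prefers functional patterns"),
]
phone_log = [
    Entry(1000, 1, "phone", "fact add User works_at Acme Corp"),
]

once = merge(laptop_log, phone_log)
twice = merge(once, phone_log)       # importing the same changelog again
swapped = merge(phone_log, laptop_log)
```

Union makes re-imports harmless (duplicates collapse), and sorting by a total order makes the outcome independent of which device merges first.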
How Cortex Compares
| Feature | Cortex | Mem0 | OpenAI Memory | Custom RAG |
|---|---|---|---|---|
| Price | Free / self-hosted | $0-$99+/month | Bundled (ChatGPT) | Engineering cost |
| Latency | 62µs ingest, 253µs search | 80-300ms (API) | Opaque | Varies |
| Privacy | Local, AES-256 | Cloud, their ToS | Cloud, OpenAI ToS | Depends on infra |
| Memory model | 4-tier (working/episodic/semantic/procedural) | Flat vector + entity graph | Single tier | Usually flat |
| LoCoMo score | 73.7% | 66.9% | Not published | Varies |
| Binary size | 3.8MB | N/A (SaaS) | N/A (SaaS) | Heavy stack |
| Offline support | Full | No | No | Partial |
| Multi-device sync | CRDT, no server | Cloud | Cloud | Custom |
| MCP server | Yes | Yes | No | DIY |
| Open source | Yes | Partial (open-source core, hosted platform) | No | Yes (yours) |
The story for Custom RAG deserves a note: if you're building a serious production system with a dedicated team, custom RAG gives you maximum control. But the build cost is real — embedding models, vector DB infrastructure, retrieval tuning, decay logic. Most teams underestimate it by 3-5x. Cortex gives you a production-quality memory layer in an afternoon.
What's Next
Cortex is actively developed. On the roadmap:
- Memory compression: Automatically consolidate old episodic memories into semantic facts, mimicking how human memory consolidates during sleep
- Team memory: Shared semantic knowledge bases for multi-agent systems, with per-agent working memory
- Streaming ingest: Real-time memory updates during long conversations, not just at turn boundaries
- Fine-tuned retrieval: Optional embedding models for higher-recall semantic search on large memory stores
Try It
If you're building with Claude Code, Cursor, or any MCP-compatible AI tool, Cortex is the fastest path to persistent, private, high-performance memory.
- GitHub: https://github.com/gambletan/cortex
- Star the repo if you find it useful — it's the clearest signal that the project should keep going
The 3.8MB binary is waiting. Your AI doesn't have to forget.