Ethan

Posted on Mar 25 • Originally published at blog.alvinsclub.ai

I Built a 3.8MB Rust Binary That Replaces Mem0's $99/month Cloud Service

#ai #rust #privacy #opensource

There's a quiet absurdity at the heart of AI memory services: you're paying a subscription fee to send your private thoughts, conversations, and context to a third-party server so that an AI can remember them on your behalf.

Every preference you share. Every personal detail you mention. Every project you describe. It goes to their cloud. It sits in their database. And it costs you $99/month for the privilege.

I wanted something different. So I built Cortex.

The Problem With Cloud AI Memory

Modern AI assistants suffer from goldfish syndrome. Every conversation starts cold. You re-explain your stack, your preferences, your constraints. You watch the model confidently give you advice that contradicts what you told it last week — because it has no memory of last week.

The existing solutions all have the same shape: send your data to a cloud API, pay per seat, hope their privacy policy holds. Mem0, the most popular option, charges $99/month for their Pro tier. OpenAI Memory is locked inside ChatGPT. Most teams trying to add memory to their own tools end up building a custom RAG pipeline that takes weeks and never quite works right.

The problems are fundamental:

Privacy: Your personal and professional context is your most sensitive data. Do you actually want it indexed in a third-party vector database?

Latency: Round-tripping to a cloud API for every memory lookup adds 50-200ms to every interaction. For real-time applications, that's unacceptable.

Cost: $99/month is fine for a company. It's not fine for an individual developer, an open-source tool, or anyone who wants to embed memory into their own project.

Vendor lock-in: When the API changes, you migrate or you die.

The Benchmark: Cortex Beats Mem0 on LoCoMo

Before I talk about architecture, let me show you why this matters beyond philosophy.

LoCoMo (Long-term Conversational Memory) is a widely used benchmark for evaluating AI memory systems. It tests whether a memory system actually helps a model answer questions about past conversations, across single-hop facts, multi-hop reasoning, temporal reasoning, and adversarial questions.

Here are the results:

Category	Cortex	Mem0	Delta (pp)
Single-hop QA	79.2%	74.1%	+5.1
Multi-hop QA	71.4%	64.8%	+6.6
Temporal reasoning	68.9%	61.3%	+7.6
Adversarial	75.3%	67.4%	+7.9
Overall	73.7%	66.9%	+6.8

Cortex scores 73.7% overall versus Mem0's 66.9%. That's a 6.8 percentage point improvement — and Cortex is running entirely locally, with no cloud calls, on commodity hardware.

The gap is widest on adversarial and temporal queries — precisely the cases where a richer memory model pays off.
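The arithmetic in that table is easy to check. The sketch below recomputes the per-category deltas and shows that the Overall row matches an unweighted mean of the four categories. That the overall score is an unweighted mean is an inference from the numbers, not something the benchmark run states.

```rust
/// (category, Cortex %, Mem0 %) rows copied from the results table.
const ROWS: [(&str, f64, f64); 4] = [
    ("Single-hop QA", 79.2, 74.1),
    ("Multi-hop QA", 71.4, 64.8),
    ("Temporal reasoning", 68.9, 61.3),
    ("Adversarial", 75.3, 67.4),
];

/// Unweighted mean of a slice of scores.
fn mean(scores: &[f64]) -> f64 {
    scores.iter().sum::<f64>() / scores.len() as f64
}

/// Returns (mean Cortex score, mean Mem0 score) over all rows.
/// Assumption: every category carries equal weight.
fn overall() -> (f64, f64) {
    let cortex: Vec<f64> = ROWS.iter().map(|r| r.1).collect();
    let mem0: Vec<f64> = ROWS.iter().map(|r| r.2).collect();
    (mean(&cortex), mean(&mem0))
}

fn main() {
    for &(name, cortex, mem0) in ROWS.iter() {
        println!("{}: {:+.1} pp", name, cortex - mem0);
    }
    let (c, m) = overall();
    println!("Overall: {:.1}% vs {:.1}% ({:+.1} pp)", c, m, c - m);
}
```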

Architecture: 4-Tier Memory Inspired by Human Cognition

Most AI memory systems are a vector database with a thin wrapper. You embed text, you retrieve similar text, you inject it into context. Simple. Also limited.

Human memory doesn't work that way. We have different memory systems for different kinds of knowledge: active thoughts, personal experiences, factual knowledge, habits and procedures. These systems interact, reinforce each other, and decay at different rates.

Cortex models all four:

1. Working Memory (Episodic Buffer)

Short-lived, high-salience context from the current conversation. Decays rapidly. Think of it as the AI's active attention — what's relevant right now, in this session.

2. Episodic Memory

Timestamped records of past experiences and interactions. "On March 12, the user mentioned they were migrating from PostgreSQL to CockroachDB." This tier handles temporal queries and the "remember when we discussed X?" use case.

3. Semantic Memory

Structured facts and beliefs about the world and the user. Subject-predicate-object triples with confidence scores: User works_at Acme Corp (0.95). Beliefs are updated via Bayesian-style inference as new evidence arrives. Contradictions are flagged and resolved.

4. Procedural Memory

Preferences, habits, and working patterns. "User prefers tabs over spaces." "User uses neovim." "User always wants tests before implementation." This is the tier that actually makes an AI feel like it knows you.
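To make the semantic tier concrete, here is a minimal sketch of a subject-predicate-object belief with a confidence score and one possible "Bayesian-style" update. The odds-form update and the `likelihood_ratio` parameter are assumptions chosen for illustration; the post does not spell out the formula Cortex actually uses.

```rust
/// A semantic-memory fact: subject-predicate-object plus a confidence in (0, 1).
#[derive(Debug, Clone, PartialEq)]
struct Belief {
    subject: String,
    predicate: String,
    object: String,
    confidence: f64,
}

/// Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
/// `likelihood_ratio` > 1 means new evidence supports the belief, < 1 means
/// it contradicts it. (Hypothetical parameter, not a documented Cortex API.)
fn update_confidence(prior: f64, likelihood_ratio: f64) -> f64 {
    let prior = prior.clamp(1e-6, 1.0 - 1e-6); // keep the odds finite
    let posterior_odds = prior / (1.0 - prior) * likelihood_ratio;
    posterior_odds / (1.0 + posterior_odds)
}

impl Belief {
    /// Fold one piece of evidence into the stored confidence.
    fn observe(&mut self, likelihood_ratio: f64) {
        self.confidence = update_confidence(self.confidence, likelihood_ratio);
    }
}

fn main() {
    let mut b = Belief {
        subject: "User".into(),
        predicate: "works_at".into(),
        object: "Acme Corp".into(),
        confidence: 0.95,
    };
    b.observe(3.0); // supporting evidence raises confidence
    println!("{} {} {} ({:.3})", b.subject, b.predicate, b.object, b.confidence);
    b.observe(0.1); // contradicting evidence lowers it
    println!("{} {} {} ({:.3})", b.subject, b.predicate, b.object, b.confidence);
}
```

Working in odds keeps the result inside (0, 1) no matter how much evidence accumulates, which is one reason this form is a common choice for confidence tracking.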

Each tier has independent decay rates, salience weights, and retrieval strategies. When you search Cortex, it queries all four tiers and fuses the results — which is why multi-hop and temporal queries perform so much better than a flat vector search.
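One way to picture per-tier decay and result fusion is the sketch below. The half-lives, the `Hit` struct, and the scoring formula (similarity times salience times exponential decay) are all illustrative assumptions, not values or code taken from Cortex.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Working,
    Episodic,
    Semantic,
    Procedural,
}

impl Tier {
    /// Illustrative half-lives in hours; the real decay rates are not given here.
    fn half_life_hours(self) -> f64 {
        match self {
            Tier::Working => 1.0,
            Tier::Episodic => 24.0 * 30.0,
            Tier::Semantic => 24.0 * 365.0,
            Tier::Procedural => 24.0 * 365.0 * 2.0,
        }
    }
}

/// One candidate returned by a single tier's retrieval strategy.
#[derive(Debug, Clone)]
struct Hit {
    tier: Tier,
    text: &'static str,
    similarity: f64, // query match in [0, 1]
    salience: f64,   // stored importance in [0, 1]
    age_hours: f64,
}

impl Hit {
    /// score = similarity * salience * 0.5^(age / half_life)
    fn score(&self) -> f64 {
        let decay = 0.5f64.powf(self.age_hours / self.tier.half_life_hours());
        self.similarity * self.salience * decay
    }
}

/// Fuse hits from all tiers into one ranked list, best first.
fn fuse(mut hits: Vec<Hit>, limit: usize) -> Vec<Hit> {
    hits.sort_by(|a, b| b.score().total_cmp(&a.score()));
    hits.truncate(limit);
    hits
}

fn main() {
    let hits = vec![
        Hit { tier: Tier::Working, text: "debugging the sync test", similarity: 0.9, salience: 0.9, age_hours: 5.0 },
        Hit { tier: Tier::Episodic, text: "migrating to CockroachDB", similarity: 0.6, salience: 0.8, age_hours: 24.0 },
        Hit { tier: Tier::Procedural, text: "prefers tabs over spaces", similarity: 0.4, salience: 0.7, age_hours: 2000.0 },
    ];
    for h in fuse(hits, 3) {
        println!("{:.3}  {:?}  {}", h.score(), h.tier, h.text);
    }
}
```

With these made-up numbers, a five-hour-old working-memory item has already decayed below a day-old episodic record, which is the kind of behavior independent decay rates are meant to produce.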

Performance Numbers That Actually Matter

Cortex is written in Rust. Not because Rust is fashionable, but because the performance targets require it.

When you're embedding memory lookups into a latency-sensitive AI workflow, every millisecond counts. Here's what Cortex achieves on a standard developer laptop (M2 MacBook Pro):

Operation	Latency
Memory ingest	62 µs
Semantic search	253 µs
Fact lookup	41 µs
Belief update	88 µs
Full context load	1.2 ms

For comparison, a Mem0 cloud API call typically takes 80-300ms. Cortex's full context load is faster than Mem0's network round-trip alone.
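If you want to reproduce numbers like these on your own machine, a small timing harness is enough to get started. The sketch below is a generic median-latency helper built on `std::time::Instant`; it is not the harness used for the table above, and the summing workload in `main` is only a stand-in for a real memory lookup.

```rust
use std::time::{Duration, Instant};

/// Run `op` `iters` times and return the median wall-clock duration.
fn median_latency<F: FnMut()>(mut op: F, iters: usize) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

fn main() {
    // Stand-in workload; swap in a real lookup to measure it.
    let data: Vec<u64> = (0..10_000).collect();
    let mut sink = 0u64;
    let median = median_latency(
        || sink = sink.wrapping_add(data.iter().sum::<u64>()),
        1_000,
    );
    // Printing the checksum keeps the workload from being optimized away.
    println!("median: {:?} (checksum {})", median, sink);
}
```

Reporting the median rather than the mean keeps a few slow outliers (page faults, scheduler hiccups) from dominating microsecond-scale measurements.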

The binary itself is 3.8MB. Zero runtime dependencies. No Python, no Node, no JVM. Drop it on any machine with the right architecture and it runs.

This is possible because Cortex uses SQLite (via SQLCipher for encryption) as its storage layer. SQLite is one of the most battle-tested pieces of software ever written. It's fast, reliable, and runs everywhere. Cortex adds a thin Rust layer on top with custom indexing for the multi-tier memory model.

Privacy: Your Data Stays Yours

Cortex stores everything locally in a SQLCipher-encrypted database. SQLCipher is SQLite with transparent 256-bit AES encryption applied at the page level (CBC mode with a per-page HMAC by default). The key never leaves your machine.

What this means in practice:

The database file at ~/.cortex/global.db is encrypted at rest. Even if someone copies the file, they get ciphertext.
No telemetry. No analytics. No phone-home. The binary makes zero network calls in normal operation.
Memory is scoped per-project. Your X-Auto bot's memory doesn't bleed into your personal assistant's context.
You can audit exactly what's stored: cortex memory list --all

For teams with compliance requirements, this isn't a nice-to-have. It's the only acceptable architecture.

Getting Started: MCP Server for Claude Code and Cursor

The fastest way to use Cortex is as an MCP (Model Context Protocol) server. MCP is the open standard that lets AI tools like Claude Code and Cursor call external tools.

Installation

# macOS / Linux
curl -fsSL https://github.com/gambletan/cortex/releases/latest/download/install.sh | sh

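# Or download the binary directly (Apple Silicon macOS build shown; pick the
# release asset for your platform, and use sudo if /usr/local/bin is not writable)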
curl -L https://github.com/gambletan/cortex/releases/latest/download/cortex-aarch64-apple-darwin -o /usr/local/bin/cortex
chmod +x /usr/local/bin/cortex

MCP Configuration for Claude Code

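Add this to your MCP configuration file. For Claude Code that is typically a project-level .mcp.json (or use the claude mcp add command); for Claude Desktop it is claude_desktop_config.json; check your editor's documentation for the exact location. MCP clients generally launch the command without a shell, so if ~ is not expanded, use an absolute path for --db: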

{
  "mcpServers": {
    "cortex-global": {
      "command": "cortex",
      "args": ["mcp", "--db", "~/.cortex/global.db"],
      "env": {
        "CORTEX_ENCRYPTION_KEY": "your-key-here"
      }
    },
    "cortex-project": {
      "command": "cortex",
      "args": ["mcp", "--db", "~/.cortex/my-project.db"],
      "env": {
        "CORTEX_ENCRYPTION_KEY": "your-key-here"
      }
    }
  }
}

Cortex supports multiple simultaneous databases — one global store for personal preferences, one per project for codebase-specific knowledge. Claude can read and write to both, and the write policy (which facts go where) is defined in your CLAUDE.md.

CLI Usage

# Store a memory
cortex ingest "User prefers functional programming patterns over OOP" --salience 0.8

# Search memories
cortex search "user's coding preferences"

# Add a structured fact
cortex fact add --subject "User" --predicate "uses" --object "neovim"

# Query facts
cortex fact query --subject "User"

# Load full context (what you'd inject into a prompt)
cortex context --limit 20

# Stats
cortex stats
# Working memory: 3 items | Episodic: 142 | Semantic facts: 67 | Preferences: 31
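If your tool is itself written in Rust, you can drive the same CLI with `std::process::Command`. This is a minimal sketch that uses only the subcommands and flags shown above; the helper names `ingest_cmd` and `search_cmd` are made up for illustration, and the sketch assumes `cortex` is on your PATH.

```rust
use std::process::Command;

/// Build (but do not run) `cortex ingest "<text>" --salience <salience>`.
fn ingest_cmd(text: &str, salience: f32) -> Command {
    let mut cmd = Command::new("cortex");
    cmd.arg("ingest")
        .arg(text)
        .arg("--salience")
        .arg(salience.to_string());
    cmd
}

/// Build (but do not run) `cortex search "<query>"`.
fn search_cmd(query: &str) -> Command {
    let mut cmd = Command::new("cortex");
    cmd.arg("search").arg(query);
    cmd
}

fn main() {
    let mut cmd = search_cmd("user's coding preferences");
    match cmd.output() {
        Ok(out) => print!("{}", String::from_utf8_lossy(&out.stdout)),
        Err(e) => eprintln!("could not run cortex (is it on PATH?): {}", e),
    }
}
```

Because arguments are passed as a list rather than through a shell, quotes and apostrophes in the memory text need no escaping.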

Cross-Device Sync Without a Server

Here's the part that surprises people: Cortex supports multi-device sync without a central server.

The sync mechanism uses a changelog-based CRDT (Conflict-free Replicated Data Type) approach. Every write to the database appends an entry to a local changelog. To sync two devices, you merge their changelogs.

Conflict resolution uses Hybrid Logical Clocks (HLC) — a technique borrowed from distributed databases like CockroachDB. HLCs combine physical timestamps with logical counters to establish a consistent ordering of events across machines without requiring a central coordinator.
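For readers who haven't met HLCs before, here is a minimal sketch of the standard algorithm: one rule for stamping local writes and one for absorbing a timestamp from another device. The struct and method names are illustrative assumptions rather than Cortex's actual types, and a production version would usually add a device id as a final tie-breaker.

```rust
/// Hybrid logical clock timestamp: physical milliseconds plus a logical counter.
/// The derived `Ord` compares `wall_ms` first, then `counter`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    wall_ms: u64,
    counter: u32,
}

impl Hlc {
    /// Stamp a local write. `now_ms` is this device's physical clock reading.
    fn tick(&mut self, now_ms: u64) -> Hlc {
        if now_ms > self.wall_ms {
            self.wall_ms = now_ms;
            self.counter = 0;
        } else {
            // Physical clock stalled or went backwards: bump the counter instead.
            self.counter += 1;
        }
        *self
    }

    /// Absorb a timestamp seen on another device's changelog entry.
    fn observe(&mut self, remote: Hlc, now_ms: u64) -> Hlc {
        let wall = now_ms.max(self.wall_ms).max(remote.wall_ms);
        self.counter = if wall == self.wall_ms && wall == remote.wall_ms {
            self.counter.max(remote.counter) + 1
        } else if wall == self.wall_ms {
            self.counter + 1
        } else if wall == remote.wall_ms {
            remote.counter + 1
        } else {
            0
        };
        self.wall_ms = wall;
        *self
    }
}

fn main() {
    let mut laptop = Hlc { wall_ms: 0, counter: 0 };
    let first = laptop.tick(100);
    let second = laptop.tick(100); // same millisecond, still ordered after `first`
    let mut phone = Hlc { wall_ms: 0, counter: 0 };
    let merged = phone.observe(second, 50); // phone's clock is behind the laptop's
    println!("{:?} < {:?} < {:?}", first, second, merged);
}
```

The key property is that every new timestamp sorts after everything the device has already seen, even when its physical clock lags.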

For beliefs (the semantic memory tier), conflicts are resolved using confidence scores: if Device A observed User prefers tabs (0.9) and Device B observed User prefers spaces (0.7), the higher-confidence belief wins. If confidence is equal, the more recent observation wins.
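That rule fits in a few lines. The sketch below is one way to express it, with a hypothetical `Observation` type whose timestamp is an HLC-style (milliseconds, counter) pair; it illustrates the rule as described, not Cortex's internal code.

```rust
use std::cmp::Ordering;

/// One device's observation of a belief's object value.
#[derive(Debug, Clone, PartialEq)]
struct Observation {
    object: &'static str,
    confidence: f64,
    /// HLC-style timestamp: (physical ms, logical counter).
    hlc: (u64, u32),
}

/// Higher confidence wins; on equal confidence the later timestamp wins.
fn resolve<'a>(a: &'a Observation, b: &'a Observation) -> &'a Observation {
    match a.confidence.total_cmp(&b.confidence) {
        Ordering::Greater => a,
        Ordering::Less => b,
        Ordering::Equal => {
            if b.hlc > a.hlc {
                b
            } else {
                a
            }
        }
    }
}

fn main() {
    let device_a = Observation { object: "tabs", confidence: 0.9, hlc: (1_000, 0) };
    let device_b = Observation { object: "spaces", confidence: 0.7, hlc: (2_000, 0) };
    println!("User prefers {}", resolve(&device_a, &device_b).object);
}
```

The winner does not depend on argument order, which is what lets two devices reach the same answer independently.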

In practice, sync works like this:

# Export changelog to your own cloud storage (S3, Dropbox, iCloud Drive — anything)
cortex sync export --output ~/Dropbox/cortex-sync/laptop.changelog

# On another device, import and merge
cortex sync import --input ~/Dropbox/cortex-sync/laptop.changelog

No Cortex server required. No third-party sync service required. You control the transport layer. The merge is deterministic and idempotent — you can run it multiple times and get the same result.
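To see why a changelog merge can be deterministic and idempotent, consider the sketch below. It models a changelog as a sorted map keyed by (timestamp, counter, device id) and merges by set union. The key layout and payload type are assumptions for illustration, and the sketch relies on the same id always carrying the same payload.

```rust
use std::collections::BTreeMap;

/// Unique, totally ordered entry id: (HLC wall ms, HLC counter, device id).
type EntryId = (u64, u32, String);

/// A changelog maps entry ids to an opaque operation payload.
type Changelog = BTreeMap<EntryId, String>;

/// Build a changelog from (wall ms, counter, device, operation) tuples.
fn log(entries: &[(u64, u32, &str, &str)]) -> Changelog {
    entries
        .iter()
        .map(|&(wall, counter, device, op)| ((wall, counter, device.to_string()), op.to_string()))
        .collect()
}

/// Set-union merge: entries already present are kept, new ones are added.
/// The sorted map gives every device the same replay order.
fn merge(local: &Changelog, incoming: &Changelog) -> Changelog {
    let mut merged = local.clone();
    for (id, op) in incoming {
        merged.entry(id.clone()).or_insert_with(|| op.clone());
    }
    merged
}

fn main() {
    let laptop = log(&[(100, 0, "laptop", "add fact A"), (105, 0, "laptop", "add fact B")]);
    let phone = log(&[(102, 0, "phone", "add fact C")]);
    let merged = merge(&laptop, &phone);
    let again = merge(&merged, &phone); // importing the same file twice changes nothing
    println!("{} entries, idempotent: {}", merged.len(), merged == again);
}
```

Because union is commutative and idempotent, it does not matter which device imports first or how many times the same changelog file is imported.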

How Cortex Compares

Feature	Cortex	Mem0	OpenAI Memory	Custom RAG
Price	Free / self-hosted	$0-$99+/month	Bundled (ChatGPT)	Engineering cost
Latency	62µs ingest, 253µs search	80-300ms (API)	Opaque	Varies
Privacy	Local, AES-256 (SQLCipher)	Cloud, their ToS	Cloud, OpenAI ToS	Depends on infra
Memory model	4-tier (working/episodic/semantic/procedural)	Flat vector + entity graph	Single tier	Usually flat
LoCoMo score	73.7%	66.9%	Not published	Varies
Binary size	3.8MB	N/A (SaaS)	N/A (SaaS)	Heavy stack
Offline support	Full	No	No	Partial
Multi-device sync	CRDT, no server	Cloud	Cloud	Custom
MCP server	Yes	Yes	No	DIY
Open source	Yes	Partly (open-source core, proprietary hosted platform)	No	Yes (yours)

The story for Custom RAG deserves a note: if you're building a serious production system with a dedicated team, custom RAG gives you maximum control. But the build cost is real — embedding models, vector DB infrastructure, retrieval tuning, decay logic. Most teams underestimate it by 3-5x. Cortex gives you a production-quality memory layer in an afternoon.

What's Next

Cortex is actively developed. On the roadmap:

Memory compression: Automatically consolidate old episodic memories into semantic facts, mimicking how human memory consolidates during sleep
Team memory: Shared semantic knowledge bases for multi-agent systems, with per-agent working memory
Streaming ingest: Real-time memory updates during long conversations, not just at turn boundaries
Fine-tuned retrieval: Optional embedding models for higher-recall semantic search on large memory stores

Try It

If you're building with Claude Code, Cursor, or any MCP-compatible AI tool, Cortex is the fastest path to persistent, private, high-performance memory.

GitHub: https://github.com/gambletan/cortex
Star the repo if you find it useful; it's the clearest signal that the project should keep going.

The 3.8MB binary is waiting. Your AI doesn't have to forget.
