Travis Cole

"Stop Treating All AI Memories the Same — Introducing Cortex, Who Forgot?"

A quick fact ("PostgreSQL runs on port 5432") is not the same as a learned pattern ("always use connection pooling for high-traffic services").

A deployment event is not the same as a user preference.

So why do most memory systems treat them identically?

The Problem with Flat Memory

Most AI memory solutions — RAG, vector stores, simple key-value caches — dump everything into the same bucket. A one-time debug note sits next to a critical architectural decision with the same priority, the same retrieval weight, the same lifespan.

The result? Bloated context windows full of irrelevant noise. Your AI retrieves a bug fix from 6 months ago with the same confidence as a pattern you use daily.

Cortex: Cognitive Classification for AI Memory

Titan Memory includes Cortex — a multi-stage classifier that routes every incoming memory into one of five cognitive categories:

Category  | What It Stores                         | Decay Rate
----------|----------------------------------------|--------------------------------
Knowledge | Facts, definitions, technical info     | Slow — facts persist
Profile   | Preferences, settings, user context    | Very slow — preferences stick
Event     | Sessions, deployments, incidents       | Fast — events age out
Behavior  | Patterns, habits, workflows            | Slow — patterns are valuable
Skill     | Techniques, solutions, best practices  | Very slow — skills are durable

Each category decays at a different rate. An error you hit last Tuesday fades. A deployment pattern you've used across 5 projects persists.
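
To make those decay rates concrete, here's a minimal sketch of per-category exponential decay in TypeScript. The 270-day Skill and 90-day Event half-lives come from the examples later in this post; the other three values are illustrative guesses, not Titan Memory's actual numbers.

// Sketch: exponential decay per cognitive category.
// Skill (270 days) and Event (90 days) match the post's examples;
// the other half-lives are assumptions for illustration only.
type Category = "Knowledge" | "Profile" | "Event" | "Behavior" | "Skill";

const HALF_LIFE_DAYS: Record<Category, number> = {
  Knowledge: 180, // slow: facts persist
  Profile: 365,   // very slow: preferences stick
  Event: 90,      // fast: events age out
  Behavior: 180,  // slow: patterns are valuable
  Skill: 270,     // very slow: skills are durable
};

// Strength of a memory after ageDays: halves once per half-life.
function decayWeight(category: Category, ageDays: number): number {
  return Math.exp((-Math.LN2 * ageDays) / HALF_LIFE_DAYS[category]);
}

console.log(decayWeight("Skill", 180).toFixed(2)); // "0.63": still strong
console.log(decayWeight("Event", 180).toFixed(2)); // "0.25": mostly faded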

The Librarian Pipeline

On recall, Cortex doesn't just return the top-K vectors. It runs a full refinement pipeline:

  1. Retrieve top candidates via hybrid search (dense vectors + BM25)
  2. Split the candidates into individual sentences
  3. Score every sentence with a 0.6B-parameter semantic encoder
  4. Prune anything below a relevance threshold
  5. Resolve temporal conflicts (newer info wins)
  6. Check category coverage — balanced recall across categories, not just the highest-similarity hits

The result: 70-80% token compression on every recall. Only gold sentences reach your LLM.
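
Here's roughly what those six steps look like, as a minimal TypeScript sketch rather than Titan Memory's actual implementation. The hybridSearch and embedScore helpers are hypothetical stand-ins for the hybrid retriever and the 0.6B encoder.

interface Candidate { text: string; category: string; timestamp: number; }
interface Scored { sentence: string; category: string; timestamp: number; score: number; }

// Hypothetical stand-ins for the real internals:
async function hybridSearch(query: string): Promise<Candidate[]> {
  return []; // dense vectors + BM25, fused into one candidate list
}
async function embedScore(query: string, sentence: string): Promise<number> {
  return 0; // relevance score from the semantic encoder
}

async function librarianRecall(query: string, threshold = 0.5): Promise<string[]> {
  // 1. Retrieve top candidates via hybrid search
  const candidates = await hybridSearch(query);

  // 2 + 3. Split each candidate into sentences, score each against the query
  const scored: Scored[] = [];
  for (const c of candidates) {
    for (const sentence of c.text.split(/(?<=[.!?])\s+/)) {
      scored.push({ sentence, category: c.category, timestamp: c.timestamp,
                    score: await embedScore(query, sentence) });
    }
  }

  // 4. Prune anything below the relevance threshold
  const kept = scored.filter(s => s.score >= threshold);

  // 5. Temporal conflicts: sort newest-first so newer info wins
  kept.sort((a, b) => b.timestamp - a.timestamp);

  // 6. Category coverage: cap each category at a few sentences so one
  //    category can't crowd out the rest of the recall
  const perCategory = new Map<string, Scored[]>();
  for (const s of kept) {
    const bucket = perCategory.get(s.category) ?? [];
    if (bucket.length < 3) { bucket.push(s); perCategory.set(s.category, bucket); }
  }
  return [...perCategory.values()].flat().map(s => s.sentence);
}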

How It Actually Works

# One command to install
claude mcp add titan-memory -- node ~/.claude/titan-memory/bin/titan-mcp.js

Store a memory:

titan_add("Always use connection pooling for high-traffic Postgres services")
→ Classified: Skill (confidence: 0.94)
→ Routed to Layer 4 (Semantic Memory)
→ Decay half-life: 270 days

Store an event:

titan_add("Deployed v2.3 to production, rolled back due to memory leak")
→ Classified: Event (confidence: 0.91)
→ Routed to Layer 5 (Episodic Memory)
→ Decay half-life: 90 days

Recall later:

titan_recall("Postgres performance best practices")
→ Returns the connection pooling skill (still strong after 6 months)
→ The deployment event has decayed — unless you specifically ask for events

That's how human memory works. Different types of information, stored differently, retrieved differently, forgotten at different rates. We just gave that to AI.

The Bigger Picture

Titan Memory is a 5-layer cognitive memory system delivered as an MCP server:

  • Layer 1: Working Memory (your context window)
  • Layer 2: Factual Memory (O(1) hash lookup, sub-10ms)
  • Layer 3: Long-Term Memory (surprise-filtered, adaptive decay)
  • Layer 4: Semantic Memory (patterns, reasoning chains)
  • Layer 5: Episodic Memory (session logs, timestamps)

Cortex is just one piece. There's also semantic highlighting, surprise-based storage filtering, hybrid search with RRF reranking, and cross-project pattern transfer.
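
Since RRF reranking does a lot of the heavy lifting in hybrid search, here's a minimal sketch of Reciprocal Rank Fusion. The k = 60 constant is the conventional default from the RRF literature, not necessarily what Titan Memory uses.

// Reciprocal Rank Fusion: merge ranked lists of document IDs.
// score(d) = sum over lists of 1 / (k + rank(d)); k = 60 is the usual default.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Example: fuse a dense-vector ranking with a BM25 ranking
const fused = rrf([
  ["doc3", "doc1", "doc7"], // dense cosine order
  ["doc1", "doc9", "doc3"], // BM25 order
]);
console.log(fused[0]); // "doc1": ranked high by both retrievers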

914 passing tests. Works with Claude Code, Cursor, or any MCP-compatible client.

Built With Less

Like the other 99.9% of us, I can't contend on compute. But we can all strive for sustainability and AI safety.

This system was coded entirely by Opus 4.5, and the research was done with Opus 4.5 and Google DeepMind in a Queen swarm pattern. The architectural decisions were my own, as were the countless hours of research, reading, and staying awake far too long at a stretch.

This project shows that you don't have to build bigger, or be bigger, to get the best outcome. It's evidence that you can get a lot out of a little compute and still solve real problems.

Now go build something great.


100% FREE, no paywall, all the sauce in one bottle.

GitHub: github.com/TC407-api/titan-memory

License: Apache 2.0
