Fran

I replaced my claude.md with a 3-layer cognitive memory system. Here's the architecture.

I built a structured memory system for AI called Alma. This post explains the architecture, not the marketing.

The problem, technically

Current AI memory implementations (claude.md, .cursorrules, ChatGPT Memory) tend to share some or all of these limitations:

  1. No schema. All data is unstructured text. No types, no fields, no queryable metadata.
  2. No weighting. Every piece of information has equal priority in the context window.
  3. No automatic extraction. The user manually maintains the memory.
  4. No deduplication. Similar information accumulates without merging.
  5. No separation of concerns. Identity, style preferences, and session context are mixed.

The architecture

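Alma has three data layers (Memories, Episodes, Procedures), a Soul Engine that holds identity, style, and context blocks, and a Context Assembler that combines them: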

┌─────────────────────────────────────────────┐
│                Context Assembler             │
│  (dynamic token budget, relevance scoring)   │
├──────────┬──────────┬──────────┬────────────┤
│ Soul     │ Memories │ Episodes │ Procedures │
│ Engine   │          │          │            │
│ 13 blocks│ Weighted │ Summaries│ Behavioral │
│ Identity │ facts    │ w/ topics│ patterns   │
│ Style    │ w/ score │ outcomes │ auto-      │
│ Context  │ category │ search   │ extracted  │
└──────────┴──────────┴──────────┴────────────┘
         ↑ Background Processor ↑
         (async, every N messages)

Layer 1: Memories

Schema:

interface Memory {
  id: string;
  content: string;
  category: 'preference' | 'fact' | 'decision' | 'project' | 'general';
  importance: number;     // 0-1, determines context priority
  source: 'manual' | 'extracted' | 'extension' | 'api' | 'consolidated';
  access_count: number;   // incremented on retrieval
  reinforcement_count: number; // incremented on dedup match
  embedding: Float32Array;     // for semantic search
  created_at: string;
  last_accessed_at: string;
}

Deduplication uses Jaccard similarity on keyword sets, with a 60% threshold and a 3-keyword minimum. When a new memory scores above the threshold against an existing one, the existing memory is reinforced (its reinforcement_count is incremented) instead of a new record being created.
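To make the dedup rule concrete, here is a minimal sketch of a Jaccard check on keyword sets. The tokenizer, the stop-word list, and the helper names (extractKeywords, jaccard, isDuplicate) are assumptions for illustration, not Alma's actual code. The 3-keyword minimum is read here as "each side needs at least 3 keywords", and treating the 60% boundary as inclusive is also an assumption.

```typescript
// Hypothetical stop-word list; the real one is not described in the post.
const STOP_WORDS = new Set(['the', 'and', 'for', 'with', 'that', 'this', 'from']);

// Assumed tokenizer: lowercase alphanumeric words longer than 2 characters.
function extractKeywords(text: string): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words.filter((w) => w.length > 2 && !STOP_WORDS.has(w)));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

const DEDUP_THRESHOLD = 0.6; // the 60% threshold from the text
const MIN_KEYWORDS = 3;      // the 3-keyword minimum from the text

function isDuplicate(candidate: string, existing: string): boolean {
  const a = extractKeywords(candidate);
  const b = extractKeywords(existing);
  // Too few keywords makes the ratio unreliable, so skip the comparison.
  if (a.size < MIN_KEYWORDS || b.size < MIN_KEYWORDS) return false;
  return jaccard(a, b) >= DEDUP_THRESHOLD;
}
```

On a match, the caller would increment reinforcement_count on the existing record rather than insert a new one.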

Search is hybrid: keyword (SQLite FTS5) plus semantic (cosine similarity over embeddings stored in Cloudflare Vectorize). The two result sets are merged and re-ranked by a weighted score:

const WEIGHTS = {
  relevance: 0.40,   // Cosine similarity to current query
  importance: 0.30,  // 0.0-1.0, extracted or user-assigned
  recency: 0.20,     // Exponential decay, 7-day half-life
  frequency: 0.10,   // Logarithmic scale of access count
};
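As an illustration of how those four weights could combine, here is a rough sketch of a scoring function. The weights and the 7-day half-life come from the post. Everything else is an assumption: the field names on ScoredInput, measuring recency from the last access time, and normalizing frequency against a cap of 100 accesses.

```typescript
const WEIGHTS = { relevance: 0.4, importance: 0.3, recency: 0.2, frequency: 0.1 };
const HALF_LIFE_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const FREQUENCY_CAP = 100; // assumed: access counts at or above this score 1.0

// Hypothetical input shape for one search hit.
interface ScoredInput {
  similarity: number;     // cosine similarity to the query, assumed in [0, 1]
  importance: number;     // 0-1
  accessCount: number;
  lastAccessedAt: string; // ISO timestamp
}

// Exponential decay: 1.0 now, 0.5 after one half-life, 0.25 after two.
function recencyScore(lastAccessedAt: string, now: Date): number {
  const ageMs = Math.max(0, now.getTime() - Date.parse(lastAccessedAt));
  return Math.pow(0.5, ageMs / MS_PER_DAY / HALF_LIFE_DAYS);
}

// Logarithmic scale, normalized to [0, 1] so heavy use saturates.
function frequencyScore(accessCount: number): number {
  return Math.min(1, Math.log(1 + accessCount) / Math.log(1 + FREQUENCY_CAP));
}

function scoreMemory(m: ScoredInput, now: Date): number {
  return (
    WEIGHTS.relevance * m.similarity +
    WEIGHTS.importance * m.importance +
    WEIGHTS.recency * recencyScore(m.lastAccessedAt, now) +
    WEIGHTS.frequency * frequencyScore(m.accessCount)
  );
}
```

Because every component is normalized to the 0-1 range and the weights sum to 1, the final score also stays between 0 and 1, which makes results from the keyword and semantic paths comparable after merging.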

Layer 2: Episodes

interface Episode {
  id: string;
  conversation_id: string;
  summary: string;
  topics: string[];
  outcome: string;
  message_count: number;
  embedding: Float32Array;
}

Episodes are generated automatically at the end of a conversation. They are searchable by topic, outcome, or semantic similarity.

Layer 3: Procedures

interface Procedure {
  id: string;
  content: string;        // "Checks error handling first in code reviews"
  category: string;
  trigger: string;        // When this pattern activates
  source: 'extracted' | 'manual';
}

Procedures are extracted by the background processor, which analyzes conversation patterns. They represent behavioral habits, not explicit preferences.

Soul Engine: 13 blocks

type SoulSection = 'identity' | 'style' | 'context';
type BlockKey =
  | 'identity' | 'worldview' | 'tensions' | 'rules'
  | 'style_guide' | 'anti_patterns' | 'communication' | 'examples'
  | 'user_profile' | 'active_context' | 'learned_patterns'
  | 'scratchpad' | 'custom';

interface SoulBlock {
  key: BlockKey;
  section: SoulSection;
  content: string;
  char_limit: number;
  priority: number;
  truncation: 'head' | 'tail';  // head = keep newest, tail = keep oldest
}

Identity blocks use tail truncation: when a block exceeds its char_limit, the newest text is cut, so the oldest content (the core values) stays stable. Context blocks use head truncation: the oldest text is cut, so the block keeps fresh data. This single field gives blocks different temporal behavior without any extra logic.
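A minimal sketch of the two truncation modes, assuming new text is appended to the end of a block's content so that "oldest" means the start of the string. The function name truncateBlock is hypothetical.

```typescript
type Truncation = 'head' | 'tail';

function truncateBlock(content: string, charLimit: number, mode: Truncation): string {
  if (content.length <= charLimit) return content;
  return mode === 'head'
    ? content.slice(content.length - charLimit) // cut the head: drop oldest, keep newest
    : content.slice(0, charLimit);              // cut the tail: drop newest, keep oldest
}
```

For example, truncateBlock('abcdef', 3, 'head') keeps 'def', while the same call with 'tail' keeps 'abc'. A production version would probably cut on line or sentence boundaries rather than raw character offsets.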

Context Assembler

async function assembleContext(userId: string, message: string): Promise<string> {
  // 1. Soul Engine — always included, highest priority
  const soul = await renderSoulBlocks(userId);

  // 2. Relevant memories — scored by semantic similarity to current message
  const memories = await searchMemories(userId, message, { mode: 'hybrid' });

  // 3. Recent episodes — for conversation continuity
  const episodes = await getRecentEpisodes(userId);

  // 4. Matching procedures — behavioral patterns
  const procedures = await matchProcedures(userId, message);

  // 5. Dynamic token budget — sections compete for space
  return buildPrompt({ soul, memories, episodes, procedures }, TOKEN_BUDGET);
}

Each section has a priority. If total tokens exceed the budget, lower-priority sections get truncated first. The Soul Engine is always preserved in full.
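One way the budget step could work is a greedy fill in priority order, sketched below. This is not Alma's buildPrompt: the Section shape, the fitToBudget name, the "higher number wins" priority convention, and the 4-characters-per-token estimate are all assumptions made for the example.

```typescript
interface Section {
  name: string;
  text: string;
  priority: number;   // assumed: higher number = more important
  required?: boolean; // true for the Soul Engine, which is never truncated
}

// Crude token estimate (assumption); a real system would use a tokenizer.
const CHARS_PER_TOKEN = 4;
const estimateTokens = (text: string): number => Math.ceil(text.length / CHARS_PER_TOKEN);

function fitToBudget(sections: Section[], tokenBudget: number): Section[] {
  const required = sections.filter((s) => s.required);
  const optional = sections
    .filter((s) => !s.required)
    .sort((a, b) => b.priority - a.priority);

  // Required sections are charged against the budget first and kept in full.
  let remaining = tokenBudget - required.reduce((sum, s) => sum + estimateTokens(s.text), 0);
  const kept: Section[] = [...required];

  // Fill the rest highest-priority first, so low-priority sections are cut first.
  for (const section of optional) {
    if (remaining <= 0) break;
    const tokens = estimateTokens(section.text);
    if (tokens <= remaining) {
      kept.push(section);
      remaining -= tokens;
    } else {
      kept.push({ ...section, text: section.text.slice(0, remaining * CHARS_PER_TOKEN) });
      remaining = 0;
    }
  }
  return kept;
}
```

With this approach the lowest-priority section is the first to be shortened and the first to be dropped entirely, which matches the behavior described above.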

Background Processor

The background processor runs asynchronously via ctx.waitUntil() every N messages:

  1. Sends recent conversation to Claude Haiku for analysis
  2. Receives structured JSON with extracted memories, episodes, procedures
  3. Deduplicates memories against existing store
  4. Updates relevant soul blocks (active_context, learned_patterns, user_profile)
  5. Stores episode summary

Because ctx.waitUntil() lets this work continue after the response has been sent, it adds no latency to the conversation.
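The scheduling side of this can be sketched as follows. ctx.waitUntil() is the real Workers API for extending execution past the response. The rest is assumed for illustration: the WaitUntilContext stand-in type, the maybeScheduleProcessing helper, and N = 10 (the post does not give a value for N).

```typescript
// Minimal stand-in for the Workers ExecutionContext; only waitUntil is used here.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

const PROCESS_EVERY_N = 10; // assumed value of N

// Returns true when background processing was scheduled for this message.
function maybeScheduleProcessing(
  ctx: WaitUntilContext,
  messageCount: number,
  process: () => Promise<void>
): boolean {
  if (messageCount === 0 || messageCount % PROCESS_EVERY_N !== 0) return false;
  // The promise is handed to the runtime and never awaited by the request
  // handler, so the response is not delayed. Errors are logged, not rethrown.
  ctx.waitUntil(
    process().catch((err) => console.error('background processing failed', err))
  );
  return true;
}
```

In a fetch handler this would be called just before returning the streamed response, with process wrapping the five steps listed above.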

Infrastructure

Entirely Cloudflare:

  • Workers — API, SSE streaming, background processing
  • D1 — SQLite database (56 migrations)
  • Vectorize — Embedding storage and similarity search
  • R2 — File uploads (images, documents)
  • KV — Configuration cache
  • Durable Objects — Atomic budget tracking (single-threaded counters)

No AWS. No external databases. Cold start under 5ms.

Numbers

  • 1,690 passing tests across 102 files
  • 56 database migrations
  • 180 REST API endpoints
  • 15 fully localized languages
  • 6 agent tools in chat + 21 MCP tools + 9 MCP resources

Try it

Web app: alma.olivares.ai

Free tier: 500 memories, Claude Haiku, automatic learning. No credit card.


Built by Francisco @ Olivares.AI
