DEV Community

Fran

I built a memory system for AI — here's the architecture

If you use Claude Code or Claude Projects with a well-written CLAUDE.md, you already know the difference it makes. The AI knows your stack, your conventions, your project structure. It's genuinely great.

But CLAUDE.md is static. You write it once, you maintain it manually, and it lives in one project. What about your preferences across projects? What about decisions you made three weeks ago? What about the patterns the AI could learn from watching how you work — if it had somewhere to store them?

That's the gap I wanted to close. So I built Alma — a cognitive memory system that gives AI persistent, structured memory that grows over time.

Here's how it works under the hood.

The architecture: 3 layers of memory

Alma organizes memory in three distinct layers, each serving a different purpose.

Memories — what the AI knows about you

Structured facts. Each one is semantically indexed, categorized, scored by importance, and tagged with confidence:

"Prefers TypeScript over JavaScript"          [confidence: 1.0, category: preference]
"Project uses D1 with Drizzle ORM"            [confidence: 1.0, category: technical]
"Hates verbose explanations — get to the point" [confidence: 0.8, category: preference]
"Decided on event-driven architecture March 3" [confidence: 1.0, category: decision]
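Conceptually, each of these facts can be modeled as a small typed record. Here's an illustrative sketch of what that might look like; the field names, the categories, and the filtering helper are my assumptions, not Alma's actual schema:

```typescript
// Illustrative shape for a stored memory record (assumed, not Alma's actual schema).
type MemoryCategory = "preference" | "technical" | "decision";

interface MemoryRecord {
  content: string;          // the fact itself
  category: MemoryCategory; // what kind of fact this is
  confidence: number;       // 0..1: how sure the extractor is
  importance: number;       // 0..1: used for ranking at assembly time
  tags: string[];
}

// Keep only memories that are confident enough, ranked by importance.
function selectMemories(memories: MemoryRecord[], minConfidence = 0.7): MemoryRecord[] {
  return memories
    .filter((m) => m.confidence >= minConfidence)
    .sort((a, b) => b.importance - a.importance);
}
```

Scoring confidence separately from importance matters: a low-confidence fact can be held back from the prompt without being deleted, so later evidence can still raise it.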

When you say "review my auth middleware", Alma's Context Assembler runs a hybrid search (keyword + semantic) across your memories. It pulls the ones relevant to auth, to your stack, to your coding preferences — and injects them into the system prompt before the LLM even sees your message.

The result: by the time the model reads your message, the relevant context is already sitting in front of it.
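Hybrid search is typically implemented as a weighted blend of the two scores. This is a minimal sketch under assumed names and weights, not Alma's actual ranking code:

```typescript
// Illustrative hybrid ranking: blend a keyword match score with a semantic
// similarity score. The blend weight and field names are assumptions.
interface ScoredCandidate {
  id: string;
  keywordScore: number;  // e.g. term overlap or BM25, normalized to 0..1
  semanticScore: number; // e.g. cosine similarity of embeddings, 0..1
}

function hybridRank(candidates: ScoredCandidate[], alpha = 0.5): ScoredCandidate[] {
  const score = (c: ScoredCandidate) =>
    alpha * c.keywordScore + (1 - alpha) * c.semanticScore;
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

The point of the blend: keyword matching catches exact identifiers ("auth middleware"), while the semantic score surfaces memories that are related but phrased differently.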

Episodes — what happened before

After each conversation, a background processor generates a structured summary:

Episode: "Auth Middleware Refactor"
  Summary: Rewrote JWT validation to use jose library.
           Added refresh token rotation. Decided against
           session cookies for API-first architecture.
  Topics: auth, security, middleware
  Outcome: PR merged, deployed to staging

When you say "remember that auth discussion?", the AI recalls the full episode — decisions, outcomes, context. Structured summaries, not raw transcript fragments.
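An episode like the one above could be modeled and recalled roughly like this; both the shape and the topic-overlap matching are assumptions for illustration:

```typescript
// One possible shape for an episode summary, with a naive topic-overlap
// recall helper. Purely illustrative.
interface Episode {
  title: string;
  summary: string;
  topics: string[];
  outcome: string;
}

// Recall episodes whose topics overlap the current conversation's keywords.
function recallEpisodes(episodes: Episode[], keywords: string[]): Episode[] {
  return episodes.filter((e) => e.topics.some((t) => keywords.includes(t)));
}
```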

Procedures — how you like to work

Procedures are behavioral patterns the AI learns from observing your interactions:

"When reviewing code → check error handling first, then types"
"When explaining → use bullet points, not paragraphs"
"When debugging → ask for the error message before suggesting fixes"

These aren't stored and forgotten. They're matched against context on every conversation and applied dynamically. After a few weeks, the AI starts anticipating how you want things done — without you ever explicitly configuring it.
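One simple way to picture that matching step: each procedure pairs a trigger context with an instruction, and the matched instructions get folded into the prompt. The substring matching below is deliberately naive and just a sketch:

```typescript
// Sketch of procedure matching. Trigger and instruction strings mirror the
// examples above; the matching logic itself is an assumption.
interface Procedure {
  trigger: string;     // e.g. "review", "debug", "explain"
  instruction: string; // what to apply when the trigger context appears
}

function matchProcedures(procedures: Procedure[], context: string): string[] {
  const lower = context.toLowerCase();
  return procedures
    .filter((p) => lower.includes(p.trigger))
    .map((p) => p.instruction);
}
```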

The Soul Engine: identity, not a system prompt

The Soul Engine goes beyond a single system prompt. It's 12 structured blocks organized in three sections:

SOUL — who the AI is:

  • Identity: core character, name, role
  • Worldview: how it approaches problems
  • Rules: non-negotiable behaviors (never fabricate memories, acknowledge uncertainty)
  • Tensions: the paradoxes that make personality feel real ("technical but warm", "concise but thorough when it matters")

STYLE — how it communicates:

  • Style Guide: voice, vocabulary, structure
  • Anti-Patterns: things to never do ("never say 'As an AI language model'")
  • Communication Modes: different modes for different situations (teaching, debugging, creative)
  • Example Interactions: calibration by demonstration

CONTEXT — what it knows right now:

  • User Profile, Active Context, Learned Patterns, Scratchpad
  • Plus custom blocks you define yourself

Every conversation, the Context Assembler renders this into a structured XML system prompt:

<alma_soul>
  <identity>You are Alma. Direct, technical, warm...</identity>
  <worldview>Simplicity over cleverness. Working code over elegant abstractions.</worldview>
  <tensions>Technical but approachable. Opinionated but open to correction.</tensions>
  <rules>Always reference relevant memories. Never fabricate information.</rules>
</alma_soul>

<alma_context>
  <user_profile>Senior dev, TypeScript, Hono + D1 stack...</user_profile>
  <active_context>Working on auth middleware refactor...</active_context>
  <memories>
    [12 most relevant memories for this conversation, ranked by semantic score]
  </memories>
  <episodes>
    [3 recent relevant episodes with summaries and outcomes]
  </episodes>
  <procedures>
    [Matched behavioral patterns for code review context]
  </procedures>
</alma_context>

Priority order: soul blocks first (always), then memories, then episodes, then procedures — all fit within a token budget. The AI gets a complete picture of who you are and what's happening, every single time.
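The budget-fitting step can be sketched as a greedy pass in priority order. The chars/4 token estimate is a common rough heuristic, not Alma's actual tokenizer:

```typescript
// Greedy budget fitting: sections are appended in priority order and skipped
// once they no longer fit. Names and heuristic are assumptions.
interface PromptSection {
  name: string; // "soul", "memories", "episodes", "procedures"
  text: string;
}

function assembleWithinBudget(sections: PromptSection[], maxTokens: number): string {
  const parts: string[] = [];
  let used = 0;
  for (const s of sections) {
    const estimate = Math.ceil(s.text.length / 4); // rough tokens-per-char heuristic
    if (used + estimate > maxTokens) continue;     // a smaller later section may still fit
    parts.push(s.text);
    used += estimate;
  }
  return parts.join("\n\n");
}
```

Because soul blocks come first in the input array, they always land inside the budget; memories, episodes, and procedures absorb whatever room remains.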

It learns while you chat

Memory extraction runs in the background. You never wait for it.

After a conversation, a background processor (cheapest LLM, fire-and-forget with ctx.waitUntil()) analyzes the exchange and:

  • Extracts new memories
  • Generates episode summaries
  • Updates your user profile and active context
  • Refines procedures from observed patterns

You just chat. The AI gets quietly better after every interaction.
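On Cloudflare Workers, `ctx.waitUntil()` is the standard way to keep work running after the response has been sent. Here's a minimal, testable sketch of the pattern with a stand-in context interface; the extraction logic itself is a placeholder, not Alma's:

```typescript
// Minimal stand-in for the Workers ExecutionContext interface, so the
// pattern can be shown (and tested) outside a Worker. Only waitUntil is modeled.
interface Ctx {
  waitUntil(promise: Promise<unknown>): void;
}

// Placeholder for the real background step (a cheap LLM call in Alma's case).
async function extractMemories(transcript: string): Promise<string[]> {
  return [`memory extracted from: ${transcript.slice(0, 24)}`];
}

// Handle a chat turn: return the reply immediately, schedule extraction to
// keep running after the response goes out.
function handleChatTurn(transcript: string, ctx: Ctx): string {
  ctx.waitUntil(extractMemories(transcript)); // fire-and-forget, never awaited
  return "reply sent";
}
```

The key property: the user-facing reply never waits on the extraction promise, yet the runtime keeps the Worker alive until that promise settles.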

Developer-first: everything is an API

REST API — 140+ endpoints

Full CRUD on everything. Memories, episodes, procedures, blocks, conversations, chat (SSE streaming), files, images, voice, teams.

# Assemble full context for any message
curl -X POST https://alma.olivares.ai/api/v1/context/assemble \
  -H "X-API-Key: alma_key_..." \
  -d '{"user_message": "Review the auth middleware"}'

# Returns: structured system prompt + metadata (token counts, memory scores, keywords)
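The same context-assembly request in TypeScript via `fetch`, mirroring the endpoint and header from the curl example. The response shape is whatever the API returns; the error handling here is just illustrative:

```typescript
// TypeScript equivalent of the curl call above (endpoint and header as shown
// in the curl example; error handling added for illustration).
async function assembleContext(apiKey: string, userMessage: string): Promise<unknown> {
  const res = await fetch("https://alma.olivares.ai/api/v1/context/assemble", {
    method: "POST",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ user_message: userMessage }),
  });
  if (!res.ok) throw new Error(`Alma API error: ${res.status}`);
  return res.json();
}
```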

MCP Server — 21 tools for Claude Desktop, Cursor, Windsurf

{
  "mcpServers": {
    "alma": {
      "command": "npx",
      "args": ["-y", "@olivaresai/alma-mcp"],
      "env": { "ALMA_API_KEY": "alma_key_..." }
    }
  }
}

Your AI gets native tools: alma_search, alma_remember, alma_recall, alma_assemble, alma_focus, alma_update_block — it reads and writes to its own memory as part of reasoning.

JavaScript SDK

npm install @olivaresai/alma-sdk

import { Alma } from '@olivaresai/alma-sdk';

const alma = new Alma({ apiKey: 'alma_key_...' });
const context = await alma.context.assemble({ message: 'Review the auth middleware' });
// → Full system prompt with soul, memories, episodes, procedures

VSCode Extension

Memory search from the command palette. Context injection. Chat with persistent memory without leaving your editor.

Voice, images, documents — same memory

  • Voice Chat: Deepgram Nova-2 (transcription) + ElevenLabs (synthesis). Talk to your AI by voice — same persistent memory as text.
  • Image Studio: Flux Pro + Leonardo AI. The AI remembers your style preferences and past generations.
  • Document Generation: Export conversations to PDF, DOCX, XLSX, PPTX.

Every modality shares the same memory layer. A voice conversation references decisions from a text chat two weeks ago.

3 models, your choice

Powered exclusively by Anthropic Claude:

Tier      Model           Use case
Normal    Claude Haiku    Quick tasks, everyday
Advanced  Claude Sonnet   Professional work, complex analysis
Complex   Claude Opus     Deep reasoning, nuanced problems

Free plan gets Haiku. Paid plans get all three. Switch anytime — memory carries over.

BYOK: On Advanced+ plans, bring your own Anthropic, Replicate, or Leonardo API keys. Queries go direct to your accounts.

Privacy

Your memories, episodes, procedures, and identity blocks are the most personal data an AI can hold. Alma's position:

  • You own everything. Full .alma portable export. GDPR compliant (Articles 15-22).
  • Never used for training. Zero tracking. Zero analytics.
  • Account deletion permanently purges databases, R2 storage, and Stripe records. No retention.
  • Encrypted at rest and in transit. API keys hashed and never exposed after creation.

Pricing

Plan          Price       Highlights
Free          $0 forever  500 memories, 50 episodes, Claude Haiku
Pro           $19/mo      10K memories, 3 AI tiers, voice, images
Advanced      $49/mo      50K memories, API + MCP access, BYOK
Ultimate      $149/mo     Unlimited everything, dedicated support
Ultimate Max  $249/mo     2x weekly AI budget, maximum capacity

Weekly AI budget resets each Monday. Credit packs ($14.99 / $39.99 / $89.99) never expire.

The stack

If you're curious: the entire system runs on Cloudflare Workers (D1 for SQL, Vectorize for embeddings, R2 for files), Hono for the API framework, React for the frontend, and Anthropic Claude for all AI inference. 56 database migrations, ~1,600 tests passing. Solo developer.

Try it

Platform    Link
Web App     alma.olivares.ai — free, no credit card
MCP Server  @olivaresai/alma-mcp
VSCode      VS Code Marketplace
JS SDK      @olivaresai/alma-sdk
REST API    Developer Docs — 140+ endpoints
Docs        olivares.ai/docs

The free tier has 500 memories and no time limit. If you've ever been frustrated by an AI that forgets everything, give it a few conversations. The difference is immediate.

What would you want an AI that actually remembers you to do? I'd genuinely like to know.


OlivaresAI
