DEV Community

Fran

I built a memory system for AI — here's the architecture

If you use Claude Code or Claude Projects with a well-written CLAUDE.md, you already know the difference it makes. The AI knows your stack, your conventions, your project structure. It's genuinely great.

But CLAUDE.md is static. You write it once, you maintain it manually, and it lives in one project. What about your preferences across projects? What about decisions you made three weeks ago? What about the patterns the AI could learn from watching how you work — if it had somewhere to store them?

That's the gap I wanted to close. So I built Alma — a cognitive memory system that gives AI persistent, structured memory that grows over time.

Here's how it works under the hood.

The architecture: 3 layers of memory

Alma organizes memory in three distinct layers, each serving a different purpose.

Memories — what the AI knows about you

Structured facts. Each one is semantically indexed, categorized, scored by importance, and tagged with confidence:

"Prefers TypeScript over JavaScript"          [confidence: 1.0, category: preference]
"Project uses D1 with Drizzle ORM"            [confidence: 1.0, category: technical]
"Hates verbose explanations — get to the point" [confidence: 0.8, category: preference]
"Decided on event-driven architecture March 3" [confidence: 1.0, category: decision]
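Conceptually, each of these facts can be modeled as a small typed record. Here's an illustrative sketch of what that might look like; the field names, the categories, and the filtering helper are my assumptions, not Alma's actual schema:

```typescript
// Illustrative shape for a stored memory record (assumed, not Alma's actual schema).
type MemoryCategory = "preference" | "technical" | "decision";

interface MemoryRecord {
  content: string;          // the fact itself
  category: MemoryCategory; // what kind of fact this is
  confidence: number;       // 0..1: how sure the extractor is
  importance: number;       // 0..1: used for ranking at assembly time
  tags: string[];
}

// Keep only memories that are confident enough, ranked by importance.
function selectMemories(memories: MemoryRecord[], minConfidence = 0.7): MemoryRecord[] {
  return memories
    .filter((m) => m.confidence >= minConfidence)
    .sort((a, b) => b.importance - a.importance);
}
```

Scoring confidence separately from importance matters: a low-confidence fact can be held back from the prompt without being deleted, so later evidence can still raise it.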

When you say "review my auth middleware", Alma's Context Assembler runs a hybrid search (keyword + semantic) across your memories. It pulls the ones relevant to auth, to your stack, to your coding preferences — and injects them into the system prompt before the LLM even sees your message.

The result: by the time the model reads your message, the relevant context is already sitting in front of it.
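Hybrid search is typically implemented as a weighted blend of the two scores. This is a minimal sketch under assumed names and weights, not Alma's actual ranking code:

```typescript
// Illustrative hybrid ranking: blend a keyword match score with a semantic
// similarity score. The blend weight and field names are assumptions.
interface ScoredCandidate {
  id: string;
  keywordScore: number;  // e.g. term overlap or BM25, normalized to 0..1
  semanticScore: number; // e.g. cosine similarity of embeddings, 0..1
}

function hybridRank(candidates: ScoredCandidate[], alpha = 0.5): ScoredCandidate[] {
  const score = (c: ScoredCandidate) =>
    alpha * c.keywordScore + (1 - alpha) * c.semanticScore;
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

The point of the blend: keyword matching catches exact identifiers ("auth middleware"), while the semantic score surfaces memories that are related but phrased differently.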

Episodes — what happened before

After each conversation, a background processor generates a structured summary:

Episode: "Auth Middleware Refactor"
  Summary: Rewrote JWT validation to use jose library.
           Added refresh token rotation. Decided against
           session cookies for API-first architecture.
  Topics: auth, security, middleware
  Outcome: PR merged, deployed to staging

When you say "remember that auth discussion?", the AI recalls the full episode — decisions, outcomes, context. Structured summaries, not raw transcript fragments.
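An episode like the one above could be modeled and recalled roughly like this; both the shape and the topic-overlap matching are assumptions for illustration:

```typescript
// One possible shape for an episode summary, with a naive topic-overlap
// recall helper. Purely illustrative.
interface Episode {
  title: string;
  summary: string;
  topics: string[];
  outcome: string;
}

// Recall episodes whose topics overlap the current conversation's keywords.
function recallEpisodes(episodes: Episode[], keywords: string[]): Episode[] {
  return episodes.filter((e) => e.topics.some((t) => keywords.includes(t)));
}
```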

Procedures — how you like to work

Procedures are behavioral patterns the AI learns from observing your interactions:

"When reviewing code → check error handling first, then types"
"When explaining → use bullet points, not paragraphs"
"When debugging → ask for the error message before suggesting fixes"

These aren't stored and forgotten. They're matched against context on every conversation and applied dynamically. After a few weeks, the AI starts anticipating how you want things done — without you ever explicitly configuring it.
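One simple way to picture that matching step: each procedure pairs a trigger context with an instruction, and the matched instructions get folded into the prompt. The substring matching below is deliberately naive and just a sketch:

```typescript
// Sketch of procedure matching. Trigger and instruction strings mirror the
// examples above; the matching logic itself is an assumption.
interface Procedure {
  trigger: string;     // e.g. "review", "debug", "explain"
  instruction: string; // what to apply when the trigger context appears
}

function matchProcedures(procedures: Procedure[], context: string): string[] {
  const lower = context.toLowerCase();
  return procedures
    .filter((p) => lower.includes(p.trigger))
    .map((p) => p.instruction);
}
```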

The Soul Engine: identity, not a system prompt

The Soul Engine goes beyond a single system prompt. It's 12 structured blocks organized in three sections:

SOUL — who the AI is:

  • Identity: core character, name, role
  • Worldview: how it approaches problems
  • Rules: non-negotiable behaviors (never fabricate memories, acknowledge uncertainty)
  • Tensions: the paradoxes that make personality feel real ("technical but warm", "concise but thorough when it matters")

STYLE — how it communicates:

  • Style Guide: voice, vocabulary, structure
  • Anti-Patterns: things to never do ("never say 'As an AI language model'")
  • Communication Modes: different modes for different situations (teaching, debugging, creative)
  • Example Interactions: calibration by demonstration

CONTEXT — what it knows right now:

  • User Profile, Active Context, Learned Patterns, Scratchpad
  • Plus custom blocks you define yourself

Every conversation, the Context Assembler renders this into a structured XML system prompt:

<alma_soul>
  <identity>You are Alma. Direct, technical, warm...</identity>
  <worldview>Simplicity over cleverness. Working code over elegant abstractions.</worldview>
  <tensions>Technical but approachable. Opinionated but open to correction.</tensions>
  <rules>Always reference relevant memories. Never fabricate information.</rules>
</alma_soul>

<alma_context>
  <user_profile>Senior dev, TypeScript, Hono + D1 stack...</user_profile>
  <active_context>Working on auth middleware refactor...</active_context>
  <memories>
    [12 most relevant memories for this conversation, ranked by semantic score]
  </memories>
  <episodes>
    [3 recent relevant episodes with summaries and outcomes]
  </episodes>
  <procedures>
    [Matched behavioral patterns for code review context]
  </procedures>
</alma_context>

Priority order: soul blocks first (always), then memories, then episodes, then procedures — all fit within a token budget. The AI gets a complete picture of who you are and what's happening, every single time.
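The budget-fitting step can be sketched as a greedy pass in priority order. The chars/4 token estimate is a common rough heuristic, not Alma's actual tokenizer:

```typescript
// Greedy budget fitting: sections are appended in priority order and skipped
// once they no longer fit. Names and heuristic are assumptions.
interface PromptSection {
  name: string; // "soul", "memories", "episodes", "procedures"
  text: string;
}

function assembleWithinBudget(sections: PromptSection[], maxTokens: number): string {
  const parts: string[] = [];
  let used = 0;
  for (const s of sections) {
    const estimate = Math.ceil(s.text.length / 4); // rough tokens-per-char heuristic
    if (used + estimate > maxTokens) continue;     // a smaller later section may still fit
    parts.push(s.text);
    used += estimate;
  }
  return parts.join("\n\n");
}
```

Because soul blocks come first in the input array, they always land inside the budget; memories, episodes, and procedures absorb whatever room remains.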

It learns while you chat

Memory extraction runs in the background. You never wait for it.

After a conversation, a background processor (cheapest LLM, fire-and-forget with ctx.waitUntil()) analyzes the exchange and:

  • Extracts new memories
  • Generates episode summaries
  • Updates your user profile and active context
  • Refines procedures from observed patterns

You just chat. The AI gets quietly better after every interaction.
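On Cloudflare Workers, `ctx.waitUntil()` is the standard way to keep work running after the response has been sent. Here's a minimal, testable sketch of the pattern with a stand-in context interface; the extraction logic itself is a placeholder, not Alma's:

```typescript
// Minimal stand-in for the Workers ExecutionContext interface, so the
// pattern can be shown (and tested) outside a Worker. Only waitUntil is modeled.
interface Ctx {
  waitUntil(promise: Promise<unknown>): void;
}

// Placeholder for the real background step (a cheap LLM call in Alma's case).
async function extractMemories(transcript: string): Promise<string[]> {
  return [`memory extracted from: ${transcript.slice(0, 24)}`];
}

// Handle a chat turn: return the reply immediately, schedule extraction to
// keep running after the response goes out.
function handleChatTurn(transcript: string, ctx: Ctx): string {
  ctx.waitUntil(extractMemories(transcript)); // fire-and-forget, never awaited
  return "reply sent";
}
```

The key property: the user-facing reply never waits on the extraction promise, yet the runtime keeps the Worker alive until that promise settles.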

Developer-first: everything is an API

REST API — 140+ endpoints

Full CRUD on everything. Memories, episodes, procedures, blocks, conversations, chat (SSE streaming), files, images, voice, teams.

# Assemble full context for any message
curl -X POST https://alma.olivares.ai/api/v1/context/assemble \
  -H "X-API-Key: alma_key_..." \
  -d '{"user_message": "Review the auth middleware"}'

# Returns: structured system prompt + metadata (token counts, memory scores, keywords)
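The same context-assembly request in TypeScript via `fetch`, mirroring the endpoint and header from the curl example. The response shape is whatever the API returns; the error handling here is just illustrative:

```typescript
// TypeScript equivalent of the curl call above (endpoint and header as shown
// in the curl example; error handling added for illustration).
async function assembleContext(apiKey: string, userMessage: string): Promise<unknown> {
  const res = await fetch("https://alma.olivares.ai/api/v1/context/assemble", {
    method: "POST",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ user_message: userMessage }),
  });
  if (!res.ok) throw new Error(`Alma API error: ${res.status}`);
  return res.json();
}
```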

MCP Server — 21 tools for Claude Desktop, Cursor, Windsurf

{
  "mcpServers": {
    "alma": {
      "command": "npx",
      "args": ["-y", "@olivaresai/alma-mcp"],
      "env": { "ALMA_API_KEY": "alma_key_..." }
    }
  }
}

Your AI gets native tools: alma_search, alma_remember, alma_recall, alma_assemble, alma_focus, alma_update_block — it reads and writes to its own memory as part of reasoning.

JavaScript SDK

npm install @olivaresai/alma-sdk

import { Alma } from '@olivaresai/alma-sdk';

const alma = new Alma({ apiKey: 'alma_key_...' });
const context = await alma.context.assemble({ message: 'Review the auth middleware' });
// → Full system prompt with soul, memories, episodes, procedures

VSCode Extension

Memory search from the command palette. Context injection. Chat with persistent memory without leaving your editor.

Voice, images, documents — same memory

  • Voice Chat: Deepgram Nova-2 (transcription) + ElevenLabs (synthesis). Talk to your AI by voice — same persistent memory as text.
  • Image Studio: Flux Pro + Leonardo AI. The AI remembers your style preferences and past generations.
  • Document Generation: Export conversations to PDF, DOCX, XLSX, PPTX.

Every modality shares the same memory layer. A voice conversation references decisions from a text chat two weeks ago.

3 models, your choice

Powered exclusively by Anthropic Claude:

Tier      Model           Use case
Normal    Claude Haiku    Quick tasks, everyday
Advanced  Claude Sonnet   Professional work, complex analysis
Complex   Claude Opus     Deep reasoning, nuanced problems

Free plan gets Haiku. Paid plans get all three. Switch anytime — memory carries over.

BYOK: On Advanced+ plans, bring your own Anthropic, Replicate, or Leonardo API keys. Queries go direct to your accounts.

Privacy

Your memories, episodes, procedures, and identity blocks are the most personal data an AI can hold. Alma's position:

  • You own everything. Full .alma portable export. GDPR compliant (Articles 15-22).
  • Never used for training. Zero tracking. Zero analytics.
  • Account deletion permanently purges databases, R2 storage, and Stripe records. No retention.
  • Encrypted at rest and in transit. API keys hashed and never exposed after creation.

Pricing

Plan          Price       Highlights
Free          $0 forever  500 memories, 50 episodes, Claude Haiku
Pro           $19/mo      10K memories, 3 AI tiers, voice, images
Advanced      $49/mo      50K memories, API + MCP access, BYOK
Ultimate      $149/mo     Unlimited everything, dedicated support
Ultimate Max  $249/mo     2x weekly AI budget, maximum capacity

Weekly AI budget resets each Monday. Credit packs ($14.99 / $39.99 / $89.99) never expire.

The stack

If you're curious: the entire system runs on Cloudflare Workers (D1 for SQL, Vectorize for embeddings, R2 for files), Hono for the API framework, React for the frontend, and Anthropic Claude for all AI inference. 56 database migrations, ~1,600 tests passing. Solo developer.

Try it

Platform    Link
Web App     alma.olivares.ai — free, no credit card
MCP Server  @olivaresai/alma-mcp
VSCode      VS Code Marketplace
JS SDK      @olivaresai/alma-sdk
REST API    Developer Docs — 140+ endpoints
Docs        olivares.ai/docs

The free tier has 500 memories and no time limit. If you've ever been frustrated by an AI that forgets everything, give it a few conversations. The difference is immediate.

What would you want an AI that actually remembers you to do? I'd genuinely like to know.


OlivaresAI
