If you use Claude Code or Claude Projects with a well-written CLAUDE.md, you already know the difference it makes. The AI knows your stack, your conventions, your project structure. It's genuinely great.
But CLAUDE.md is static. You write it once, you maintain it manually, and it lives in one project. What about your preferences across projects? What about decisions you made three weeks ago? What about the patterns the AI could learn from watching how you work — if it had somewhere to store them?
That's the gap I wanted to close. So I built Alma — a cognitive memory system that gives AI persistent, structured memory that grows over time.
Here's how it works under the hood.
## The architecture: 3 layers of memory
Alma organizes memory in three distinct layers, each serving a different purpose.
### Memories — what the AI knows about you
Structured facts. Each one is semantically indexed, categorized, scored by importance, and tagged with confidence:
```text
"Prefers TypeScript over JavaScript"            [confidence: 1.0, category: preference]
"Project uses D1 with Drizzle ORM"              [confidence: 1.0, category: technical]
"Hates verbose explanations — get to the point" [confidence: 0.8, category: preference]
"Decided on event-driven architecture March 3"  [confidence: 1.0, category: decision]
```
When you say "review my auth middleware", Alma's Context Assembler runs a hybrid search (keyword + semantic) across your memories. It pulls the ones relevant to auth, to your stack, to your coding preferences — and injects them into the system prompt before the LLM even sees your message.
The result: by the time the model reads your message, it already has the context it needs.
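The hybrid-search idea can be sketched in a few lines. This is an illustration of keyword-plus-semantic ranking in general, not Alma's actual scoring code, and the 30/70 blend weights are invented for the example:

```typescript
// Illustrative hybrid retrieval: blend a keyword-overlap score with
// cosine similarity over precomputed embeddings, then rank descending.
interface Memory {
  text: string;
  embedding: number[]; // precomputed vector (e.g. from Vectorize)
}

function keywordScore(query: string, text: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const t = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const w of q) if (t.has(w)) hits++;
  return q.size ? hits / q.size : 0;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Blend weights (30% keyword, 70% semantic) are illustrative only.
function rank(query: string, queryEmb: number[], memories: Memory[]): Memory[] {
  const score = (m: Memory) =>
    0.3 * keywordScore(query, m.text) + 0.7 * cosine(queryEmb, m.embedding);
  return [...memories].sort((a, b) => score(b) - score(a));
}
```

A query like "Review the auth middleware" would surface memories about auth and your stack ahead of unrelated facts, even when the wording doesn't match exactly.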
### Episodes — what happened before
After each conversation, a background processor generates a structured summary:
```text
Episode: "Auth Middleware Refactor"
Summary: Rewrote JWT validation to use jose library.
         Added refresh token rotation. Decided against
         session cookies for API-first architecture.
Topics:  auth, security, middleware
Outcome: PR merged, deployed to staging
```
When you say "remember that auth discussion?", the AI recalls the full episode — decisions, outcomes, context. Structured summaries, not raw transcript fragments.
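The episode record above maps naturally onto a small data shape. The field names here are assumptions for illustration, not Alma's actual schema; the sketch shows how topic-based recall could work in principle:

```typescript
// Hypothetical episode shape — field names assumed, not Alma's schema.
interface Episode {
  title: string;
  summary: string;
  topics: string[];
  outcome: string;
}

// Recall episodes whose topics overlap the current query terms.
function recallEpisodes(episodes: Episode[], queryTerms: string[]): Episode[] {
  const q = new Set(queryTerms.map((t) => t.toLowerCase()));
  return episodes.filter((e) => e.topics.some((t) => q.has(t.toLowerCase())));
}
```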
### Procedures — how you like to work
Procedures are behavioral patterns the AI learns from observing your interactions:
```text
"When reviewing code → check error handling first, then types"
"When explaining → use bullet points, not paragraphs"
"When debugging → ask for the error message before suggesting fixes"
```
These aren't stored and forgotten. They're matched against context on every conversation and applied dynamically. After a few weeks, the AI starts anticipating how you want things done — without you ever explicitly configuring it.
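Conceptually, a procedure is a trigger-to-behavior pair checked against each incoming message. This sketch uses simple keyword triggers as a stand-in for whatever matching Alma actually does:

```typescript
// Sketch: procedures as trigger → behavior pairs, matched against the
// current message before each turn. Matching logic here is illustrative.
interface Procedure {
  trigger: string[]; // keywords that activate the pattern
  behavior: string;  // instruction injected into the prompt
}

function matchProcedures(message: string, procedures: Procedure[]): string[] {
  const words = new Set(message.toLowerCase().split(/\W+/));
  return procedures
    .filter((p) => p.trigger.some((t) => words.has(t.toLowerCase())))
    .map((p) => p.behavior);
}
```

A message containing "review" would pull in the code-review behavior; a message about debugging would pull in a different one, so only relevant habits reach the prompt.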
## The Soul Engine: identity, not a system prompt
The Soul Engine goes beyond a single system prompt. It's 12 structured blocks organized in three sections:
**SOUL — who the AI is:**
- Identity: core character, name, role
- Worldview: how it approaches problems
- Rules: non-negotiable behaviors (never fabricate memories, acknowledge uncertainty)
- Tensions: the paradoxes that make personality feel real ("technical but warm", "concise but thorough when it matters")
**STYLE — how it communicates:**
- Style Guide: voice, vocabulary, structure
- Anti-Patterns: things to never do ("never say 'As an AI language model'")
- Communication Modes: different modes for different situations (teaching, debugging, creative)
- Example Interactions: calibration by demonstration
**CONTEXT — what it knows right now:**
- User Profile, Active Context, Learned Patterns, Scratchpad
- Plus custom blocks you define yourself
Every conversation, the Context Assembler renders this into a structured XML system prompt:
```xml
<alma_soul>
  <identity>You are Alma. Direct, technical, warm...</identity>
  <worldview>Simplicity over cleverness. Working code over elegant abstractions.</worldview>
  <tensions>Technical but approachable. Opinionated but open to correction.</tensions>
  <rules>Always reference relevant memories. Never fabricate information.</rules>
</alma_soul>

<alma_context>
  <user_profile>Senior dev, TypeScript, Hono + D1 stack...</user_profile>
  <active_context>Working on auth middleware refactor...</active_context>
  <memories>
    [12 most relevant memories for this conversation, ranked by semantic score]
  </memories>
  <episodes>
    [3 recent relevant episodes with summaries and outcomes]
  </episodes>
  <procedures>
    [Matched behavioral patterns for code review context]
  </procedures>
</alma_context>
```
Priority order: soul blocks first (always), then memories, then episodes, then procedures — all fit within a token budget. The AI gets a complete picture of who you are and what's happening, every single time.
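The budget-constrained packing described above can be sketched as a greedy loop over priority-ordered sections. The word-count token estimate and the "soul is never dropped" rule are my reading of the description, not Alma's actual implementation:

```typescript
// Sketch: pack sections in priority order (soul, memories, episodes,
// procedures) until the token budget runs out. Soul blocks always go in.
// Token counting is a crude word-count stand-in for a real tokenizer.
interface Section { label: string; text: string }

const approxTokens = (s: string) => Math.ceil(s.split(/\s+/).length * 1.3);

function packContext(sections: Section[], budget: number): Section[] {
  const out: Section[] = [];
  let used = 0;
  for (const s of sections) {           // sections arrive in priority order
    const cost = approxTokens(s.text);
    if (s.label === "soul" || used + cost <= budget) {
      out.push(s);                      // soul is never dropped
      used += cost;
    }
  }
  return out;
}
```

Greedy packing means a lower-priority section can still make it in when a bulky higher-priority one was skipped — the budget gets used, but priority is always respected.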
## It learns while you chat
Memory extraction runs in the background. You never wait for it.
After a conversation, a background processor (cheapest LLM, fire-and-forget with ctx.waitUntil()) analyzes the exchange and:
- Extracts new memories
- Generates episode summaries
- Updates your user profile and active context
- Refines procedures from observed patterns
You just chat. The AI gets quietly better after every interaction.
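The fire-and-forget pattern mentioned above is a standard Cloudflare Workers idiom: respond immediately, and hand the extraction promise to `ctx.waitUntil()` so the runtime keeps it alive after the response is sent. The `extractMemories` function here is a hypothetical stand-in for Alma's background processor:

```typescript
// Sketch of fire-and-forget background work on Cloudflare Workers.
// `waitUntil` keeps the promise alive after the response returns, so
// the user never waits on extraction.
interface Ctx { waitUntil(p: Promise<unknown>): void }

const memoryStore: string[] = [];

async function extractMemories(transcript: string): Promise<void> {
  // Stand-in for the real work: call the cheapest LLM, write the
  // extracted memories, episode summary, and profile updates.
  memoryStore.push(`memory from: ${transcript.slice(0, 24)}`);
}

function handleChatTurn(transcript: string, ctx: Ctx): Response {
  // Kick off extraction without awaiting it.
  ctx.waitUntil(extractMemories(transcript));
  return new Response("ok"); // sent immediately
}
```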
## Developer-first: everything is an API
### REST API — 140+ endpoints
Full CRUD on everything. Memories, episodes, procedures, blocks, conversations, chat (SSE streaming), files, images, voice, teams.
```bash
# Assemble full context for any message
curl -X POST https://alma.olivares.ai/api/v1/context/assemble \
  -H "X-API-Key: alma_key_..." \
  -H "Content-Type: application/json" \
  -d '{"user_message": "Review the auth middleware"}'

# Returns: structured system prompt + metadata (token counts, memory scores, keywords)
```
### MCP Server — 21 tools for Claude Desktop, Cursor, Windsurf
```json
{
  "mcpServers": {
    "alma": {
      "command": "npx",
      "args": ["-y", "@olivaresai/alma-mcp"],
      "env": { "ALMA_API_KEY": "alma_key_..." }
    }
  }
}
```
Your AI gets native tools: alma_search, alma_remember, alma_recall, alma_assemble, alma_focus, alma_update_block — it reads and writes to its own memory as part of reasoning.
### JavaScript SDK
```bash
npm install @olivaresai/alma-sdk
```

```typescript
import { Alma } from '@olivaresai/alma-sdk';

const alma = new Alma({ apiKey: 'alma_key_...' });
const context = await alma.context.assemble({ message: 'Review the auth middleware' });
// → Full system prompt with soul, memories, episodes, procedures
```
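The assembled context slots into any chat completion call as the system prompt. A sketch of that wiring, assuming the assemble response exposes a `systemPrompt` string (a field name I'm inventing here, not verified against the SDK's types):

```typescript
// Sketch: feed Alma's assembled context into an Anthropic-style
// Messages API request. `systemPrompt` is an assumed field name.
interface AssembledContext { systemPrompt: string }

function buildMessagesRequest(ctx: AssembledContext, userMessage: string) {
  return {
    model: "claude-sonnet-4-5",   // any Claude model id
    max_tokens: 1024,
    system: ctx.systemPrompt,     // soul + memories + episodes + procedures
    messages: [{ role: "user" as const, content: userMessage }],
  };
}
```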
### VSCode Extension
Memory search from the command palette. Context injection. Chat with persistent memory without leaving your editor.
## Voice, images, documents — same memory
- Voice Chat: Deepgram Nova-2 (transcription) + ElevenLabs (synthesis). Talk to your AI by voice — same persistent memory as text.
- Image Studio: Flux Pro + Leonardo AI. The AI remembers your style preferences and past generations.
- Document Generation: Export conversations to PDF, DOCX, XLSX, PPTX.
Every modality shares the same memory layer. A voice conversation references decisions from a text chat two weeks ago.
## 3 models, your choice
Powered exclusively by Anthropic Claude:
| Tier | Model | Use case |
|---|---|---|
| Normal | Claude Haiku | Quick tasks, everyday |
| Advanced | Claude Sonnet | Professional work, complex analysis |
| Complex | Claude Opus | Deep reasoning, nuanced problems |
Free plan gets Haiku. Paid plans get all three. Switch anytime — memory carries over.
BYOK: On Advanced+ plans, bring your own Anthropic, Replicate, or Leonardo API keys. Queries go direct to your accounts.
## Privacy
Your memories, episodes, procedures, and identity blocks are the most personal data an AI can hold. Alma's position:
- **You own everything.** Full portable `.alma` export. GDPR compliant (Articles 15-22).
- **Never used for training.** Zero tracking. Zero analytics.
- **Account deletion permanently purges** databases, R2 storage, and Stripe records. No retention.
- **Encrypted at rest and in transit.** API keys hashed and never exposed after creation.
## Pricing
| Plan | Price | Highlights |
|---|---|---|
| Free | $0 forever | 500 memories, 50 episodes, Claude Haiku |
| Pro | $19/mo | 10K memories, 3 AI tiers, voice, images |
| Advanced | $49/mo | 50K memories, API + MCP access, BYOK |
| Ultimate | $149/mo | Unlimited everything, dedicated support |
| Ultimate Max | $249/mo | 2x weekly AI budget, maximum capacity |
Weekly AI budget resets each Monday. Credit packs ($14.99 / $39.99 / $89.99) never expire.
## The stack
If you're curious: the entire system runs on Cloudflare Workers (D1 for SQL, Vectorize for embeddings, R2 for files), Hono for the API framework, React for the frontend, and Anthropic Claude for all AI inference. 56 database migrations, ~1,600 tests passing. Solo developer.
## Try it
| Platform | Link |
|---|---|
| Web App | alma.olivares.ai — free, no credit card |
| MCP Server | @olivaresai/alma-mcp |
| VSCode | VS Code Marketplace |
| JS SDK | @olivaresai/alma-sdk |
| REST API | Developer Docs — 140+ endpoints |
| Docs | olivares.ai/docs |
The free tier has 500 memories and no time limit. If you've ever been frustrated by an AI that forgets everything, give it a few conversations. The difference is immediate.
What would you want an AI that actually remembers you to do? I'd genuinely like to know.