<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sri</title>
    <description>The latest articles on DEV Community by Sri (@smara).</description>
    <link>https://dev.to/smara</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3854601%2Fe6a2edcc-fabb-4495-b869-c4b5112872fc.png</url>
      <title>DEV Community: Sri</title>
      <link>https://dev.to/smara</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/smara"/>
    <language>en</language>
    <item>
      <title>How I Built a Persistent Memory Layer for AI Coding Tools</title>
      <dc:creator>Sri</dc:creator>
      <pubDate>Wed, 08 Apr 2026 21:49:18 +0000</pubDate>
      <link>https://dev.to/smara/how-i-built-a-persistent-memory-layer-for-ai-coding-tools-39ha</link>
      <guid>https://dev.to/smara/how-i-built-a-persistent-memory-layer-for-ai-coding-tools-39ha</guid>
      <description>&lt;p&gt;If you use AI coding assistants daily, you have felt this pain. You open a new session with Claude Code, Cursor, or Copilot, and you spend the first twenty minutes re-explaining your project structure, your preferences, the bug you fixed yesterday, the architectural decisions you made last week. The AI has no idea. Every session starts from absolute zero.&lt;/p&gt;

&lt;p&gt;I started measuring this. In my own workflow, I was burning 20-25 minutes per session on context restoration alone. That is not the worst part. MCP servers — the tools that extend these AI assistants — consume tokens just by loading. I have watched 67,000 tokens disappear before I even typed my first prompt. That is roughly half the context window on most models, gone before any actual work begins.&lt;/p&gt;

&lt;p&gt;Context fills up. The conversation dies. You start a new one. The cycle repeats.&lt;/p&gt;

&lt;p&gt;Now multiply this across a team. Five developers, each running four AI sessions per day, each losing twenty minutes to context re-establishment. That is nearly seven hours of developer time evaporating every single day. Over a month, that is 140 hours — three and a half work weeks — spent telling AI tools things they already knew yesterday.&lt;/p&gt;

&lt;p&gt;The problem is not that these tools are unintelligent. The problem is architectural. Each AI session is stateless by design. There is no persistence layer. There is no memory. Your Claude Code session and your Cursor session are completely isolated. A fact stored in one never reaches the other.&lt;/p&gt;

&lt;p&gt;I wanted to fix this. Not by building yet another AI wrapper or a fancy prompt template, but by adding a proper memory layer that sits beneath every tool and persists across all of them. That project became &lt;a href="https://smara.io" rel="noopener noreferrer"&gt;Smara&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Was the Right Protocol
&lt;/h2&gt;

&lt;p&gt;When I started building Smara, I had three options for integrating with AI coding tools: fork each tool and patch memory in, build custom plugins for each platform, or find a universal protocol.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol (MCP) made the choice obvious. MCP is an open standard — originally created by Anthropic, now adopted by OpenAI, Google, and dozens of other tool vendors — that lets you extend any compatible AI tool without touching its source code. You write a server that exposes tools, and any MCP-compatible client can call them.&lt;/p&gt;

&lt;p&gt;The key insight is that MCP's tool-calling model maps naturally to memory operations. An AI assistant already knows how to call tools. It already reasons about when to use &lt;code&gt;grep&lt;/code&gt; versus &lt;code&gt;read_file&lt;/code&gt;. Adding &lt;code&gt;smara_store&lt;/code&gt; and &lt;code&gt;smara_search&lt;/code&gt; to its toolkit requires zero behavioral changes. The AI decides when to store a memory and when to recall one, the same way it decides when to run a shell command.&lt;/p&gt;
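&lt;p&gt;For illustration, here is roughly what a memory tool looks like to an MCP client when it lists available tools: a name, a description, and a JSON Schema for inputs. This is a simplified sketch; the field contents are illustrative, not Smara's exact schema:&lt;/p&gt;

```json
{
  "name": "smara_store",
  "description": "Save a fact to persistent memory with an importance score",
  "inputSchema": {
    "type": "object",
    "properties": {
      "fact": { "type": "string" },
      "importance": { "type": "number", "minimum": 0, "maximum": 1 }
    },
    "required": ["fact"]
  }
}
```

&lt;p&gt;Because this is the same shape as every other tool the AI already uses, storing a memory is no different from running a shell command.&lt;/p&gt;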

&lt;p&gt;Custom plugins would have locked me into a single platform. API wrappers would have required explicit integration code in every tool. MCP gave me write-once, run-everywhere. One server, one &lt;code&gt;npm install&lt;/code&gt;, and suddenly Claude Code, Cursor, Windsurf, and any future MCP-compatible tool all share the same memory pool.&lt;/p&gt;

&lt;p&gt;The ecosystem validated this bet. MCP SDK downloads have crossed 97 million per month. There are over 10,000 MCP servers in the wild. When OpenAI and Google both adopted the protocol within months of Anthropic publishing it, it was clear that MCP was not a niche experiment — it was becoming the standard interface between AI tools and the outside world.&lt;/p&gt;

&lt;p&gt;Building on MCP also meant I could focus entirely on what mattered: the memory engine itself. No platform-specific glue code. No maintaining five different plugin formats. Just one clean protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing the 7 Memory Tools
&lt;/h2&gt;

&lt;p&gt;Early prototypes of Smara had two tools: &lt;code&gt;store&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt;. It worked, technically. But the AI made poor decisions about when to use them. A blunt interface produces blunt behavior.&lt;/p&gt;

&lt;p&gt;The production MCP server exposes seven tools, each with a specific semantic purpose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;smara_store&lt;/code&gt;&lt;/strong&gt; — Save a new fact to memory with importance scoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;smara_search&lt;/code&gt;&lt;/strong&gt; — Semantic search across stored memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;smara_recall&lt;/code&gt;&lt;/strong&gt; — Load context at conversation start (top memories by decay score)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;smara_forget&lt;/code&gt;&lt;/strong&gt; — Explicitly remove a memory that is outdated or wrong&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;smara_list&lt;/code&gt;&lt;/strong&gt; — Browse memories with filtering (by source, namespace, date)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;smara_tag&lt;/code&gt;&lt;/strong&gt; — Organize memories with labels for better retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;smara_relate&lt;/code&gt;&lt;/strong&gt; — Create explicit connections between related memories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why seven instead of two? Granularity improves AI decision-making. When the AI sees &lt;code&gt;smara_forget&lt;/code&gt;, it understands it can correct mistakes. When it sees &lt;code&gt;smara_relate&lt;/code&gt;, it can link a debugging session to an architectural decision. These are not just CRUD operations — they are cognitive primitives that map to how an intelligent agent should interact with memory.&lt;/p&gt;

&lt;p&gt;Adding Smara to your AI tool takes one config block. Here is the MCP configuration for Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"smara"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@smara/mcp-server"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"SMARA_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"smara_your_key_here"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop that into &lt;code&gt;~/.claude/mcp_config.json&lt;/code&gt; (or &lt;code&gt;.cursor/mcp.json&lt;/code&gt; for Cursor), restart, and memory is live. Context loads automatically at conversation start. New facts are stored silently during normal work. The AI handles the memory management itself — you do not need to think about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ebbinghaus Decay Scoring
&lt;/h2&gt;

&lt;p&gt;Most memory systems rank results by recency or by vector similarity. Recency is naive — a two-week-old architectural decision matters more than this morning's typo fix. Pure similarity misses temporal context — it cannot distinguish between a current fact and an obsolete one.&lt;/p&gt;

&lt;p&gt;Smara uses Ebbinghaus forgetting curve decay, modeled after how human memory actually works. Every memory's relevance decays exponentially over time, modulated by its importance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;ebbinghaus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;importance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;days&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;halfLife&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;importance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;days&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;halfLife&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A high-importance memory (importance: 1.0) gets a 10-day decay constant. Strictly speaking, that is a time constant rather than a half-life: the score falls to about 37% after 10 days and halves in roughly a week. A low-importance memory (importance: 0.1) fades within a day. This means "the production database uses PostgreSQL 15 on port 5432" stays strong for weeks, while "I renamed a variable in utils.ts" naturally fades away.&lt;/p&gt;

&lt;p&gt;The critical mechanism: memories strengthen on access. Every time the AI retrieves a memory, Smara bumps its &lt;code&gt;access_count&lt;/code&gt; and resets its &lt;code&gt;last_accessed_at&lt;/code&gt; timestamp. Frequently used facts stay fresh. Forgotten facts decay. This is exactly how human memory works — rehearsal strengthens neural pathways.&lt;/p&gt;
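&lt;p&gt;A minimal sketch of how reinforcement can work, with decay computed from the most recent access rather than creation time, so each retrieval effectively resets the clock. The field names here are illustrative, not Smara's actual schema:&lt;/p&gt;

```typescript
interface MemoryRow {
  createdAt: Date;
  lastAccessedAt: Date;
  accessCount: number;
  importance: number; // 0..1
}

// Decay measured from the last access: retrieval resets the clock.
function decayScore(m: MemoryRow, now: Date): number {
  const msPerDay = 1000 * 60 * 60 * 24;
  const days = (now.getTime() - m.lastAccessedAt.getTime()) / msPerDay;
  const timeConstant = Math.max(m.importance, 0.1) * 10; // in days
  return Math.exp(-days / timeConstant);
}

// Called whenever the memory is retrieved.
function reinforce(m: MemoryRow, now: Date): MemoryRow {
  return { ...m, accessCount: m.accessCount + 1, lastAccessedAt: now };
}
```

&lt;p&gt;A memory untouched for a month scores near zero; one accessed today scores 1.0 again, regardless of when it was created.&lt;/p&gt;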

&lt;p&gt;Search results are ranked by a blend of semantic similarity and temporal decay:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;blendScore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;decayScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;decayScore&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 70/30 blend means relevance still dominates — you get the fact that best matches your query — but recency and access patterns break ties. A moderately relevant memory that you accessed yesterday beats a slightly more relevant memory you have not touched in a month.&lt;/p&gt;
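&lt;p&gt;To make the tie-breaking concrete, here is a worked example using the two functions above. The similarity numbers are made up for illustration:&lt;/p&gt;

```typescript
// Reproduction of the scoring shown above, for a worked example.
function ebbinghaus(ageDays: number, importance: number): number {
  return Math.exp(-ageDays / (Math.max(importance, 0.1) * 10));
}

function blendScore(similarity: number, decayScore: number): number {
  return similarity * 0.7 + decayScore * 0.3;
}

// A slightly better semantic match, untouched for a month...
const staleButRelevant = blendScore(0.85, ebbinghaus(30, 1.0)); // ~0.61
// ...loses to a decent match accessed yesterday.
const freshEnough = blendScore(0.80, ebbinghaus(1, 1.0)); // ~0.83
```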

&lt;p&gt;As far as I can tell, no competitor in this space has Ebbinghaus decay scoring. Mem0, Zep, and Supermemory all rank by flat recency or pure vector similarity. That temporal scoring is what sets Smara apart in the developer tools space, and it is what makes the memory feel natural rather than mechanical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making It Work Across Teams
&lt;/h2&gt;

&lt;p&gt;Individual memory was the easy part. The hard problem — the one I underestimated by weeks — was shared team memory.&lt;/p&gt;

&lt;p&gt;The challenge is not technical complexity. It is semantic complexity. When developer A stores "the auth service uses JWT with RS256," should developer B's AI session see that? Probably yes. When developer A stores "I prefer tabs over spaces," should developer B's session see that? Absolutely not.&lt;/p&gt;

&lt;p&gt;Smara solves this with a namespace and visibility model. Every memory has a &lt;code&gt;namespace&lt;/code&gt; (like &lt;code&gt;default&lt;/code&gt;, &lt;code&gt;project-api&lt;/code&gt;, &lt;code&gt;infra&lt;/code&gt;) and a &lt;code&gt;visibility&lt;/code&gt; setting (&lt;code&gt;private&lt;/code&gt; or &lt;code&gt;team&lt;/code&gt;). When a memory is stored with a &lt;code&gt;team_id&lt;/code&gt;, it is automatically visible to all team members in that namespace. Private memories stay private.&lt;/p&gt;
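&lt;p&gt;The read-side rule can be expressed as a small predicate. This is a sketch of the model described above, not Smara's actual code:&lt;/p&gt;

```typescript
interface Memory {
  userId: string;
  teamId: string | null;
  namespace: string;
  visibility: "private" | "team";
}

function canSee(m: Memory, userId: string, teamId: string | null): boolean {
  // Private pool: no team attached, visible to the owner only.
  if (m.teamId === null) return m.userId === userId;
  // Team pool: same team, and explicitly shared.
  if (m.teamId !== teamId) return false;
  return m.visibility === "team";
}
```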

&lt;p&gt;The search engine covers both pools, private and team-shared, in a single query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;valid_until&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;team_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;visibility&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'team'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your private memories and your team's shared memories are searched together, ranked by the same blended score. The AI does not need to know or care about the distinction — it just gets the most relevant facts.&lt;/p&gt;

&lt;p&gt;Here is the scenario that makes this powerful: your teammate debugs a gnarly API rate-limiting issue on Monday. Their Claude session stores the root cause and the fix as team-scoped memories. On Tuesday, you hit a related issue. Your Claude session automatically recalls your teammate's findings. No Slack thread to search. No knowledge base article to write. The team's AI tools build collective knowledge passively, as a byproduct of normal work.&lt;/p&gt;

&lt;p&gt;Team management is handled through a full REST API — create teams, invite members by email, assign roles (admin, member, read-only), and set per-team memory limits. Deduplication and contradiction detection are scoped per team namespace, so team memories stay clean even as multiple people contribute.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture and Self-Hosting
&lt;/h2&gt;

&lt;p&gt;Smara's architecture has three layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The MCP server&lt;/strong&gt; (&lt;code&gt;@smara/mcp-server&lt;/code&gt; on npm) is a lightweight TypeScript process that runs locally alongside your AI tool. It translates MCP tool calls into REST API calls. No state, no database, no dependencies beyond Node.js. Install with &lt;code&gt;npx -y @smara/mcp-server&lt;/code&gt; and point it at any Smara-compatible backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hosted API&lt;/strong&gt; (&lt;code&gt;api.smara.io&lt;/code&gt;) is a Fastify application running on Railway. It handles authentication, rate limiting, billing, and the core memory operations. PostgreSQL with pgvector stores the memories and their embeddings. Voyage AI (&lt;code&gt;voyage-3&lt;/code&gt;, 1024 dimensions) generates the embeddings for semantic search. The API auto-migrates on startup, creating tables, indexes, and extensions automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The storage layer&lt;/strong&gt; is PostgreSQL with pgvector. Memories are stored with their vector embeddings, importance scores, decay metadata, source tags, namespace labels, team associations, and soft-delete timestamps. Deduplication uses cosine similarity bands: &amp;gt;= 0.985 is a true duplicate (skip), 0.94-0.985 is a contradiction (soft-delete old, store new), below 0.94 is a genuinely new fact.&lt;/p&gt;
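&lt;p&gt;The banding logic is simple enough to sketch in a few lines. The thresholds come from the text; the function name is illustrative:&lt;/p&gt;

```typescript
type DedupVerdict = "duplicate" | "contradiction" | "new";

// Classify an incoming fact against its nearest stored neighbor
// by cosine similarity, using the bands described above.
function classify(similarity: number): DedupVerdict {
  if (similarity >= 0.985) return "duplicate";    // skip the write
  if (similarity >= 0.94) return "contradiction"; // soft-delete old, store new
  return "new";                                   // genuinely new fact
}
```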

&lt;p&gt;Self-hosting takes five minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/smara-io/api.git
&lt;span class="nb"&gt;cd &lt;/span&gt;api
&lt;span class="nv"&gt;VOYAGE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-key docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you the full API on &lt;code&gt;localhost:3011&lt;/code&gt; with a local PostgreSQL instance. Point the MCP server at your self-hosted instance by setting &lt;code&gt;SMARA_API_URL&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"smara"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@smara/mcp-server"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"SMARA_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"smara_your_key_here"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"SMARA_API_URL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:3011"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is MIT-licensed. The hosted service exists for convenience and for teams that do not want to manage infrastructure. The self-hosted path exists for developers who want full control over their data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing a Developer Tool as a Solo Founder
&lt;/h2&gt;

&lt;p&gt;Pricing a developer tool is an exercise in controlled anxiety. Price too high and nobody tries it. Price too low and you cannot sustain the infrastructure. Price with a free tier that is too generous and you fund everyone's usage out of pocket.&lt;/p&gt;

&lt;p&gt;I looked at the competitive landscape. Mem0 charges $249/month for their Pro tier. Zep starts at $25 and climbs to $475. Supermemory charges $399 for teams. These prices make sense for enterprise buyers with procurement budgets. They are prohibitive for indie developers and small teams — exactly the people who need memory the most, because they cannot afford to waste time on context re-establishment.&lt;/p&gt;

&lt;p&gt;Smara's tiers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Memories&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;$0/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Developer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;200,000&lt;/td&gt;
&lt;td&gt;$19/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2,000,000&lt;/td&gt;
&lt;td&gt;$99/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The free tier is genuinely usable. 10,000 memories covers months of individual use. The goal is not to frustrate free users into upgrading — it is to let them experience the product fully and upgrade when they need team features or higher volume.&lt;/p&gt;

&lt;p&gt;The $19 Developer tier hits the "expense it without approval" threshold at most companies. The $99 Pro tier is less than half the price of Mem0's comparable plan, with more memories included.&lt;/p&gt;

&lt;p&gt;The mental model behind this pricing: memory is infrastructure, not a luxury. It should cost about the same as a database or a monitoring tool, present in the budget but never the biggest line item. If a developer saves twenty minutes per session and runs four sessions per day, that is more than twenty-six hours saved per month. At any reasonable hourly rate, $19 pays for itself before lunch on day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Smara v1.2 is in active development with three focus areas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teams&lt;/strong&gt; is shipping first. The API already supports team creation, invitation flows, role-based access, and shared memory search (as shown in the code above). The MCP server update will add team-aware tools so that AI sessions can store and retrieve team knowledge seamlessly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Agents&lt;/strong&gt; comes next. Agents are identities with their own memory namespace, system prompt, and composable skills. Think of a code-review agent that remembers your team's style guide, or an onboarding agent that accumulates institutional knowledge from every new hire's questions. Eight built-in skills are planned: code review, PR review, doc writing, test generation, deploy checklist, onboarding, architecture advising, and security audit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IDE-native memory panels&lt;/strong&gt; will give you visibility into what your AI tools remember — browse, search, edit, and delete memories from a visual interface instead of relying on the AI to manage everything silently.&lt;/p&gt;

&lt;p&gt;The longer-term vision is straightforward: every AI tool you use should share one brain. Not just coding assistants — any AI agent, any workflow, any platform. Smara is the memory layer that makes that possible.&lt;/p&gt;




&lt;p&gt;If you want to try it, the fastest path is three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get a free API key at &lt;a href="https://smara.io" rel="noopener noreferrer"&gt;smara.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add the MCP config to your Claude Code or Cursor setup&lt;/li&gt;
&lt;li&gt;Start working — memory happens automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a href="https://smara.io" rel="noopener noreferrer"&gt;smara.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/smara-io/smara" rel="noopener noreferrer"&gt;github.com/smara-io/smara&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt;: &lt;a href="https://www.npmjs.com/package/@smara/mcp-server" rel="noopener noreferrer"&gt;@smara/mcp-server&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Docs&lt;/strong&gt;: &lt;a href="https://api.smara.io/docs" rel="noopener noreferrer"&gt;api.smara.io/docs&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I am building Smara as a solo founder. If you have questions about the architecture, the MCP integration, or the Ebbinghaus scoring model, I am &lt;a href="https://twitter.com/parallelromb" rel="noopener noreferrer"&gt;@parallelromb&lt;/a&gt; — happy to talk.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>devtools</category>
      <category>memory</category>
    </item>
    <item>
      <title>I Built a Cross-Platform Memory Layer for AI Agents Using Ebbinghaus Forgetting Curves</title>
      <dc:creator>Sri</dc:creator>
      <pubDate>Wed, 01 Apr 2026 04:04:05 +0000</pubDate>
      <link>https://dev.to/smara/how-ebbinghaus-forgetting-curves-make-ai-agents-smarter-ef3</link>
      <guid>https://dev.to/smara/how-ebbinghaus-forgetting-curves-make-ai-agents-smarter-ef3</guid>
      <description>&lt;p&gt;I live with Claude Code. It's where I build everything — my API, my infrastructure, my marketing copy. But every new session starts the same way: Claude has no idea who I am.&lt;/p&gt;

&lt;p&gt;I'd tell it I prefer Python for backend work. Three sessions later, it suggests TypeScript. I'd explain my project architecture on Monday. By Wednesday, gone. I was re-explaining the same context every single day.&lt;/p&gt;

&lt;p&gt;And if you're using Cursor, Codex, or Windsurf, you have this problem too — except worse. Because even if one tool starts remembering, the moment you switch to another, you're back to zero. Each tool is an island.&lt;/p&gt;

&lt;p&gt;I tried the usual fixes. Dumped context into a vector store. Built a RAG pipeline. It worked — until the store had hundreds of entries and a two-month-old preference outranked something I said yesterday, just because the phrasing matched better. The retrieval had no sense of time.&lt;/p&gt;

&lt;p&gt;That's when I started reading about Hermann Ebbinghaus.&lt;/p&gt;

&lt;h2&gt;
  
  
  A 140-year-old experiment that changes everything
&lt;/h2&gt;

&lt;p&gt;In 1885, a German psychologist named Hermann Ebbinghaus spent years memorizing nonsense syllables — things like "DAX," "BUP," "ZOL" — and testing how quickly he forgot them. His results produced one of the most replicated findings in all of psychology: the forgetting curve.&lt;/p&gt;

&lt;p&gt;The core insight: memory retention decays exponentially. You don't gradually forget things in a linear way — you lose most of the information quickly, then the remainder fades slowly. But here's the part that got me: &lt;strong&gt;every time you recall something, the decay rate slows down.&lt;/strong&gt; Memories you access frequently become durable. Memories you never revisit fade to nothing.&lt;/p&gt;

&lt;p&gt;This mapped perfectly to what I needed. A preference mentioned once three months ago should carry less weight than something reinforced yesterday. Frequently accessed context should be strong. Old, unreinforced trivia should quietly disappear.&lt;/p&gt;

&lt;h2&gt;
  
  
  The math behind it
&lt;/h2&gt;

&lt;p&gt;Ebbinghaus's forgetting curve:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;R = e^(-t / S)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;R&lt;/strong&gt; = retention (0 to 1)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;t&lt;/strong&gt; = time elapsed since the memory was formed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S&lt;/strong&gt; = memory strength (higher = slower decay)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same math behind spaced repetition systems like Anki. I realized I could apply it to AI agent memory.&lt;/p&gt;
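&lt;p&gt;The quickest way to get a feel for the curve is to plug in numbers. A minimal sketch in Python — the strength value of 24 hours is an arbitrary illustration, not a Smara parameter:&lt;/p&gt;

```python
import math

def retention(t_hours: float, strength: float) -> float:
    """Ebbinghaus retention: R = e^(-t / S). Higher S means slower decay."""
    return math.exp(-t_hours / strength)

# With an illustrative strength S = 24 hours:
print(round(retention(1, 24), 2))    # one hour later  -> 0.96
print(round(retention(24, 24), 2))   # one day later   -> 0.37
print(round(retention(168, 24), 4))  # one week later  -> 0.0009, essentially gone
```

Most of the loss happens early, then the tail flattens — exactly the "steep drop, slow fade" shape described above.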

&lt;h2&gt;What I built&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://smara.io" rel="noopener noreferrer"&gt;Smara&lt;/a&gt; — a memory API that combines semantic vector search with Ebbinghaus decay scoring. Every stored memory gets an importance score between 0 and 1. At query time, importance scales the memory strength, so high-importance memories decay slowly while trivial ones fade fast.&lt;/p&gt;

&lt;p&gt;The retrieval score blends semantic relevance with temporal decay. Semantic search stays dominant — you still get the most &lt;em&gt;relevant&lt;/em&gt; memories — but recency breaks ties. A moderately relevant memory from yesterday can outrank a highly relevant one from three months ago.&lt;/p&gt;

&lt;p&gt;I also track access patterns. Every time a memory is retrieved, it gets reinforced — frequently accessed memories stay strong. Memories nobody asks about quietly fade. The specific weights took a while to tune, but the principle is simple: relevance × recency × reinforcement.&lt;/p&gt;
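&lt;p&gt;The relevance × recency × reinforcement principle can be sketched in a few lines. To be clear, these are not Smara's production values — the 0.7/0.3 blend, the one-week base strength, and the reinforcement multiplier are placeholders that show the shape of the formula:&lt;/p&gt;

```python
import math

WEEK_SECONDS = 7 * 24 * 3600  # assumed base strength: one week

def blended_score(similarity: float, age_seconds: float,
                  importance: float, access_count: int) -> float:
    # Importance scales memory strength; each retrieval reinforces it,
    # so frequently accessed memories decay more slowly.
    strength = WEEK_SECONDS * importance * (1 + access_count)
    recency = math.exp(-age_seconds / strength)  # Ebbinghaus decay
    # Semantic relevance stays dominant; recency breaks ties.
    # (These weights are assumptions, not Smara's tuned values.)
    return 0.7 * similarity + 0.3 * recency

# A moderately relevant memory from yesterday vs. a highly relevant
# one from three months ago that was never reinforced:
fresh = blended_score(0.75, 1 * 86400, importance=0.8, access_count=3)
stale = blended_score(0.85, 90 * 86400, importance=0.8, access_count=0)
print(fresh > stale)  # True — recency breaks the tie
```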

&lt;p&gt;The entire API is three calls:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Store a memory:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.smara.io/v1/memories &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "user_id": "user_abc",
    "fact": "Prefers Python over TypeScript for backend work",
    "importance": 0.8
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Search with decay-aware ranking:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://api.smara.io/v1/memories/search?&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
user_id=user_abc&amp;amp;q=what+language+for+backend&amp;amp;limit=5"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response gives you &lt;code&gt;similarity&lt;/code&gt;, &lt;code&gt;decay_score&lt;/code&gt;, and the blended &lt;code&gt;score&lt;/code&gt; — you can see exactly why a memory was ranked where it was.&lt;/p&gt;
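&lt;p&gt;As an illustration, a response might look like this — the &lt;code&gt;similarity&lt;/code&gt;, &lt;code&gt;decay_score&lt;/code&gt;, and &lt;code&gt;score&lt;/code&gt; fields are the ones described above, but the envelope and the numbers here are made up for the example, not copied from the API:&lt;/p&gt;

```json
{
  "memories": [
    {
      "fact": "Prefers Python over TypeScript for backend work",
      "similarity": 0.91,
      "decay_score": 0.97,
      "score": 0.93
    }
  ]
}
```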

&lt;p&gt;&lt;strong&gt;Get full user context for your LLM prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://api.smara.io/v1/users/user_abc/context"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop the context string into your system prompt and your agent knows who it's talking to.&lt;/p&gt;
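&lt;p&gt;"Drop it into your system prompt" can be as simple as string concatenation. A tiny sketch — &lt;code&gt;build_system_prompt&lt;/code&gt; is a hypothetical helper of mine, not part of the Smara API:&lt;/p&gt;

```python
def build_system_prompt(base_prompt: str, context: str) -> str:
    """Prepend persistent user context to the agent's system prompt."""
    if not context:
        return base_prompt
    return f"{base_prompt}\n\nWhat you know about this user:\n{context}"

# 'context' would come from GET /v1/users/{user_id}/context
prompt = build_system_prompt(
    "You are a coding assistant.",
    "- Prefers Python over TypeScript for backend work",
)
```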

&lt;h2&gt;The cross-platform problem nobody's solving&lt;/h2&gt;

&lt;p&gt;Building the API was the easy part. The real insight came from dogfooding it.&lt;/p&gt;

&lt;p&gt;I had Smara wired into Claude Code via MCP. It worked great — my sessions finally had persistent memory. Claude remembered my preferences, my project context, my architecture decisions. It felt like a different tool.&lt;/p&gt;

&lt;p&gt;Then I thought: what about developers using Cursor? Or Codex? Or switching between multiple tools throughout the day? Their memory is siloed in each tool, and none of it carries over. Even Claude Code's built-in memory doesn't follow you to Cursor.&lt;/p&gt;

&lt;p&gt;So I made Smara platform-agnostic. Every memory is tagged with its source — which tool stored it — but all memories live in one pool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Prefers Python over TypeScript for backend work"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-code"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decay_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.97&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A preference stored via Claude Code is instantly available in Cursor, Codex, or anything else connected to the same account.&lt;/p&gt;

&lt;p&gt;For MCP-compatible tools (Claude Code, Cursor, Windsurf), I built an MCP server that handles everything automatically. Add this to your MCP config and restart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"smara"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@smara/mcp-server"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"SMARA_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No manual tool calls. The MCP server instructs the LLM to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;At conversation start:&lt;/strong&gt; Automatically load stored context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;During conversation:&lt;/strong&gt; Silently store new facts as they come up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On explicit request:&lt;/strong&gt; Handle "remember this" and "forget that"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don't configure rules or triggers. The LLM decides what's worth remembering. The Ebbinghaus decay does the rest.&lt;/p&gt;

&lt;p&gt;For OpenAI-compatible tools (Codex, ChatGPT, custom GPTs), there's a proxy endpoint that accepts OpenAI function calls. Same memory pool, different protocol. So if you're a Cursor user, a Codex user, or you bounce between tools — your context travels with you.&lt;/p&gt;
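&lt;p&gt;For the OpenAI side, a memory tool exposed through the proxy would be declared in the standard function-calling shape. The tool name and parameters below are my guess at what such a proxy exposes, not Smara's documented schema — only the overall structure is the real OpenAI format:&lt;/p&gt;

```json
{
  "type": "function",
  "function": {
    "name": "store_memory",
    "description": "Persist a fact about the user for future sessions",
    "parameters": {
      "type": "object",
      "properties": {
        "fact": { "type": "string" },
        "importance": { "type": "number", "minimum": 0, "maximum": 1 }
      },
      "required": ["fact"]
    }
  }
}
```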

&lt;p&gt;The result: I store my preferences in Claude Code. A Cursor user on the same Smara account sees that context instantly. Switch to Codex — same memories. One pool, every tool.&lt;/p&gt;

&lt;h2&gt;How this compares to what's out there&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;RAG / vanilla vector search.&lt;/strong&gt; This is where most teams start. Embed everything, retrieve by cosine similarity. Works until your store grows and old entries outrank recent ones because the phrasing happened to match better. No sense of time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graph memory (Mem0, etc.).&lt;/strong&gt; Knowledge graphs capture entity relationships, which is powerful when queries need to hop across entities. But the setup cost is high — entity extraction, relationship mapping, graph traversal. For most agent memory needs (preferences, decisions, project context), it's over-engineered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key-value stores (Redis, DynamoDB).&lt;/strong&gt; Fast and simple, but no semantic search. You can only retrieve by exact key, which means your agent needs to know exactly what it's looking for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I built:&lt;/strong&gt; Semantic search combined with Ebbinghaus decay. Fuzzy matching that respects time, plus automatic contradiction detection — if a preference changes, the old memory is replaced, not stacked. Three REST endpoints, no SDK to learn. Decay runs at query time, no batch jobs.&lt;/p&gt;
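&lt;p&gt;The "replaced, not stacked" behavior is easy to picture in pseudocode form. This is only a sketch of the idea — the real detection logic is internal to Smara, and the 0.85 cutoff is an assumption:&lt;/p&gt;

```python
def upsert_memory(store, fact, embedding, similarity, threshold=0.85):
    """Replace a conflicting memory instead of stacking it (illustrative)."""
    for i, (old_fact, old_emb) in enumerate(store):
        # High embedding similarity means same topic: treat the new
        # fact as an update and replace the old entry in place.
        if similarity(embedding, old_emb) >= threshold:
            store[i] = (fact, embedding)
            return "replaced"
    store.append((fact, embedding))
    return "added"
```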

&lt;h2&gt;What I learned&lt;/h2&gt;

&lt;p&gt;The biggest surprise was how much a simple decay term changes the feel of agent conversations. With flat retrieval, agents feel like they're reading from a database. With decay-aware retrieval, they feel like they actually &lt;em&gt;know&lt;/em&gt; you. Recent interactions carry more weight. Repeated topics build stronger memories. Old noise fades naturally.&lt;/p&gt;

&lt;p&gt;The second surprise was that the cross-platform piece matters more than the memory science. Developers don't just use one AI tool — they use three or four. The siloed memory problem is what actually hurts day to day.&lt;/p&gt;

&lt;p&gt;If you're building agents that talk to users more than once, or you're tired of Cursor, Codex, or Claude Code forgetting everything between sessions — &lt;a href="https://smara.io#signup" rel="noopener noreferrer"&gt;Smara has a free tier&lt;/a&gt; (10,000 memories, no credit card). MCP setup takes 30 seconds. REST API works with anything.&lt;/p&gt;

&lt;p&gt;I'm building this in public and would love feedback — especially from Cursor and Codex users. I built this for Claude Code, but the cross-platform piece is where it gets interesting. What memory solutions are you using? What's working, what's not?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
