Nikhil tiwari

I Built Persistent Memory for AI Coding Assistants — Here's How It Works

Every time you open a new AI chat, your assistant forgets everything. I fixed that.

The Problem

If you use Cursor, Claude Code, or Amazon Q regularly, you've probably hit this wall:

You explain your project architecture in Monday's chat. On Tuesday, you open a new session and start from scratch. You paste the same context, re-explain the same patterns, and re-describe the same service boundaries — every single time.

This isn't a minor inconvenience.

On a large codebase, a new AI session can begin with 10 minutes of context-loading before you can ask anything useful. On a team, each developer builds a mental model of the codebase in isolation, and the AI assistant shares none of it.

I've been building production systems on Azure for several years — .NET microservices, KEDA autoscaling, Azure Service Bus pipelines. Our codebase has 40+ services, clean architecture patterns, vendor integration handlers, and years of architectural decisions that live entirely in people's heads.

Every time I opened a new AI chat, I was manually transferring that knowledge into a chat window.

So I built Mnemo.


What Mnemo Does

Mnemo is a local MCP (Model Context Protocol) server that gives AI coding assistants persistent, structured knowledge about your codebase.

One command initializes it.

After that, every AI chat automatically knows:

  • Your project's architecture and patterns
  • Your API endpoints
  • Your engineering decisions
  • Who owns which part of the codebase
  • Errors you've already debugged and how you fixed them
  • Incidents, code reviews, and team knowledge

It works with Cursor, Claude Code, Amazon Q, and any MCP-compatible AI client.

When a new AI chat session starts, the assistant asks Mnemo for your project context and loads it automatically.

You never paste architecture descriptions again.


How It Works Technically

When you run:

mnemo init

inside your project, several things happen.

1. AST-Based Codebase Parsing

Mnemo parses your codebase using real language parsers — not regex or grep.

  • C# → Roslyn
  • Python → ast
  • TypeScript → TypeScript Compiler API

This allows Mnemo to understand:

  • Method signatures
  • Class hierarchies
  • Interface implementations
  • Dependency relationships

From this, it builds a compact repo map — a structured representation of your codebase shape.

Not the full source code.

Just the architecture-level understanding required for AI context.
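
As a rough illustration of the idea (not Mnemo's actual code), here is a minimal sketch using Python's standard `ast` module to reduce a module to its shape: class names, base classes, and method signatures. The function names `build_repo_map` and `_signature`, the output layout, and the sample classes are all assumptions made for this example.

```python
import ast


def _signature(func) -> str:
    """Render a function node as 'name(args) -> return_type'."""
    returns = f" -> {ast.unparse(func.returns)}" if func.returns else ""
    return f"{func.name}({ast.unparse(func.args)}){returns}"


def build_repo_map(source: str, module_name: str) -> dict:
    """Summarize a module's top-level classes and functions (hypothetical layout)."""
    tree = ast.parse(source)
    entry = {"module": module_name, "classes": [], "functions": []}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            entry["classes"].append({
                "name": node.name,
                "bases": [ast.unparse(base) for base in node.bases],
                "methods": [
                    _signature(item)
                    for item in node.body
                    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                ],
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entry["functions"].append(_signature(node))
    return entry


# Illustrative input only; these classes are made up for the example.
SAMPLE = '''
class BasePaymentHandler:
    def charge(self, amount: float) -> bool:
        raise NotImplementedError

class StripeHandler(BasePaymentHandler):
    def charge(self, amount: float) -> bool:
        return True
'''

repo_map = build_repo_map(SAMPLE, "payments.handlers")
```

The method bodies are dropped entirely; only names, inheritance, and signatures survive, which is what keeps the map compact. `ast.unparse` requires Python 3.9 or later.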


2. Architecture Detection

Mnemo scans the codebase for structural signals:

  • Common handler inheritance
  • IRepository<T> patterns
  • Command/query separation
  • Event-driven conventions
  • DI registration styles

Using these signals, it classifies your architecture automatically:

  • Clean Architecture
  • CQRS
  • Event-Driven
  • Hexagonal
  • Repository Pattern
  • Handler Pattern

This becomes part of the persistent project memory.
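
To make signal-based classification concrete, here is a minimal sketch under stated assumptions: each type in the repo map is a dict with `name` and `bases`, and a label is reported once at least `min_hits` types match its rule. The rules, the threshold, and the `classify_architecture` name are hypothetical and much cruder than what a real detector would need.

```python
from collections import Counter

# Hypothetical signal rules: (label, predicate over one type descriptor).
RULES = [
    ("Repository Pattern", lambda t: any(b.startswith("IRepository") for b in t["bases"])),
    ("Handler Pattern", lambda t: t["name"].endswith("Handler")),
    ("CQRS", lambda t: t["name"].endswith(("Command", "Query"))),
    ("Event-Driven", lambda t: t["name"].endswith("Event")),
]


def classify_architecture(types, min_hits=2):
    """Return the labels whose signal fires on at least min_hits types."""
    hits = Counter()
    for type_info in types:
        for label, matches in RULES:
            if matches(type_info):
                hits[label] += 1
    return sorted(label for label, count in hits.items() if count >= min_hits)


# Illustrative input only.
types = [
    {"name": "StripeHandler", "bases": ["BasePaymentHandler"]},
    {"name": "PayPalHandler", "bases": ["BasePaymentHandler"]},
    {"name": "OrderRepository", "bases": ["IRepository<Order>"]},
    {"name": "UserRepository", "bases": ["IRepository<User>"]},
    {"name": "CreateOrderCommand", "bases": []},
]

labels = classify_architecture(types)
```

Here the single `CreateOrderCommand` is not enough to report CQRS, so only the handler and repository patterns are returned. Requiring more than one hit is one simple way to avoid labelling a codebase from a single coincidental name.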


3. MCP Server Initialization

Mnemo launches a local MCP server process that exposes tools AI assistants can call.

At the start of each AI chat session, the assistant calls:

mnemo_recall

Mnemo then returns a structured context payload containing:

  • Repo map
  • Architecture profile
  • Recent engineering decisions
  • Error/debug history
  • Current task context

Everything is stored locally inside:

.mnemo/

Currently:

  • JSON files store structured memory
  • A vector store powers semantic search

No source code leaves your machine.
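
As a sketch of how a recall payload could be assembled from local JSON files, the snippet below reads whichever sections exist under `.mnemo/` and returns them as one dict. The file names (`repo_map.json`, `decisions.json`, and so on) and the `load_recall_payload` function are assumptions for illustration; the real on-disk layout may differ.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical section names; one JSON file per section is an assumption.
SECTIONS = ("repo_map", "architecture", "decisions", "errors", "task")


def load_recall_payload(project_root: Path) -> dict:
    """Collect whichever memory sections exist under <project_root>/.mnemo/."""
    store = project_root / ".mnemo"
    payload = {}
    for section in SECTIONS:
        path = store / f"{section}.json"
        if path.exists():
            payload[section] = json.loads(path.read_text(encoding="utf-8"))
    return payload


# Demo against a throwaway directory so nothing touches a real project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".mnemo").mkdir()
    (root / ".mnemo" / "decisions.json").write_text(
        json.dumps(["Use handler pattern for vendor-specific logic"]),
        encoding="utf-8",
    )
    payload = load_recall_payload(root)
```

Everything here is plain file I/O on the local disk, which is consistent with the claim that nothing has to leave the machine.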


Why MCP Matters

MCP (Model Context Protocol) is an open protocol from Anthropic that standardizes how AI assistants connect to tools and external context.

Think of it like USB for AI tooling.

Instead of every AI platform building proprietary integrations:

  • Any MCP server can provide tools
  • Any MCP-compatible AI client can consume them

Mnemo implements MCP, meaning it works across:

  • Cursor
  • Claude Code
  • Amazon Q
  • Kiro
  • Other MCP-compatible tools

Mnemo isn't tied to a single AI assistant.

It's an intelligence layer that upgrades all of them.
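
For a sense of what travels over the wire, MCP tool invocations are JSON-RPC 2.0 requests using the `tools/call` method. The sketch below builds such a request for `mnemo_recall`; the helper name `tool_call_request` is made up, and the empty `arguments` object is an assumption, since the tool's parameters are not described here.

```python
import json


def tool_call_request(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Assumption: mnemo_recall is called with no arguments.
message = tool_call_request(1, "mnemo_recall", {})
```

Because every MCP client sends this same message shape, a server only has to implement the tool once to be reachable from any of them.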


What the AI Actually Sees

When the assistant calls mnemo_recall, it receives structured project context like this:

## Project Context
Architecture: Clean Architecture + CQRS
Patterns: Repository (9 interfaces), Handler pattern (12 handlers), DI container

## Decisions
- Use handler pattern for vendor-specific logic
- Auth service uses cache-aside with 5min TTL

## Repo Map
PaymentService/Handlers/
  - StripeHandler
  - PayPalHandler
  - SquareHandler

AuthService/Services/
  - TokenService : ITokenService

With this context loaded, the AI understands:

  • How the codebase is structured
  • Which patterns are expected
  • Existing architectural conventions
  • Historical engineering decisions

So when you ask:

"Add a new payment handler"

The generated implementation:

  • Inherits from BasePaymentHandler
  • Follows existing conventions
  • Registers correctly in DI
  • Matches existing architecture

Without that context, assistants tend to generate generic code that ignores the system's existing design.


Installation

Option A: VS Code Extension (Easiest)

  1. Install the Mnemo extension from the VS Code Marketplace
  2. Open a project
  3. Click "Initialize Mnemo?"

Done.

The extension automatically:

  • Downloads the Mnemo binary
  • Initializes the repository
  • Configures MCP

No Python required.


Option B: Homebrew (macOS/Linux)

brew tap Mnemo-mcp/tap
brew install mnemo

Then:

cd your-project
mnemo init

Option C: pip (All Platforms)

pip install mnemo

Or from source:

git clone https://github.com/Mnemo-mcp/Mnemo.git
cd Mnemo
pip install -e .

Then:

cd your-project
mnemo init

Final Thoughts

Mnemo started as a solution to a frustrating problem:

AI assistants forget everything between sessions.

For small projects, that's annoying.

For large production systems, it's a major productivity bottleneck.

Mnemo gives AI coding assistants persistent architectural memory, allowing them to operate with real understanding of your codebase instead of stateless guesses.

I'd love feedback — especially from teams managing large, distributed systems.

What project context do you find yourself re-explaining most often?
