Nikhil Tiwari

Posted on May 10

I Built Persistent Memory for AI Coding Assistants — Here's How It Works

#ai #programming #mcp #productivity

Every time you open a new AI chat, your assistant forgets everything. I fixed that.

The Problem

If you use Cursor, Claude Code, or Amazon Q regularly, you've probably hit this wall:

You explain your project architecture in Monday's chat. On Tuesday, you open a new session and start from scratch. You paste the same context, re-explain the same patterns, and re-describe the same service boundaries — every single time.

This isn't a minor inconvenience.

For large codebases, a new AI session can start with around 10 minutes of context-loading before you can ask anything useful. For teams, each developer builds their own mental model of the codebase in isolation, while the AI assistant knows none of it.

I've been building production systems on Azure for several years — .NET microservices, KEDA autoscaling, Azure Service Bus pipelines. Our codebase has 40+ services, clean architecture patterns, vendor integration handlers, and years of architectural decisions that live entirely in people's heads.

Every time I opened a new AI chat, I was manually transferring that knowledge into a chat window.

So I built Mnemo.

What Mnemo Does

Mnemo is a local MCP (Model Context Protocol) server that gives AI coding assistants persistent, structured knowledge about your codebase.

One command initializes it.

After that, every AI chat automatically knows:

Your project's architecture and patterns
Your API endpoints
Your engineering decisions
Who owns which part of the codebase
Errors you've already debugged and how you fixed them
Incidents, code reviews, and team knowledge

It works with Cursor, Claude Code, Amazon Q, and any MCP-compatible AI client.

When a new AI chat session starts, the assistant pulls your project context from Mnemo automatically.

You never paste architecture descriptions again.

How It Works Technically

When you run:

mnemo init

inside your project, several things happen.

1. AST-Based Codebase Parsing

Mnemo parses your codebase using real language parsers — not regex or grep.

C# → Roslyn
Python → ast
TypeScript → TypeScript Compiler API

This allows Mnemo to understand:

Method signatures
Class hierarchies
Interface implementations
Dependency relationships

From this, it builds a compact repo map — a structured representation of your codebase shape.

Not the full source code.

Just the architecture-level understanding required for AI context.
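To make the idea concrete, here is a minimal sketch of a repo map built with Python's standard `ast` module. It is not Mnemo's actual implementation: the `build_repo_map` function, the sample classes, and the output shape are assumptions chosen for illustration.

```python
import ast

# Hypothetical sample source; in practice this would be read from project files.
SAMPLE_SOURCE = '''
class BaseHandler:
    def handle(self, request: dict) -> dict:
        raise NotImplementedError

class StripeHandler(BaseHandler):
    def handle(self, request: dict) -> dict:
        return {"status": "ok"}

    def refund(self, charge_id: str, amount: int) -> bool:
        return True
'''


def build_repo_map(source: str) -> dict:
    """Map each top-level class to its base classes and method signatures."""
    tree = ast.parse(source)
    repo_map = {}
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        methods = []
        for item in node.body:
            if isinstance(item, ast.FunctionDef):
                # Record only the method name and positional parameter names.
                params = ", ".join(arg.arg for arg in item.args.args)
                methods.append(f"{item.name}({params})")
        repo_map[node.name] = {
            "bases": [ast.unparse(base) for base in node.bases],
            "methods": methods,
        }
    return repo_map


repo_map = build_repo_map(SAMPLE_SOURCE)
for name, info in repo_map.items():
    print(name, info["bases"], info["methods"])
```

The result keeps names, inheritance, and signatures but drops method bodies, which is the "shape, not source" trade-off described above. `ast.unparse` requires Python 3.9 or later.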

2. Architecture Detection

Mnemo scans the codebase for structural signals:

Common handler inheritance
IRepository<T> patterns
Command/query separation
Event-driven conventions
DI registration styles

Using these signals, it automatically classifies the architecture styles and patterns in use, for example:

Clean Architecture
CQRS
Event-Driven
Hexagonal
Repository Pattern
Handler Pattern

This becomes part of the persistent project memory.
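A rough sketch of how signal-based classification could work is shown below. The rules, thresholds, and the `detect_patterns` function are assumptions for illustration, not Mnemo's real heuristics; the input is a hypothetical list of type declarations like the ones a repo map would contain.

```python
import re
from collections import Counter

# Hypothetical declarations: (type name, base types or implemented interfaces).
DECLARATIONS = [
    ("StripeHandler", ["BasePaymentHandler"]),
    ("PayPalHandler", ["BasePaymentHandler"]),
    ("SquareHandler", ["BasePaymentHandler"]),
    ("OrderRepository", ["IRepository<Order>"]),
    ("CreateOrderCommand", []),
    ("GetOrderQuery", []),
]


def detect_patterns(declarations) -> list:
    """Return pattern labels supported by simple structural signals."""
    handler_bases = Counter()
    repositories = commands = queries = 0
    for name, bases in declarations:
        for base in bases:
            if re.fullmatch(r"IRepository<\w+>", base):
                repositories += 1
            elif base.endswith("Handler"):
                handler_bases[base] += 1
        commands += name.endswith("Command")
        queries += name.endswith("Query")

    detected = []
    # Assumed rule: command and query types both present suggests CQRS.
    if commands and queries:
        detected.append("CQRS")
    # Assumed rule: two or more types sharing one handler base class.
    if any(count >= 2 for count in handler_bases.values()):
        detected.append("Handler Pattern")
    # Assumed rule: at least one IRepository<T> implementation.
    if repositories:
        detected.append("Repository Pattern")
    return detected


patterns = detect_patterns(DECLARATIONS)
print(patterns)  # ['CQRS', 'Handler Pattern', 'Repository Pattern']
```

Name-based rules like these are cheap but can misfire, so a real detector would likely combine several signals before committing to a label.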

3. MCP Server Initialization

Mnemo launches a local MCP server process that exposes tools AI assistants can call.

At the start of each AI chat session, the assistant calls:

mnemo_recall

Mnemo then returns a structured context payload containing:

Repo map
Architecture profile
Recent engineering decisions
Error/debug history
Current task context
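The payload sections listed above could be assembled roughly like this. The key names, the `build_recall_payload` function, and the `max_items` cap are hypothetical; the article does not specify Mnemo's actual payload schema.

```python
import json


def build_recall_payload(memory: dict, max_items: int = 5) -> dict:
    """Assemble the five payload sections into one dictionary."""
    return {
        "repo_map": memory.get("repo_map", {}),
        "architecture": memory.get("architecture", []),
        # Keep only the most recent entries so the payload stays compact.
        "decisions": memory.get("decisions", [])[-max_items:],
        "errors": memory.get("errors", [])[-max_items:],
        "current_task": memory.get("current_task"),
    }


# Hypothetical stored memory, using values that appear in this article.
memory = {
    "repo_map": {"PaymentService/Handlers": ["StripeHandler", "PayPalHandler"]},
    "architecture": ["Clean Architecture", "CQRS"],
    "decisions": [
        "Use handler pattern for vendor-specific logic",
        "Auth service uses cache-aside with 5min TTL",
    ],
    "errors": [],
    "current_task": None,
}

payload = build_recall_payload(memory, max_items=1)
print(json.dumps(payload, indent=2))
```

Capping the history sections is one way to keep the context small enough to load at the start of every session.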

Everything is stored locally inside:

.mnemo/

Currently:

JSON files store structured memory
A vector store powers semantic search

No source code leaves your machine.
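As an illustration of local-only storage, the sketch below writes decisions to a JSON file under `.mnemo/` and looks them up by similarity. The file name `decisions.json` and all function names are assumptions, and the word-count cosine similarity is a deliberately simple stand-in for the embedding-based vector store, whose details the article does not describe.

```python
import json
import math
import tempfile
from collections import Counter
from pathlib import Path


def save_decision(project_root: Path, text: str) -> Path:
    """Append a decision to .mnemo/decisions.json (hypothetical file name)."""
    store = project_root / ".mnemo"
    store.mkdir(exist_ok=True)
    path = store / "decisions.json"
    decisions = json.loads(path.read_text()) if path.exists() else []
    decisions.append(text)
    path.write_text(json.dumps(decisions, indent=2))
    return path


def load_decisions(project_root: Path) -> list:
    path = project_root / ".mnemo" / "decisions.json"
    return json.loads(path.read_text()) if path.exists() else []


def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, entries: list) -> str:
    """Return the stored entry most similar to the query."""
    query_vec = vectorize(query)
    return max(entries, key=lambda entry: cosine(query_vec, vectorize(entry)))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_decision(root, "Use handler pattern for vendor-specific logic")
    save_decision(root, "Auth service uses cache-aside with 5min TTL")
    stored = load_decisions(root)

best = search("auth cache TTL", stored)
print(best)
```

Everything here reads and writes files inside the project directory, which matches the claim that nothing needs to leave the machine.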

Why MCP Matters

MCP (Model Context Protocol) is an open protocol from Anthropic that standardizes how AI assistants connect to tools and external context.

Think of it like USB for AI tooling.

Instead of every AI platform building proprietary integrations:

Any MCP server can provide tools
Any MCP-compatible AI client can consume them

Mnemo implements MCP, meaning it works across:

Cursor
Claude Code
Amazon Q
Kiro
Other MCP-compatible tools

Mnemo isn't tied to a single AI assistant.

It's an intelligence layer that upgrades all of them.
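Under the hood, MCP messages are JSON-RPC 2.0, and a client invokes a server tool with the `tools/call` method. The sketch below builds such a request for `mnemo_recall`. The helper name `make_tool_call` is hypothetical, and the empty `arguments` object is an assumption, since the article does not document the tool's parameters.

```python
import json
from itertools import count

_request_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Assumption: mnemo_recall is called with no arguments.
request = make_tool_call("mnemo_recall", {})
print(request)
```

Because every compatible client sends this same message shape, one server can serve Cursor, Claude Code, and other clients without per-client integration code.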

What the AI Actually Sees

When the assistant calls mnemo_recall, it receives structured project context like this:

## Project Context
Architecture: Clean Architecture + CQRS
Patterns: Repository (9 interfaces), Handler pattern (12 handlers), DI container

## Decisions
- Use handler pattern for vendor-specific logic
- Auth service uses cache-aside with 5min TTL

## Repo Map
PaymentService/Handlers/
  - StripeHandler
  - PayPalHandler
  - SquareHandler

AuthService/Services/
  - TokenService : ITokenService

With this context loaded, the AI understands:

How the codebase is structured
Which patterns are expected
Existing architectural conventions
Historical engineering decisions

So when you ask:

"Add a new payment handler"

The generated implementation:

Inherits from BasePaymentHandler
Follows existing conventions
Registers correctly in DI
Matches existing architecture

Without this context, assistants tend to generate generic code that does not fit the existing system design.
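To show what "fits the conventions" means, here is a small sketch of the handler pattern. The author's codebase is .NET, so this Python version only illustrates the shape: `BasePaymentHandler` and `StripeHandler` are names from the article, while `AdyenHandler`, the `charge` method, and the `HANDLERS` registry (standing in for a DI container) are hypothetical.

```python
from abc import ABC, abstractmethod


class BasePaymentHandler(ABC):
    """Shared base class; the body here is assumed for illustration."""

    vendor: str

    @abstractmethod
    def charge(self, amount_cents: int) -> dict:
        ...


# Stand-in for DI registration: maps a vendor key to its handler class.
HANDLERS: dict = {}


def register(handler_cls):
    HANDLERS[handler_cls.vendor] = handler_cls
    return handler_cls


@register
class StripeHandler(BasePaymentHandler):
    vendor = "stripe"

    def charge(self, amount_cents: int) -> dict:
        return {"vendor": self.vendor, "amount_cents": amount_cents}


# A new handler that follows the existing convention:
# same base class, same method, same registration step.
@register
class AdyenHandler(BasePaymentHandler):
    vendor = "adyen"

    def charge(self, amount_cents: int) -> dict:
        return {"vendor": self.vendor, "amount_cents": amount_cents}


result = HANDLERS["adyen"]().charge(1999)
print(result)
```

An assistant that has seen the repo map and the recorded decision about vendor-specific handlers has enough information to produce the second class instead of a standalone function.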

Installation

Option A: VS Code Extension (Easiest)

Install the Mnemo extension from the VS Code Marketplace
Open a project
Click "Initialize Mnemo?"

Done.

The extension automatically:

Downloads the Mnemo binary
Initializes the repository
Configures MCP

No Python required.

Option B: Homebrew (macOS/Linux)

brew tap Mnemo-mcp/tap
brew install mnemo

Then:

cd your-project
mnemo init

Option C: pip (All Platforms)

pip install mnemo

Or from source:

git clone https://github.com/Mnemo-mcp/Mnemo.git
cd Mnemo
pip install -e .

Then:

cd your-project
mnemo init

Final Thoughts

Mnemo started as a solution to a frustrating problem:

AI assistants forget everything between sessions.

For small projects, that's annoying.

For large production systems, it's a major productivity bottleneck.

Mnemo gives AI coding assistants persistent architectural memory, so they can work from a real understanding of your codebase instead of stateless guesses.

I'd love feedback — especially from teams managing large, distributed systems.

What project context do you find yourself re-explaining most often?