DEV Community

Ripon C Malo


How I built projectmem — an MCP server that gives Claude, Cursor, and Codex persistent memory

A few months back, my AI coding agent confidently suggested this fix:

.header-preview {
    contain: layout;
}

I'd tried that exact thing the previous Friday. It didn't work. The agent had no memory of the failure — different chat, fresh context, same dead-end.

This happens every Monday. Across Claude, Cursor, Codex, Antigravity — agents are stateless between sessions. Each new conversation pays 5,000–20,000 tokens to rebuild context that existed yesterday. The model isn't broken; the architecture is.

So I built projectmem. This post walks through what it actually does — five killer features, a four-view D3 dashboard, the architecture, and the conda/venv hook bug that nearly shipped v0.1.3 broken. It's open source, MIT, runs 100% local, ships as a single pip install.


What projectmem is, in 60 seconds

A small Python package that does three things:

  1. Captures development events — bugs, fix attempts, fixes, decisions, gotchas — into plain-text JSONL inside your repo (.projectmem/events.jsonl). You commit it. You can git diff it.

  2. Exposes 14 MCP tools so any MCP-capable AI client (Claude Desktop, Claude Code, Cursor, Antigravity, Codex) reads and writes that memory directly. One config block per client, works in all of them.

  3. Runs git hooks that warn you at commit time before you repeat a logged failed approach.

pip install projectmem
pjm init

That's the install. pjm init writes the memory directory, drops a CLAUDE.md bridge file, installs the git hooks, pre-populates PROJECT_MAP.md from your stack manifests, and prints a ready-to-paste MCP client config block. No cloud, no daemon, no telemetry.
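
To make the storage idea concrete, here is a minimal sketch of an append-only JSONL event log. It is not projectmem's actual code: the function names and the field names (ts, type, summary, outcome, file) are assumptions for illustration; only the .projectmem/events.jsonl path comes from the description above.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_event(log_path: Path, event_type: str, summary: str, **fields) -> dict:
    """Append one event as a single JSON line (append-only, diff-friendly)."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),  # hypothetical field name
        "type": event_type,                            # hypothetical field name
        "summary": summary,
        **fields,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, ensure_ascii=False) + "\n")
    return event


def read_events(log_path: Path) -> list[dict]:
    """Read every event back; blank lines are skipped."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        log = Path(repo) / ".projectmem" / "events.jsonl"
        append_event(log, "attempt", "tried contain: layout",
                     outcome="failed", file="styles.css")
        print(read_events(log)[0]["outcome"])  # failed
```

One event per line is what keeps the file friendly to git diff: a new event shows up as exactly one added line.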


The five killer features

1. Pre-commit warnings (the differentiator)

The git pre-commit hook checks each staged file against memory. If a failed approach is logged for that file, it warns you before the commit lands.

projectmem: Pre-Commit Check
────────────────────────────────────────────────────────────

  styles.css
    WARN  1 failed attempt on this file
           Last failure: tried contain: layout — preview still jumps
             (3 days ago)

────────────────────────────────────────────────────────────
1 warning(s). Review before committing.

Most "AI memory" tools are retrieval engines — they store conversations and surface them when asked. projectmem is a judgment layer — it captures events with explicit outcomes (worked / failed / partial) and uses git context to interrupt you before you waste another afternoon. The pre-commit hook is the unlock.
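
The core of such a check can be sketched in a few lines. This is a simplified illustration, not the hook projectmem ships: the event fields (type, outcome, file, summary) and both function names are hypothetical. In a real hook, the staged paths would come from git diff --cached --name-only.

```python
from collections import defaultdict


def failed_attempts_by_file(events: list[dict], staged_files: list[str]) -> dict[str, list[dict]]:
    """Group logged failed attempts by staged file path."""
    staged = set(staged_files)
    hits: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        if (
            event.get("type") == "attempt"        # hypothetical field names
            and event.get("outcome") == "failed"
            and event.get("file") in staged
        ):
            hits[event["file"]].append(event)
    return dict(hits)


def format_warnings(hits: dict[str, list[dict]]) -> list[str]:
    """Render one warning stanza per file, loosely modelled on the output above."""
    lines = []
    for path, attempts in sorted(hits.items()):
        lines.append(f"  {path}")
        lines.append(f"    WARN  {len(attempts)} failed attempt(s) on this file")
        lines.append(f"           Last failure: {attempts[-1]['summary']}")
    return lines
```

The hook only warns, so a sketch like this would print the lines and still exit with status 0; blocking the commit would be a different design choice.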

2. Cross-project memory

Lessons in one repo automatically surface in others on the same stack. Library gotchas, framework decisions, patterns you only had to learn once.

~/.projectmem/global/
├── library_gotchas.jsonl
├── patterns.jsonl
└── .promotable.json    ← self-curating cache

When you run pjm init in a new project, projectmem detects your stack from pyproject.toml / package.json / Cargo.toml / go.mod and injects relevant cross-project gotchas into AI_INSTRUCTIONS.md. Filtering is stack-aware, so a stray mention of "next" in a Vite project is not promoted as a Next.js gotcha into your actual Next.js repos.

100% local — ~/.projectmem/global/ stays on your machine. No cloud sync, no account, no telemetry. A gin gotcha you log in proj-go shows up in your next Go repo. A vite gotcha in proj-react shows up in your next React repo.
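
Stack detection from manifests, followed by tag-based filtering, might look roughly like this. It is a sketch under assumptions: the four manifest names come from the text above, but the stack labels, the "stack" field on each gotcha, and the function names are hypothetical.

```python
from collections.abc import Iterable

# Manifest file name -> stack label. The labels are hypothetical.
MANIFESTS = {
    "pyproject.toml": "python",
    "package.json": "node",
    "Cargo.toml": "rust",
    "go.mod": "go",
}


def detect_stacks(filenames: Iterable[str]) -> set[str]:
    """Map the file names found at the repo root to stack labels."""
    return {MANIFESTS[name] for name in filenames if name in MANIFESTS}


def relevant_gotchas(gotchas: list[dict], stacks: set[str]) -> list[dict]:
    """Keep gotchas whose stack tag matches; the text itself is never searched."""
    return [g for g in gotchas if g.get("stack") in stacks]
```

A caller would pass something like the names from os.listdir(repo_root). Matching on an explicit tag rather than on words in the gotcha text is what keeps a passing mention of another framework from leaking across stacks.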

3. Provable ROI (pjm score)

A letter grade from A+ to F, backed by concrete numbers. As far as I know, it is the first AI memory tool with metrics a CTO can verify.

$ pjm score

projectmem Prevention Score: A- (87/100)
  Failed approaches on record: 8
  Decisions documented: 14
  Fixes with context: 12
  Debugging hours saved: ~12h
  Tokens saved: 47,500
  Estimated USD saved: $4.75

Output goes to the terminal, to JSON for CI (pjm score --format json), or to a shields.io badge for your README. It lets you put a number on the value the memory layer actually produces, instead of a marketing claim.
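
The dollar figure in the sample output is simple arithmetic on the token count. The sample numbers ($4.75 for 47,500 tokens) imply a flat rate of about $100 per million tokens. How pjm score actually prices tokens is not described here, so the rate parameter and function name below are assumptions.

```python
def estimate_usd_saved(tokens_saved: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a flat per-million rate (an assumption)."""
    return round(tokens_saved * usd_per_million_tokens / 1_000_000, 2)


# Rate implied by the sample output: $4.75 for 47,500 tokens.
implied_rate = 4.75 / 47_500 * 1_000_000  # roughly 100 USD per million tokens

print(estimate_usd_saved(47_500, 100.0))  # 4.75
```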

4. Smart context injection (pjm wrap)

Launches your AI agent with a token-budgeted context block already loaded — so the agent starts with your project memory inherited, instead of blank.

$ pjm wrap claude --tokens 2000
# launches Claude with a 2000-token context block of:
# - your project summary
# - recent decisions
# - relevant cross-project gotchas
# - any failed approaches on files you'll likely touch

Works with Claude Code, Cursor (writes to .cursorrules), Aider, and clipboard-paste for everything else. The AI session begins experienced, not from zero.
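
A token-budgeted context block can be assembled greedily, as in the sketch below. This illustrates the general idea and is not pjm wrap's implementation: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and the section titles and function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def pack_context(sections: list[tuple[str, str]], budget: int) -> str:
    """Add sections in priority order, skipping any that would exceed the budget."""
    parts: list[str] = []
    used = 0
    for title, body in sections:
        block = f"## {title}\n{body}\n"
        cost = estimate_tokens(block)
        if used + cost > budget:
            continue  # too big for what is left; try the next, smaller section
        parts.append(block)
        used += cost
    return "\n".join(parts)
```

Listing sections from most to least important means that a tight budget drops the low-priority material first.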

5. Real-time file watcher (pjm watch)

Auto-starts on pjm init in interactive terminals. Detects rapid edits to the same file (debugging sessions) and logs churn events automatically. Battery-aware, gitignore-aware. Catches what AI silently misses — the between-commits iteration where most actual debugging happens.

$ pjm watch --status
projectmem watcher: active (PID 47891)
  Watching: /Users/me/repos/your-project
  Events captured today: 12 churn, 3 commit
  Battery: AC power (full speed)
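
The churn detection described above ("rapid edits to the same file") can be sketched with a sliding time window per file. The threshold, the window length, and the class name are assumptions; the post does not state the values projectmem uses. A real watcher would feed this from file-system events.

```python
from collections import defaultdict, deque


class ChurnDetector:
    """Flag a file as churning after `threshold` saves within `window_seconds`."""

    def __init__(self, threshold: int = 5, window_seconds: float = 120.0):
        self.threshold = threshold
        self.window = window_seconds
        self._edits: dict[str, deque] = defaultdict(deque)

    def record_edit(self, path: str, timestamp: float) -> bool:
        """Record one save; return True when the file crosses the churn threshold."""
        edits = self._edits[path]
        edits.append(timestamp)
        # Drop saves that have fallen out of the time window.
        while edits and timestamp - edits[0] > self.window:
            edits.popleft()
        if len(edits) >= self.threshold:
            edits.clear()  # reset so the next save does not fire again immediately
            return True
        return False
```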

The visualization: four D3 dashboards from one command

This is the part I think most projects skip and shouldn't.

$ pjm visualize
# opens localhost:8765 in your browser

You get an interactive D3 dashboard with four views, all auto-generated from your memory — zero extra AI tokens.

Story Map

The complete narrative of your project — every decision, milestone, and failure visualized as an interactive force-directed graph. Failed files glow red in the heatmap. Drag nodes around. Zoom into problem areas. Your AI reads this to understand not just what your project is, but how it got here.

ROI Dashboard

Live visualization of how much projectmem actually saves you. Animated counters for tokens prevented and USD protected. Capture-source donut showing manual vs auto-captured events. File churn heatmap surfacing your most-debugged files. Cumulative savings area chart over time.

Architecture Map

Toggle between a horizontal dendrogram (clean, hierarchical) and a force-directed graph (relationships, churn). Both auto-generated from PROJECT_MAP.md. Zoom, pan, color-coded by folder.

Event Timeline

Every event in your memory rendered chronologically — with AUTO badges distinguishing auto-captured events from manual ones. Filter by Manual / Auto-captured / event type. Activity bar chart shows the rhythm of your project.

The whole thing is one HTTP server, vanilla D3.js, no React, no framework. Loads instantly. The view your AI agent reads in summary.md is the same view you can scrub through interactively when you open the dashboard.


The architecture (for the engineers in the room)

Stack:

  • Python 3.10+, ~600 LOC core
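  • Direct dependencies: mcp, typer, watchdog. That's it. No frameworks, and the MCP server itself needs no daemon and no port; the pjm visualize dashboard is a separate, on-demand local server.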
  • Storage: append-only JSONL inside the repo, distilled into Markdown.
  • Transport: stdio MCP. The AI client spawns the server as a subprocess; no network, no localhost binding, no process to babysit.

The 14 tools:

get_instructions     get_summary        get_project_map
get_context          get_score          get_global_gotchas
get_issue            search_events      precheck_file
log_issue            record_attempt     record_fix
add_decision         add_note

Every parameter has a Pydantic Field(description=…) annotation, and where it matters, a schema-level constraint:

from typing import Annotated

from pydantic import Field


def record_attempt(
    summary: Annotated[str, Field(description="One-line description of what you tried.")],
    outcome: Annotated[str, Field(
        description="Result of the attempt.",
        pattern="^(worked|failed|partial)$",
    )] = "failed",
    # ... (remaining parameters elided)
):
    ...  # tool body elided

The schema literally rejects outcome="maybe" before the tool body runs. Cleanest way I've found to add real guardrails to LLM-generated tool calls.
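
To see what that pattern accepts and rejects, here is the same regular expression checked with the standard-library re module. This only demonstrates the constraint; it is not how Pydantic or the MCP layer enforces it internally, and the helper name is hypothetical.

```python
import re

# Same pattern as in the Field(...) annotation above.
OUTCOME_PATTERN = re.compile(r"^(worked|failed|partial)$")


def is_valid_outcome(value: str) -> bool:
    """True only for the three allowed outcomes, matched exactly and case-sensitively."""
    return OUTCOME_PATTERN.fullmatch(value) is not None


for candidate in ("worked", "failed", "partial", "maybe", "Failed"):
    print(candidate, is_valid_outcome(candidate))
```

The first three candidates pass; "maybe" and "Failed" do not.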

Why stdio, not HTTP

agentmemory (the most-starred competitor) runs a Node daemon on three ports (3111 / 3112 / 3113) with a separate iii-engine binary and ~21,800 LOC. Excellent retrieval. Heavy install.

projectmem stays stdio-only on purpose:

  • No port to babysit
  • No daemon to remember to start
  • No conflict with existing infra
  • The AI client manages the subprocess lifecycle for free

Tradeoff: no cross-process state. For single-developer-per-project workflows that costs nothing in practice, and the payoff is one less thing to debug.


The privacy guardrail

projectmem stores event text verbatim in your repo. That's the whole point — the memory is plain text you can git diff. The downside: a careless paste in an AI chat — "the bug repros when I set OPENAI_API_KEY=sk-..." — would otherwise land that key on disk.

v0.1.3 closes that hole. Before any event hits disk, storage.append_event runs a conservative pattern scrubber across the user-supplied text fields. Matches against high-confidence patterns get replaced with [REDACTED:<kind>]:

Patterns: OpenAI sk-, GitHub PAT, AWS AKIA, Google AIza,
          Slack tokens, Stripe live/test keys, JWTs (eyJ...),
          Bearer tokens, PEM private-key blocks.

Patterns are intentionally narrow. Ordinary debugging prose — "tried contain: layout", "forgot password reset flow" — is never touched. 29 unit tests pin both true-positive and false-positive behavior.

Default-on. PROJECTMEM_NO_REDACT=1 opts out for the rare contexts where you genuinely want the raw text.
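
A conservative scrubber of this kind can be sketched as a small table of narrow regexes. The three patterns below are illustrative guesses at well-known key shapes, not the ones projectmem ships, and the kind labels are hypothetical. The [REDACTED:<kind>] format and the PROJECTMEM_NO_REDACT switch come from the description above.

```python
import os
import re

# Illustrative patterns only; the real scrubber's regexes are not shown in this post.
PATTERNS = {
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scrub(text: str) -> str:
    """Replace high-confidence secret patterns before the text is written to disk."""
    if os.environ.get("PROJECTMEM_NO_REDACT") == "1":
        return text  # explicit opt-out
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text
```

Requiring a distinctive prefix plus a minimum length is what keeps ordinary prose such as "tried contain: layout" untouched.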

Either it's there from day one or users learn not to trust the tool. v0.1.3 makes it day one.


How to try it

pip install projectmem
cd your-project
pjm init

pjm init ends by printing a ready-to-paste MCP config block (absolute sys.executable baked in to dodge the Claude-Desktop / Cursor PATH-inheritance gotcha). Paste it into your client's config file:

  • Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Cursor: ~/.cursor/mcp.json
  • Antigravity legacy IDE: ~/.gemini/antigravity/mcp_config.json
  • Codex (TOML, not JSON): ~/.codex/config.toml

Fully restart your client (quit and relaunch, not just reload), and the 14 projectmem tools appear in the MCP panel.
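
For the JSON-based clients, a config block with the absolute interpreter path baked in could be generated along these lines. The mcpServers shape is the one Claude Desktop and Cursor read. The server name and especially the args (the module path projectmem.server) are placeholders I made up for illustration; use the block pjm init prints rather than this one. Codex reads TOML, so this sketch does not apply there.

```python
import json
import sys


def mcp_config_block(server_name: str, args: list[str]) -> str:
    """Build a JSON MCP client config that launches the server over stdio."""
    config = {
        "mcpServers": {
            server_name: {
                # Path of the interpreter running this script, so GUI clients
                # that do not inherit your shell PATH still find the right Python.
                "command": sys.executable,
                "args": args,
            }
        }
    }
    return json.dumps(config, indent=2)


# Hypothetical arguments; the real entry point may differ.
print(mcp_config_block("projectmem", ["-m", "projectmem.server"]))
```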



What's next

The v0.2 roadmap:

  • Stale-memory detection — flag, never delete. Cross-reference each decision's referenced file against recent git activity; surface decisions that predate heavy churn as "possibly stale, confirm or supersede."
  • Explicit --supersedes on add_decision — the honest version of memory decay. Old decision retired with a back-reference; nothing destroyed; full audit trail.
  • Semantic search as an opt-in extra (sentence-transformers + sqlite-vec). Default install stays dependency-light.

Honest about the rough edges

  • Search is exact substring today, not semantic
  • API may still shift before 1.0
  • The precheck heuristics are simple right now (file-name match → surface failed attempts)
  • No benchmarks are published yet (an equivalent of recall for "judgment accuracy" hasn't been defined)

But: 58 unit tests, end-to-end verified across Claude Desktop / Claude Code / Cursor / Antigravity / Codex, the conda gotcha properly fixed, the privacy guardrail real and tested, the visualization dashboard genuinely useful.

If you build with AI coding agents every day, try it once on a real project. In my experience, the pre-commit hook usually catches its first real failure within the first week.

What's the worst "I already told you this last week" moment you've had with your AI agent? Reply below — that's exactly the pattern the precheck heuristics need to learn from.

Thanks for reading.

— Ripon
