<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: patsa2561-art</title>
    <description>The latest articles on DEV Community by patsa2561-art (@patsa2561art).</description>
    <link>https://dev.to/patsa2561art</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3913847%2Ff4a673de-1527-4dc6-8f83-ea25d3c7a5c0.png</url>
      <title>DEV Community: patsa2561-art</title>
      <link>https://dev.to/patsa2561art</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/patsa2561art"/>
    <language>en</language>
    <item>
      <title>I got tired of explaining my codebase to AI every conversation. So I gave it a memory.</title>
      <dc:creator>patsa2561-art</dc:creator>
      <pubDate>Tue, 05 May 2026 11:36:48 +0000</pubDate>
      <link>https://dev.to/patsa2561art/i-got-tired-of-explaining-my-codebase-to-ai-every-conversation-so-i-gave-it-a-memory-4h2c</link>
      <guid>https://dev.to/patsa2561art/i-got-tired-of-explaining-my-codebase-to-ai-every-conversation-so-i-gave-it-a-memory-4h2c</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Why does this auth flow use JWT instead of sessions?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;My AI coding assistant gave a confident, well-formatted, completely generic answer. The actual reason was buried in an August 2024 commit referencing an incident from our on-call pager. The AI never saw it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every conversation I had with my AI assistant started from zero. The codebase had hundreds of commits, dozens of architectural decisions, a graveyard of &lt;em&gt;"we tried X, it broke prod, we switched to Y"&lt;/em&gt; — and none of that context was reachable from inside the IDE.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Mneme&lt;/strong&gt; — an open-source memory layer that gives AI coding assistants persistent, queryable access to a codebase's history.&lt;/p&gt;

&lt;p&gt;This post is about three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What hybrid retrieval actually means&lt;/strong&gt; when you build it for code (not for documents)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why confidence scoring matters more than answer quality&lt;/strong&gt; — and how I made the AI shut up when it didn't know&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What MCP unlocks&lt;/strong&gt; when you stop treating AI as a chat interface and start treating it as a tool user&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're building anything similar, I hope the war stories help.&lt;/p&gt;




&lt;h2&gt;The architecture in one paragraph&lt;/h2&gt;

&lt;p&gt;Mneme indexes your git history + code structure into a local SQLite database with FTS5 + a vector column. When the AI asks a question, two retrievers run in parallel: &lt;strong&gt;BM25&lt;/strong&gt; over commit messages / PR text / code, and &lt;strong&gt;cosine similarity&lt;/strong&gt; over embedding vectors of the same. Their results get fused via &lt;strong&gt;Reciprocal Rank Fusion&lt;/strong&gt; (k=60) into a single ranking. A confidence classifier looks at the top-1 score &lt;em&gt;and the gap to top-2/3&lt;/em&gt; to decide whether to answer or refuse. Only then does an LLM see the top-K hits, with explicit citations, to produce the final answer.&lt;/p&gt;

&lt;p&gt;The whole thing runs locally by default. Embeddings via Ollama (offline) or OpenAI (your key). MIT licensed. Exposed to AI clients via the &lt;strong&gt;Model Context Protocol&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;Lesson 1: BM25 alone misses the point. Cosine alone misses the words.&lt;/h2&gt;

&lt;p&gt;The instinct when building "AI search over a repo" is to reach straight for embeddings. Just chunk everything, embed it, and use cosine similarity. Done.&lt;/p&gt;

&lt;p&gt;In practice, that fails on the queries developers actually ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"Why does the webhook handler retry 3 times?"&lt;/em&gt; → you want a commit that &lt;strong&gt;mentions "retry"&lt;/strong&gt; verbatim, not a semantically similar but unrelated paragraph.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"How does our auth work?"&lt;/em&gt; → you want a structural understanding, not just keyword matches.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The BM25 + cosine fusion, ranked through RRF, gets this right because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BM25&lt;/strong&gt; wins when queries contain rare keywords (variable names, error codes, commit hashes).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cosine&lt;/strong&gt; wins when queries are conceptual ("how does X work").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RRF&lt;/strong&gt; combines the two without needing to calibrate scores between fundamentally different scales.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fusion code is ~30 lines. Most of the work is choosing the right things to embed (commit subjects, PR titles, code identifiers — &lt;em&gt;not&lt;/em&gt; full diffs).&lt;/p&gt;
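&lt;p&gt;For the curious, the fusion step can be sketched in a few lines of TypeScript. This is an illustrative reimplementation of Reciprocal Rank Fusion, not Mneme's actual code; the type and function names are mine:&lt;/p&gt;

```typescript
// Illustrative RRF sketch — names are assumptions, not Mneme's API.
type Ranked = { id: string };

// Each item contributes 1 / (k + rank) per list it appears in,
// summed across lists. k = 60 is the constant from the RRF paper.
function rrfFuse(lists: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, i) => {
      // i is 0-based, so rank = i + 1.
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

No score calibration needed: only ranks matter, which is exactly why it tolerates BM25 and cosine living on different scales.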




&lt;h2&gt;Lesson 2: Confidence &amp;gt; correctness&lt;/h2&gt;

&lt;p&gt;Most retrieval systems will &lt;em&gt;always&lt;/em&gt; return something — even when "something" is noise. That is the worst possible failure mode. The user trusts the answer because the system gave one.&lt;/p&gt;

&lt;p&gt;I added a confidence classifier with two signals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Static floor&lt;/strong&gt; — top-1 score must clear a configurable threshold (FTS hits + cosine).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive gap&lt;/strong&gt; — top-1 must be meaningfully better than top-2 and top-3.&lt;/li&gt;
&lt;/ol&gt;
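&lt;p&gt;As a sketch, the gate looks roughly like this — the threshold values and names are assumptions, not Mneme's real numbers, and it refuses only when both signals fail:&lt;/p&gt;

```typescript
// Hypothetical confidence gate over fused scores (descending order).
function shouldAnswer(
  scores: number[],
  floor = 0.02,       // static floor on top-1 (assumed value)
  minGapRatio = 1.15, // top-1 must beat top-2/3 by this factor (assumed value)
): boolean {
  if (scores.length === 0) return false;
  // Signal 1: static floor on the top-1 score.
  const clearsFloor = scores[0] >= floor;
  // Signal 2: adaptive gap — top-1 meaningfully above the runners-up.
  const runnerUp = Math.max(scores[1] ?? 0, scores[2] ?? 0);
  const clearsGap = runnerUp === 0 || scores[0] / runnerUp >= minGapRatio;
  // Refuse only when both signals fail.
  return clearsFloor || clearsGap;
}
```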

&lt;p&gt;If both fail, Mneme refuses to answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I don't have strong context for this. The closest matches were [X, Y, Z] — those don't look directly relevant. You may want to ask differently or check if the relevant history exists in this repo."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This refusal is the single most valuable feature in the whole tool. An AI assistant that &lt;em&gt;knows when it doesn't know&lt;/em&gt; is useful. One that confidently fabricates is dangerous.&lt;/p&gt;




&lt;h2&gt;Lesson 3: MCP changes the whole interaction model&lt;/h2&gt;

&lt;p&gt;I started with a CLI: &lt;code&gt;mneme ask "..."&lt;/code&gt; returns an answer. Useful, but you have to leave the AI conversation to use it.&lt;/p&gt;

&lt;p&gt;Then I exposed the same tools through an &lt;strong&gt;MCP server&lt;/strong&gt;. Now Claude Code, Cursor, Continue, and Codex CLI can call &lt;code&gt;mneme_ask&lt;/code&gt;, &lt;code&gt;mneme_why&lt;/code&gt;, &lt;code&gt;mneme_search_commits&lt;/code&gt; directly during their reasoning loop.&lt;/p&gt;
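&lt;p&gt;Client-side, this is a standard MCP server registration. A sketch of what a Claude Desktop / Cursor-style &lt;code&gt;mcpServers&lt;/code&gt; entry might look like — the &lt;code&gt;mcp&lt;/code&gt; subcommand and args here are my guesses, so check the project wiki for the real invocation:&lt;/p&gt;

```json
{
  "mcpServers": {
    "mneme": {
      "command": "npx",
      "args": ["mneme-ai", "mcp"]
    }
  }
}
```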

&lt;p&gt;The user-experience difference is enormous. Before:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; Why does this auth code use JWT?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; &lt;em&gt;"Probably because JWT is stateless and scalable in distributed systems..."&lt;/em&gt; &lt;em&gt;(generic guess)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After (with Mneme as MCP):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; Why does this auth code use JWT?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; &lt;em&gt;(calls &lt;code&gt;mneme_ask "auth JWT"&lt;/code&gt;)&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; &lt;em&gt;"Per commit a3f9b21 from 2024-08, you switched from sessions to JWT after the rate-limit incident referenced in #482. The retry logic at line 47 was added in the hotfix that followed."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Same model. Same prompt. Different reasoning, because the AI now has memory.&lt;/p&gt;




&lt;h2&gt;Three commands that surprised me&lt;/h2&gt;

&lt;p&gt;I built fifteen "killer" commands, but three turned out to be more useful than I expected:&lt;/p&gt;

&lt;h3&gt;&lt;code&gt;mneme premortem "&amp;lt;intent&amp;gt;"&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Before you write any code, ask: &lt;em&gt;"how often has this kind of change been regretted in this repo?"&lt;/em&gt;. Mneme finds similar past attempts via token-overlap similarity, walks forward in time looking for revert / hotfix / incident signals, and returns a regret probability.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"7 of 9 similar past attempts ended badly (78%). Top risk: cache invalidation regression — happened 3× before."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It cites the actual commits. This is &lt;em&gt;not&lt;/em&gt; generic AI advice. It's grounded prediction from your own failure history.&lt;/p&gt;
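&lt;p&gt;The core of the idea fits in a short sketch: Jaccard overlap of message tokens to find similar past attempts, then a regret rate over the matches. Everything here — names, the &lt;code&gt;regretted&lt;/code&gt; flag, the 0.2 cutoff — is illustrative, not Mneme's implementation:&lt;/p&gt;

```typescript
// Token-overlap premortem sketch; all names/thresholds are assumptions.
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : inter / union;
}

// "regretted" stands in for the revert/hotfix/incident signals Mneme
// walks forward in time to detect.
interface PastAttempt { subject: string; regretted: boolean }

function regretProbability(intent: string, history: PastAttempt[], minSim = 0.2) {
  const it = tokens(intent);
  const similar = history.filter((c) => jaccard(it, tokens(c.subject)) >= minSim);
  const regrets = similar.filter((c) => c.regretted).length;
  return {
    similar: similar.length,
    regrets,
    probability: similar.length ? regrets / similar.length : null,
  };
}
```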

&lt;h3&gt;&lt;code&gt;mneme time-machine &amp;lt;file&amp;gt;&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Groups a file's commits into &lt;strong&gt;eras&lt;/strong&gt; — birth, rewrite, evolution, firefight, polish, plateau, twilight — instead of dumping a flat log. Each era has a label pulled from the most informative commit message.&lt;/p&gt;

&lt;p&gt;You read eight eras and you understand the file's life. Reading 200 commits would have given you nothing.&lt;/p&gt;
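&lt;p&gt;A minimal version of era segmentation just splits the timeline wherever there is a long gap between commits. Mneme's labeling (birth, firefight, and so on) is richer than this, but the skeleton might look like:&lt;/p&gt;

```typescript
// Hypothetical era segmentation: break a file's commit history into
// contiguous runs separated by gaps longer than maxGapDays.
interface Commit { date: Date; subject: string }

function segmentEras(commits: Commit[], maxGapDays = 90): Commit[][] {
  const sorted = [...commits].sort((a, b) => a.date.getTime() - b.date.getTime());
  const eras: Commit[][] = [];
  for (const c of sorted) {
    const current = eras[eras.length - 1];
    const last = current?.[current.length - 1];
    const gapDays = last
      ? (c.date.getTime() - last.date.getTime()) / 86_400_000
      : Infinity;
    if (gapDays <= maxGapDays) {
      current.push(c); // same era: activity is still continuous
    } else {
      eras.push([c]);  // long silence: start a new era
    }
  }
  return eras;
}
```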

&lt;h3&gt;&lt;code&gt;mneme ghost&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Surfaces &lt;em&gt;ghost code&lt;/em&gt; — files that haunt the repo without doing anything. Combines staleness, low-touch ratio, and TODO density into a single ghostliness score. Catches half-finished features and stale TODOs that survived through every later edit.&lt;/p&gt;
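&lt;p&gt;A back-of-the-envelope version of such a score, with made-up weights and signal definitions (not Mneme's actual formula):&lt;/p&gt;

```typescript
// Hypothetical ghostliness score: three normalized signals, equal weights.
interface FileSignals {
  daysSinceLastTouch: number; // staleness
  touches: number;            // commits touching this file
  repoCommits: number;        // total commits in the same window
  todoCount: number;          // TODO/FIXME markers in the file
  lines: number;
}

function ghostliness(f: FileSignals): number {
  const staleness = Math.min(f.daysSinceLastTouch / 365, 1);           // 0..1
  const lowTouch = 1 - Math.min(f.touches / Math.max(f.repoCommits, 1), 1);
  const todoDensity = Math.min((f.todoCount / Math.max(f.lines, 1)) * 50, 1);
  // Equal weights for the sketch; a real scorer would tune these.
  return (staleness + lowTouch + todoDensity) / 3;
}
```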




&lt;h2&gt;What I learned about building dev tools&lt;/h2&gt;

&lt;p&gt;A few things that would have saved me time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tokenizer choice matters more than embedding choice.&lt;/strong&gt; I lost a week to FTS5's &lt;code&gt;porter unicode61&lt;/code&gt; tokenizer not handling CJK / Thai / Arabic. Migrating the index to &lt;code&gt;trigram&lt;/code&gt; was painful but unavoidable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Schema-versioned migrations save your life.&lt;/strong&gt; Every time the schema changed, the upgrade path could have nuked users' indexes. Versioned migrations + idempotent backfills meant zero data loss across 13 releases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Property-based testing &amp;gt; unit tests for retrieval.&lt;/strong&gt; I run 16 properties × 10,000 cases each (160k generated cases per CI run) via &lt;code&gt;fast-check&lt;/code&gt;. Caught three edge cases that hand-written tests missed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Confidence scoring is the killer feature, not retrieval quality.&lt;/strong&gt; The first version had 95th-percentile retrieval and 0% trust. The next version had 90th-percentile retrieval and 100% trust because it refused when uncertain. The drop in raw quality was a win.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
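&lt;p&gt;To make point 3 concrete: a retrieval property like &lt;em&gt;"the fused ranking is a permutation of the union of the input lists"&lt;/em&gt; gets checked over thousands of random inputs. The project uses &lt;code&gt;fast-check&lt;/code&gt;; the sketch below uses a plain random loop so it stays dependency-free, and its &lt;code&gt;fuse&lt;/code&gt; function is a minimal stand-in, not Mneme's code:&lt;/p&gt;

```typescript
// Minimal stand-in RRF fuse over two id lists.
function fuse(a: string[], b: string[], k = 60): string[] {
  const s = new Map<string, number>();
  [a, b].forEach((list) =>
    list.forEach((id, i) => s.set(id, (s.get(id) ?? 0) + 1 / (k + i + 1))),
  );
  return [...s.entries()].sort((x, y) => y[1] - x[1]).map(([id]) => id);
}

// Property: for random inputs, the fused list is exactly a permutation
// of the union of the input ids — no drops, no duplicates.
function checkFusionProperty(cases = 1000): boolean {
  const randomIds = () =>
    [...new Set(Array.from({ length: 5 }, () =>
      String.fromCharCode(97 + Math.floor(Math.random() * 8))))];
  for (let i = 0; i < cases; i++) {
    const a = randomIds();
    const b = randomIds();
    const fused = fuse(a, b);
    const union = new Set([...a, ...b]);
    if (fused.length !== union.size || !fused.every((id) => union.has(id))) {
      return false;
    }
  }
  return true;
}
```

A library like fast-check adds shrinking and reproducible seeds on top of this loop, which is what makes the failures it finds actually debuggable.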




&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/patsa2561-art/mneme-ai" rel="noopener noreferrer"&gt;https://github.com/patsa2561-art/mneme-ai&lt;/a&gt;&lt;br&gt;
npm: &lt;a href="https://www.npmjs.com/package/mneme-ai" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/mneme-ai&lt;/a&gt;&lt;br&gt;
MCP Registry: &lt;a href="https://registry.modelcontextprotocol.io/" rel="noopener noreferrer"&gt;https://registry.modelcontextprotocol.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wiki: &lt;a href="https://github.com/patsa2561-art/mneme-ai/wiki" rel="noopener noreferrer"&gt;https://github.com/patsa2561-art/mneme-ai/wiki&lt;/a&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
npx mneme-ai init
npx mneme-ai ask "why does X exist?"

Thanks for reading.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
