<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Soleiman Mansouri</title>
    <description>The latest articles on DEV Community by Soleiman Mansouri (@soleiman_mansouri_a4aa763).</description>
    <link>https://dev.to/soleiman_mansouri_a4aa763</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3911954%2F7b568a75-f898-42a2-9279-3194c70f5f0f.png</url>
      <title>DEV Community: Soleiman Mansouri</title>
      <link>https://dev.to/soleiman_mansouri_a4aa763</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/soleiman_mansouri_a4aa763"/>
    <language>en</language>
    <item>
      <title>I Built a Debugging Memory for AI Coding Agents — Here's the System Behind It</title>
      <dc:creator>Soleiman Mansouri</dc:creator>
      <pubDate>Mon, 04 May 2026 11:42:53 +0000</pubDate>
      <link>https://dev.to/soleiman_mansouri_a4aa763/i-built-a-debugging-memory-for-ai-coding-agents-heres-the-system-behind-it-1i96</link>
      <guid>https://dev.to/soleiman_mansouri_a4aa763/i-built-a-debugging-memory-for-ai-coding-agents-heres-the-system-behind-it-1i96</guid>
      <description>&lt;p&gt;Here's a question that changed how I debug with AI agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if the agent checked "have I seen this before?" before investigating every bug?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I started logging my debugging sessions — every root cause, every false lead, every fix. After 100+ production bugs across voice pipelines, API integrations, and distributed systems, a clear pattern emerged: the same ~22 root causes explain nearly everything. Config chain gaps. Stale caches. Silent fallbacks. Observer multipliers. Retry/timeout mismatches.&lt;/p&gt;

&lt;p&gt;The bugs repeat. The agents don't remember. That's the gap.&lt;/p&gt;

&lt;p&gt;So I built Debug Bank — a pattern-first debugging memory system that teaches AI agents to remember.&lt;/p&gt;

&lt;h2&gt;The Problem: AI Agents Learn Nothing&lt;/h2&gt;

&lt;p&gt;Here's what happens today:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Bug appears&lt;/li&gt;
&lt;li&gt;Agent investigates from scratch&lt;/li&gt;
&lt;li&gt;Agent finds root cause, fixes it&lt;/li&gt;
&lt;li&gt;Session ends&lt;/li&gt;
&lt;li&gt;Same bug appears in a different file&lt;/li&gt;
&lt;li&gt;Agent investigates from scratch again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Stack Overflow data shows AI-generated code has &lt;strong&gt;2.66x more formatting problems&lt;/strong&gt; and &lt;strong&gt;1.5-2x more security bugs&lt;/strong&gt; than human code. Much of that, I'd argue, comes from agents never learning from past mistakes.&lt;/p&gt;

&lt;p&gt;Google's ReasoningBank research (arxiv.org/abs/2504.09762) showed that distilling failures into reusable patterns yields &lt;strong&gt;+8.3% on WebArena&lt;/strong&gt; and &lt;strong&gt;+4.6% on SWE-Bench&lt;/strong&gt;. But that work stopped at benchmarks. I needed a production system.&lt;/p&gt;

&lt;h2&gt;The Insight: Patterns Repeat&lt;/h2&gt;

&lt;p&gt;After documenting my own debugging trajectories, I extracted 22 root cause patterns that account for ~95% of the bugs I encounter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P03 (Observer/Hook Multiplier)&lt;/strong&gt; — Event listeners registered multiple times, causing duplicate processing per trigger&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P07 (Partial Rollback)&lt;/strong&gt; — A multi-step deploy succeeds partway, leaving the system in an inconsistent state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P11 (Retry/Timeout Mismatch)&lt;/strong&gt; — Retry interval exceeds the timeout window, so retries never actually execute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P15 (Async Fire-and-Forget)&lt;/strong&gt; — A background task fails silently because no one awaits its result or checks its status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't theoretical. Each has 2-3 real-world examples from production, a 30-second checklist, and a fix strategy.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;If an agent checks "have I seen this before?" at the start of every debugging session, 70% of investigations end at step 1.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;How Debug Bank Works&lt;/h2&gt;

&lt;p&gt;The system is three layers, each building on the one below:&lt;/p&gt;

&lt;h3&gt;Layer 1: Pattern Bank (P01-P22)&lt;/h3&gt;

&lt;p&gt;When a bug is reported, the agent reads the pattern bank and asks: "Is this a known pattern?"&lt;/p&gt;

&lt;p&gt;If yes, it pulls the checklist and verification strategy. If the known fix applies, we're done in 30 seconds. If not, we know why it doesn't match and adjust.&lt;/p&gt;

&lt;p&gt;If no pattern matches, we proceed to the 7-step protocol.&lt;/p&gt;

&lt;h3&gt;Layer 2: The 7-Step Protocol&lt;/h3&gt;

&lt;p&gt;I call it the "Debug Trajectory" — it's the exact sequence I follow on every production incident:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reproduce&lt;/strong&gt; — Get the exact error with full output (logs, stack trace, HTTP status)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hypothesize&lt;/strong&gt; — State 2-3 ranked, falsifiable root causes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolate&lt;/strong&gt; — Test hypotheses one at a time using binary search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diagnose&lt;/strong&gt; — Identify the single root cause by tracing the full call chain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix&lt;/strong&gt; — Make a minimal change addressing the root cause, not a symptom&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Record&lt;/strong&gt; — Document the trajectory in a domain catalog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capture feedback&lt;/strong&gt; — When corrected, turn the correction into a persistent rule&lt;/li&gt;
&lt;/ol&gt;
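&lt;p&gt;To make step 6 concrete, a recorded trajectory could be as small as a markdown entry like this (a hypothetical sketch; the repo defines the actual catalog format):&lt;/p&gt;

```markdown
---
id: T-0042
pattern: P11
---
**Symptom:** Failed webhook calls were never retried.
**Root cause:** Retry interval (60s) exceeded the 30s timeout window.
**Fix:** Lowered the retry interval below the timeout; added a regression test.
**Evidence:** Gateway logs showing zero retry attempts per failed call.
```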

&lt;p&gt;Every step produces evidence for the next. You never skip.&lt;/p&gt;

&lt;h3&gt;Layer 3: The 3-Exchange Stop Rule&lt;/h3&gt;

&lt;p&gt;This is the single most impactful rule in the system.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If 3 rounds of iterative fixing show no progress, STOP.&lt;/strong&gt; Don't continue the same approach. Re-plan from scratch, add logging, or switch strategy entirely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. That rule alone prevents the #1 failure mode of AI agents — circular debugging that wastes tokens and produces nothing. Most agents loop 5-10 times on hard problems. This forces a strategy pivot.&lt;/p&gt;

&lt;h2&gt;Pre-Deploy: Catching Bugs Before They Ship&lt;/h2&gt;

&lt;p&gt;The pattern bank is reactive — it kicks in after a bug appears. But the cheapest time to catch a bug is before it ships.&lt;/p&gt;

&lt;p&gt;So I built a pre-deploy scanner. Here's how it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bash integrations/pre-deploy-check.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scanner:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads your &lt;code&gt;git diff&lt;/code&gt; (staged changes)&lt;/li&gt;
&lt;li&gt;Greps for keywords linked to each pattern (e.g., "observer" for P03, "fallback" for P08)&lt;/li&gt;
&lt;li&gt;Prints a ranked list of flagged patterns with their quick-check&lt;/li&gt;
&lt;li&gt;Exits non-zero if matches are found (so it blocks your deploy pipeline)&lt;/li&gt;
&lt;/ol&gt;
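&lt;p&gt;The core of that logic can be sketched in a few lines of bash. This is a simplified sketch with two illustrative keyword/pattern pairs and a stdin interface for easy testing; the real &lt;code&gt;integrations/pre-deploy-check.sh&lt;/code&gt; covers all 22 patterns and reads the staged diff itself:&lt;/p&gt;

```shell
# scan_diff: flag known failure-pattern keywords in a diff read from stdin.
# The keyword|pattern pairs below are illustrative, not the repo's full list.
scan_diff() {
  local diff_text flagged=0 entry keyword pattern
  diff_text=$(cat)
  for entry in \
      "subscribe|P03 Observer/Hook Multiplier" \
      "fallback|P08 Config Resolution Chain Gap"; do
    keyword=${entry%%|*}
    pattern=${entry#*|}
    if printf '%s\n' "$diff_text" | grep -qi -- "$keyword"; then
      echo "  FLAGGED  $pattern (keyword: $keyword)"
      flagged=$((flagged + 1))
    fi
  done
  if [ "$flagged" -gt 0 ]; then
    echo "$flagged pattern(s) flagged. Review before deploying."
    return 1
  fi
  echo "No known patterns flagged."
}

# In a pipeline: git diff --cached | scan_diff
```

&lt;p&gt;Because &lt;code&gt;scan_diff&lt;/code&gt; returns non-zero when anything is flagged, a deploy script can gate on it directly.&lt;/p&gt;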

&lt;p&gt;&lt;strong&gt;Example output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[debug-bank] Pre-Deploy Pattern Scan
Scanning git diff for known failure patterns...

  FLAGGED  P03 Observer/Hook Multiplier
           keyword: subscribe
           Check: Deduplicate by event/frame ID

  FLAGGED  P08 Config Resolution Chain Gap
           keyword: fallback
           Check: Trace the full fallback chain

2 pattern(s) flagged. Review before deploying.
Exit code: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You fix the flagged issues, run the scanner again, and when there are no matches, the deploy proceeds.&lt;/p&gt;

&lt;p&gt;This runs before human review — a catch-all safety net that uses the same pattern bank for prevention, not just diagnosis.&lt;/p&gt;

&lt;h2&gt;How Feedback Becomes Rules&lt;/h2&gt;

&lt;p&gt;When you tell Claude "don't do that" or correct its approach, that correction becomes a permanent rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;no-mocking-database&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;feedback&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
Integration tests must hit a real database, not mocks.

&lt;span class="gs"&gt;**Why:**&lt;/span&gt; Prior incident where mock/prod divergence masked a broken migration.
&lt;span class="gs"&gt;**How to apply:**&lt;/span&gt; Any test file touching database operations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Why&lt;/code&gt; field lets the agent judge edge cases instead of blindly following rules. After 30+ feedback rules, the agent rarely needs the same correction twice.&lt;/p&gt;

&lt;h2&gt;Zero Dependencies, Just Markdown&lt;/h2&gt;

&lt;p&gt;Here's the radical part: &lt;strong&gt;It's all markdown files.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No database. No API. No external dependencies. No installation.&lt;/p&gt;

&lt;p&gt;Setup takes 30 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-O&lt;/span&gt; https://raw.githubusercontent.com/soleimanmansouri/debug-bank/main/CLAUDE.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop it in your project. Your agent reads it. That's it.&lt;/p&gt;

&lt;p&gt;Works with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; (drop-in via CLAUDE.md)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI, Gemini CLI, Cursor&lt;/strong&gt; (via AGENTS.md)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom agents&lt;/strong&gt; (copy patterns, follow the protocol)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pre-deploy scanner is a bash script — no external dependencies beyond &lt;code&gt;git&lt;/code&gt; and &lt;code&gt;grep&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;Real-World Example: P08 in Production&lt;/h2&gt;

&lt;p&gt;Let me show you how this works with a real pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bug:&lt;/strong&gt; Transfer requests were routing to wrong department numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Pattern Check&lt;/strong&gt; — I read P08 (Config Resolution Chain Gap) and recognized the exact symptom: "System falls through to stale data when a link in the fallback chain is missing."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Verify the fix applies&lt;/strong&gt; — Config was resolved via: API response → database table → YAML → hardcoded fallback. The department table was empty. System fell through to YAML with outdated numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Fix&lt;/strong&gt; — Populate the database table. Add monitoring for empty entries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 30 seconds. No investigation needed.&lt;/p&gt;

&lt;p&gt;Without the pattern bank, I would have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checked the API response (working)&lt;/li&gt;
&lt;li&gt;Checked the database query (returns no results, but why?)&lt;/li&gt;
&lt;li&gt;Checked the YAML file (contains old numbers, but why is it using those?)&lt;/li&gt;
&lt;li&gt;Spent an hour on a config chain I didn't fully understand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern bank shortened that to 30 seconds, and it applies to every subsequent bug of the same type, in this project and in any future project.&lt;/p&gt;

&lt;h2&gt;Building the Pattern Bank&lt;/h2&gt;

&lt;p&gt;The 22 patterns came from documenting my own debugging work across diverse systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voice pipelines&lt;/strong&gt; (Pipecat, ElevenLabs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API integrations&lt;/strong&gt; (Twilio, Odoo, Supabase)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration management&lt;/strong&gt; (database → YAML fallback chains)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed systems&lt;/strong&gt; (cache invalidation, retry storms)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each pattern includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clear description of the failure mode&lt;/li&gt;
&lt;li&gt;A 30-second checklist&lt;/li&gt;
&lt;li&gt;2-3 real-world examples&lt;/li&gt;
&lt;li&gt;A fix strategy&lt;/li&gt;
&lt;li&gt;Prevention guidance&lt;/li&gt;
&lt;/ul&gt;
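&lt;p&gt;As an illustration, a pattern file might look like this (a hypothetical sketch in the spirit of the feedback-rule format above; the files in the repo are the source of truth):&lt;/p&gt;

```markdown
---
id: P11
name: Retry/Timeout Mismatch
---
**Failure mode:** The retry interval exceeds the timeout window, so retries never execute.

**30-second checklist:**
- Compare the retry interval against the request timeout.
- Check logs for zero retry attempts after failures.

**Fix strategy:** Set the retry interval below the timeout; add a regression test.

**Prevention:** Validate at startup that the retry interval stays below the timeout.
```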

&lt;p&gt;The patterns transfer across projects. P02 (Multiple Writers) appears in voice pipelines, web APIs, databases, and infrastructure. Once learned, it's learned forever.&lt;/p&gt;

&lt;h2&gt;Scenarios and Postmortems&lt;/h2&gt;

&lt;p&gt;The repo includes higher-tier debugging challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scenarios&lt;/strong&gt; (S01-S03) — Multi-service L3-L4 problems where the symptom is in one place and the root cause is somewhere else entirely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Postmortems&lt;/strong&gt; (PM01-PM03) — Anonymized production incidents with full timelines, blast radius analysis, and systemic mitigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are for learning. They teach you to think like a senior engineer tracking distributed failures.&lt;/p&gt;

&lt;h2&gt;Why This Works&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Compound learning&lt;/strong&gt; — Every bug fix teaches the system. After 50 bugs, most issues resolve at step 1 (pattern match).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transfers across projects&lt;/strong&gt; — You build the pattern bank once. It moves with you. P02 (Multiple Writers) is P02 everywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User-driven self-improvement&lt;/strong&gt; — Feedback rules capture corrections with context. The agent gets better at matching your expectations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evidence-based&lt;/strong&gt; — Every pattern has a checklist. Every catalog entry links to a pattern ID. Nothing is "just trust me."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stops circular debugging&lt;/strong&gt; — The 3-exchange rule forces strategy pivots. No more looping endlessly on the wrong approach.&lt;/p&gt;

&lt;h2&gt;What's Next: Debug Bank v2&lt;/h2&gt;

&lt;p&gt;The roadmap includes runnable Docker scenarios where you can practice debugging L3-L4 problems autonomously. Imagine: a broken voice pipeline, a misconfigured database fallback chain, and a timing race condition — all in a sandbox you can experiment in.&lt;/p&gt;

&lt;p&gt;The goal is to turn Debug Bank from a reference manual into a practice environment.&lt;/p&gt;

&lt;h2&gt;Getting Started&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clone or download&lt;/strong&gt; — &lt;code&gt;git clone https://github.com/soleimanmansouri/debug-bank&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copy CLAUDE.md&lt;/strong&gt; — &lt;code&gt;curl -O https://raw.githubusercontent.com/soleimanmansouri/debug-bank/main/CLAUDE.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the patterns&lt;/strong&gt; — Skim &lt;code&gt;patterns/&lt;/code&gt; to learn what's available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On your first bug&lt;/strong&gt; — Pattern check first, then trajectory protocol&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up pre-deploy&lt;/strong&gt; — Add &lt;code&gt;bash integrations/pre-deploy-check.sh&lt;/code&gt; to your deploy pipeline&lt;/li&gt;
&lt;/ol&gt;
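&lt;p&gt;For step 5, one way to wire the scanner in (my own hypothetical wiring, not something the repo prescribes) is a small guard function called from a git pre-push hook:&lt;/p&gt;

```shell
# pre_push_guard: run the debug-bank scanner and translate its exit code
# into a push decision. Hypothetical wiring; adapt the path to your repo.
pre_push_guard() {
  if bash integrations/pre-deploy-check.sh; then
    echo 'debug-bank: scan clean, push allowed.'
    return 0
  fi
  echo 'debug-bank: flagged patterns found; push blocked.'
  return 1
}
```

&lt;p&gt;Call &lt;code&gt;pre_push_guard&lt;/code&gt; from &lt;code&gt;.git/hooks/pre-push&lt;/code&gt; (remember &lt;code&gt;chmod +x&lt;/code&gt; on the hook) so that a non-zero exit blocks the push.&lt;/p&gt;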

&lt;p&gt;The full setup guide lives in the repo. You can start using patterns in under 5 minutes.&lt;/p&gt;

&lt;h2&gt;Why I'm Open-Sourcing This&lt;/h2&gt;

&lt;p&gt;I built Debug Bank from months of production debugging across 3 major projects and 100+ real incidents. Every pattern has been verified on actual bugs. Every rule has been tested under pressure.&lt;/p&gt;

&lt;p&gt;But patterns are only as useful as they are shared. A pattern bank gains value when developers from different domains contribute new patterns, challenge existing assumptions, and improve the checklists.&lt;/p&gt;

&lt;p&gt;If you've debugged a production system and found a repeating failure pattern, Debug Bank wants it. Submit a PR with a real example, and the pattern becomes part of the shared knowledge base.&lt;/p&gt;

&lt;h2&gt;The Bet&lt;/h2&gt;

&lt;p&gt;My bet is simple: &lt;strong&gt;If you give your AI agent access to a well-organized pattern bank, the agent will solve 70% of future bugs at step 1 instead of re-investigating every time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a 10-100x speedup depending on the complexity. That's fewer wasted tokens, faster fixes, and debugging that feels less like Sisyphus pushing a boulder uphill.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Debug Bank is MIT licensed. Zero dependencies. Just markdown and a bash script.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start here: &lt;a href="https://github.com/soleimanmansouri/debug-bank" rel="noopener noreferrer"&gt;github.com/soleimanmansouri/debug-bank&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have a bug that doesn't fit the 22 patterns? Open an issue or PR. Let's grow this together.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
