<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Geetha K T Geetha</title>
    <description>The latest articles on DEV Community by Geetha K T Geetha (@geetha_ktgeetha_88a6718).</description>
    <link>https://dev.to/geetha_ktgeetha_88a6718</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837398%2Ff291f311-417b-4a4a-a86d-90c7730fb98d.png</url>
      <title>DEV Community: Geetha K T Geetha</title>
      <link>https://dev.to/geetha_ktgeetha_88a6718</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/geetha_ktgeetha_88a6718"/>
    <language>en</language>
    <item>
      <title>my article</title>
      <dc:creator>Geetha K T Geetha</dc:creator>
      <pubDate>Sat, 21 Mar 2026 18:13:11 +0000</pubDate>
      <link>https://dev.to/geetha_ktgeetha_88a6718/my-article-fh7</link>
      <guid>https://dev.to/geetha_ktgeetha_88a6718/my-article-fh7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt08e7d9wvj8adbzpu2x.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt08e7d9wvj8adbzpu2x.jpeg" alt=" " width="800" height="449"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuccsyvjrzogwukowpp0v.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuccsyvjrzogwukowpp0v.jpeg" alt=" " width="800" height="449"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0xkuhzhpw1hqxmvx8s1.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0xkuhzhpw1hqxmvx8s1.jpeg" alt=" " width="800" height="449"&gt;&lt;/a&gt;****# Why My Coding Agent Stopped Repeating Errors with Hindsight&lt;/p&gt;

&lt;p&gt;I thought better prompts would fix my coding assistant—until I watched it confidently repeat the same error across multiple attempts.&lt;/p&gt;

&lt;p&gt;The agent kept suggesting &lt;code&gt;asyncio.run()&lt;/code&gt; inside an already-running event loop. I'd corrected it. It apologized. Ten minutes later, same session, different file—same mistake. I'd written careful system prompt instructions. I'd added examples. None of it stuck. The model had no idea it had just made that exact error, because from its perspective, it hadn't. Every invocation was a blank slate.&lt;/p&gt;
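&lt;p&gt;For readers who have not hit this error, here is a minimal reproduction (the function names are made up for illustration). Inside a coroutine the event loop is already running, so &lt;code&gt;asyncio.run()&lt;/code&gt; raises &lt;code&gt;RuntimeError&lt;/code&gt;; the fix is to &lt;code&gt;await&lt;/code&gt; the coroutine directly.&lt;/p&gt;

```python
import asyncio


async def fetch_value() -> int:
    # Stand-in for real async work (an HTTP call, a DB query, ...)
    await asyncio.sleep(0)
    return 42


async def handler_wrong() -> str:
    # Anti-pattern: trying to start a new event loop from inside a running one
    coro = fetch_value()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


async def handler_right() -> int:
    # Correct: we are already inside a coroutine, so just await it
    return await fetch_value()


async def main() -> tuple[str, int]:
    return await handler_wrong(), await handler_right()


if __name__ == "__main__":
    wrong, right = asyncio.run(main())
    print(wrong)  # RuntimeError: asyncio.run() cannot be called from a running event loop
    print(right)  # 42
```

&lt;p&gt;The only legitimate place for &lt;code&gt;asyncio.run()&lt;/code&gt; is the synchronous entry point of a program; everywhere below it, &lt;code&gt;await&lt;/code&gt; is the tool.&lt;/p&gt;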

&lt;p&gt;That's the problem I built around: not intelligence, but amnesia.&lt;/p&gt;


&lt;h2&gt;
  
  
  What the System Does
&lt;/h2&gt;

&lt;p&gt;The project is a coding practice mentor—a Python agent that helps developers debug their own code. It doesn't just answer questions. It tracks &lt;em&gt;which&lt;/em&gt; mistakes a user makes repeatedly, stores them, and uses that history to change how it responds over time.&lt;/p&gt;

&lt;p&gt;The high-level flow is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User submits broken code and a description of what it should do.&lt;/li&gt;
&lt;li&gt;The agent diagnoses the issue and responds with a fix and explanation.&lt;/li&gt;
&lt;li&gt;That interaction—error type, context, how the user described the problem—gets stored as a memory.&lt;/li&gt;
&lt;li&gt;On the next interaction, the agent recalls relevant past mistakes before generating a response, and adjusts its suggestions accordingly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The piece that makes this work is &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt;, a memory system built specifically for AI agents whose retrieval combines several strategies instead of relying on vector search alone.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Architecture in Three Layers
&lt;/h2&gt;

&lt;p&gt;The system has three main modules: &lt;code&gt;mentor.py&lt;/code&gt; (the agent loop), &lt;code&gt;memory_store.py&lt;/code&gt; (the Hindsight integration), and &lt;code&gt;session_tracker.py&lt;/code&gt; (user behavior tracking across sessions).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mentor.py&lt;/code&gt; is the orchestrator. It takes a user submission, calls &lt;code&gt;recall_relevant_history()&lt;/code&gt; from &lt;code&gt;memory_store.py&lt;/code&gt; to pull relevant past mistakes, builds a prompt that includes that history, calls the LLM, and then calls &lt;code&gt;retain_mistake()&lt;/code&gt; to commit the new interaction to memory. The loop is intentionally simple: the complexity lives in the memory layer, not in the agent logic.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;session_tracker.py&lt;/code&gt; maintains a lightweight record of per-user session behavior: which error categories appeared, how many times, and whether the user accepted or revised the agent's suggestion. This feeds into what gets retained and how it gets tagged.&lt;/p&gt;
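&lt;p&gt;The post does not show &lt;code&gt;session_tracker.py&lt;/code&gt;, so the following is only a guess at what such a lightweight per-user record could look like. The class name, its fields, and the &lt;code&gt;tags()&lt;/code&gt; rule are hypothetical: it counts error categories and tracks whether each suggestion was accepted or revised.&lt;/p&gt;

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SessionRecord:
    """Hypothetical per-user session record (not the author's actual module)."""

    user_id: str
    error_counts: Counter = field(default_factory=Counter)
    accepted: int = 0
    revised: int = 0

    def log(self, error_category: str, accepted: bool) -> None:
        # Count how often each error category shows up in this session
        self.error_counts[error_category] += 1
        # Track whether the user kept the agent's suggestion or changed it
        if accepted:
            self.accepted += 1
        else:
            self.revised += 1

    def tags(self) -> list[str]:
        # Hypothetical rule: categories seen more than once become tags for retention
        return sorted(cat for cat, n in self.error_counts.items() if n > 1)


record = SessionRecord(user_id="u42")
record.log("async", accepted=False)
record.log("async", accepted=True)
record.log("import", accepted=True)
print(record.tags())  # ['async']
print(record.accepted, record.revised)  # 2 1
```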

&lt;p&gt;The interesting engineering is entirely in how memory flows through Hindsight's three primitives: &lt;strong&gt;retain&lt;/strong&gt;, &lt;strong&gt;recall&lt;/strong&gt;, and &lt;strong&gt;reflect&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Memory Layer: Why Simple Storage Fails
&lt;/h2&gt;

&lt;p&gt;My first pass at the memory system was embarrassingly naive. I stored each error event as a JSON blob in a SQLite table with a &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;error_type&lt;/code&gt;, &lt;code&gt;timestamp&lt;/code&gt;, and &lt;code&gt;raw_description&lt;/code&gt; column. On each request, I pulled the last 10 rows for that user and shoved them into the context window.&lt;/p&gt;
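&lt;p&gt;A rough reconstruction of that first pass, using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module and an in-memory database. The column names follow the description above; the table name, helper functions, and sample rows are assumptions. It makes the recency problem easy to see: once ten newer rows exist, an older but relevant record never reaches the prompt.&lt;/p&gt;

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE errors ("
        " user_id TEXT, error_type TEXT, timestamp TEXT, raw_description TEXT)"
    )
    return conn


def log_error(conn, user_id, error_type, timestamp, description) -> None:
    conn.execute(
        "INSERT INTO errors VALUES (?, ?, ?, ?)",
        (user_id, error_type, timestamp, description),
    )


def last_n(conn, user_id, n=10) -> list[str]:
    # Recency-only retrieval: newest rows first, relevance never considered
    rows = conn.execute(
        "SELECT raw_description FROM errors WHERE user_id = ? "
        "ORDER BY timestamp DESC LIMIT ?",
        (user_id, n),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_db()
# An old, highly relevant async mistake...
log_error(conn, "u1", "async", "2026-02-01T10:00:00", "asyncio.run() inside running loop")
# ...followed by a dozen newer, unrelated ones
for day in range(10, 22):
    log_error(conn, "u1", "typo", f"2026-03-{day}T10:00:00", f"misspelled variable #{day}")

context = last_n(conn, "u1")
print(len(context))  # 10
print(any("asyncio" in line for line in context))  # False
```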

&lt;p&gt;This worked for about a day of testing before it broke down in three ways.&lt;/p&gt;

&lt;p&gt;First, the context got noisy fast. Ten raw error records—each with its own slightly different phrasing of the same underlying mistake—didn't help the model reason about patterns. It just created redundant signal.&lt;/p&gt;

&lt;p&gt;Second, retrieval was purely recency-based. If a user had made a specific mistake three weeks ago, it wouldn't surface even if the current problem was nearly identical. Recent but irrelevant errors crowded out older but highly relevant ones.&lt;/p&gt;

&lt;p&gt;Third, there was no consolidation. The model couldn't tell that "forgot to await coroutine," "missing await keyword," and "coroutine object is not awaitable" were the same conceptual mistake from three different sessions.&lt;/p&gt;
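&lt;p&gt;To make the consolidation gap concrete, here is a deliberately crude, hand-rolled stand-in: a keyword rule that maps differently worded descriptions onto one concept. This is not how Hindsight consolidates; it only illustrates the kind of grouping a raw log never does on its own. The concept names and keyword lists are made up.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical concept -> keyword rules; the first concept with a matching keyword wins
CONCEPT_KEYWORDS = {
    "nested-event-loop": ["asyncio.run", "event loop"],
    "missing-await": ["await"],
}


def concept_for(description: str) -> str:
    text = description.lower()
    for concept, keywords in CONCEPT_KEYWORDS.items():
        if any(k in text for k in keywords):
            return concept
    return "unclassified"


def group_by_concept(descriptions: list[str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for d in descriptions:
        groups[concept_for(d)].append(d)
    return dict(groups)


raw_log = [
    "forgot to await coroutine",
    "missing await keyword",
    "coroutine object is not awaitable",
]
print(list(group_by_concept(raw_log)))  # ['missing-await']
```

&lt;p&gt;Hand-written rules like these break as soon as a user phrases a mistake in a new way, which is exactly the maintenance burden a consolidation layer is meant to remove.&lt;/p&gt;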

&lt;p&gt;Hindsight's &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;observation consolidation&lt;/a&gt; is what solved this. When you &lt;code&gt;retain()&lt;/code&gt; a new fact, Hindsight doesn't just store it—it analyzes it against existing memories and synthesizes &lt;strong&gt;observations&lt;/strong&gt;: higher-level abstractions that capture patterns across individual facts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# memory_store.py
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;hindsight&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HindsightClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HindsightClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;HINDSIGHT_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retain_mistake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resolution&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;bank_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error type: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Description: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Resolution: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resolution&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few sessions, Hindsight consolidates individual &lt;code&gt;retain()&lt;/code&gt; calls about async errors into an observation like: &lt;em&gt;"This user consistently misuses asyncio—specifically, they attempt to call &lt;code&gt;asyncio.run()&lt;/code&gt; in contexts where an event loop is already running. This has appeared 4 times across 3 sessions."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That observation is what gets surfaced during &lt;code&gt;recall()&lt;/code&gt;. Not four separate raw facts—one synthesized, evidence-backed insight that the agent can actually reason about.&lt;/p&gt;
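&lt;p&gt;The observation text above is produced by Hindsight. As a rough illustration of only the bookkeeping behind a phrase like "appeared 4 times across 3 sessions", here is a toy summarizer over already-labeled events. The event shape, concept names, and wording are assumptions; the sketch only counts and does not synthesize prose the way a consolidation layer would.&lt;/p&gt;

```python
from collections import defaultdict


def plural(n: int, word: str) -> str:
    return f"{n} {word}" if n == 1 else f"{n} {word}s"


def summarize(events: list[tuple[str, str]]) -> dict[str, str]:
    """Turn (session_id, concept) events into one summary line per concept."""
    counts: dict[str, int] = defaultdict(int)
    sessions: dict[str, set[str]] = defaultdict(set)
    for session_id, concept in events:
        counts[concept] += 1
        sessions[concept].add(session_id)
    return {
        concept: (
            f"{concept} has appeared {plural(counts[concept], 'time')} "
            f"across {plural(len(sessions[concept]), 'session')}"
        )
        for concept in counts
    }


events = [
    ("s1", "nested-event-loop"),
    ("s1", "nested-event-loop"),
    ("s2", "nested-event-loop"),
    ("s3", "nested-event-loop"),
    ("s3", "import-error"),
]
print(summarize(events)["nested-event-loop"])
# nested-event-loop has appeared 4 times across 3 sessions
```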




&lt;h2&gt;
  
  
  TEMPR Retrieval: Not Just Semantic Search
&lt;/h2&gt;

&lt;p&gt;The second thing I got wrong early was treating retrieval as a semantic similarity problem. My next iteration of the SQLite approach ranked rows by cosine similarity on embeddings instead of recency. That is fine for finding conceptually similar text, but it misses a lot.&lt;/p&gt;

&lt;p&gt;Consider the query: &lt;em&gt;"Why does my FastAPI endpoint block?"&lt;/em&gt; A pure semantic search might surface memories about blocking I/O, or about FastAPI generally. But what I actually want is: &lt;em&gt;has this specific user made async mistakes before, and if so, which ones?&lt;/em&gt;&lt;/p&gt;
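&lt;p&gt;A toy version of that failure mode. Real embeddings are far better than word counts, so treat this bag-of-words cosine similarity only as a stand-in for "similar wording"; both memory strings are invented. The query shares words with a generic FastAPI note, while the user's actual async history scores zero because it uses different vocabulary.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Lowercased word counts as a crude stand-in for an embedding
    return Counter(re.findall(r"[a-z_.]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


memories = {
    "fastapi_docs": "FastAPI endpoint returns JSON from a path operation",
    "user_async_history": "called asyncio.run() inside a running event loop",
}
query = vectorize("Why does my FastAPI endpoint block?")
for name, text in memories.items():
    print(name, round(cosine(query, vectorize(text)), 2))
# fastapi_docs 0.29
# user_async_history 0.0
```

&lt;p&gt;The memory that actually explains the blocking endpoint is the one similarity alone ranks last, which is the gap the retrieval strategies below are meant to close.&lt;/p&gt;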

&lt;p&gt;Hindsight's &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;multi-strategy TEMPR retrieval&lt;/a&gt; runs four strategies in parallel: semantic (conceptual similarity), keyword/BM25 (exact term matching), graph (related entities and indirect connections), and temporal (recency and time-range awareness). The results are fused before being returned.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# memory_store.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;recall_relevant_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_problem&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;bank_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;current_problem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

    &lt;span class="n"&gt;history_lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;history_lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (relevance: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history_lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
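&lt;p&gt;The post says TEMPR fuses the results of its four strategies but not how, so the sketch below should not be read as Hindsight's method. Reciprocal rank fusion (RRF) is one common, simple way to merge ranked lists: each memory earns &lt;code&gt;1 / (k + rank)&lt;/code&gt; from every list it appears in, and the contributions are summed. The strategy lists, memory IDs, and &lt;code&gt;k=60&lt;/code&gt; are illustrative.&lt;/p&gt;

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of memory IDs into one ranking (RRF)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda m: (-scores[m], m))


rankings = {
    "semantic": ["m_fastapi", "m_await", "m_loop"],
    "keyword": ["m_fastapi", "m_loop"],
    "graph": ["m_loop", "m_await"],
    "temporal": ["m_await", "m_loop"],
}
print(reciprocal_rank_fusion(rankings))  # ['m_loop', 'm_await', 'm_fastapi']
```

&lt;p&gt;A memory that several strategies agree on (here &lt;code&gt;m_loop&lt;/code&gt;) rises to the top even though semantic search alone ranked it last.&lt;/p&gt;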



&lt;p&gt;The graph retrieval is the part that surprised me most. Hindsight maintains entity relationships across retained facts—so if the system knows "user X has an async mistake" and "async mistakes often relate to event loop misuse," it can surface that connection even without an exact semantic match to the current query. This matters for a coding mentor because mistakes often cluster: a user who misuses &lt;code&gt;asyncio.run()&lt;/code&gt; probably also struggles with &lt;code&gt;await&lt;/code&gt; placement, and both should surface together.&lt;/p&gt;
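&lt;p&gt;Hindsight's graph internals are not described here, so this is only a toy model of the idea: memories linked to entities, and retrieval that follows entity relations one hop out from whatever the query matched directly. The entity graph, entity names, and memory IDs are all hypothetical.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical links from memories to the entities they mention
MEMORY_ENTITIES = {
    "m1": {"asyncio.run", "event loop"},
    "m2": {"await", "coroutine"},
    "m3": {"import", "circular import"},
}
# Hypothetical entity-to-entity relations
RELATED = {
    "event loop": {"coroutine"},
    "coroutine": {"event loop"},
}


def graph_recall(query_entities: set[str]) -> list[str]:
    # Index memories by entity for lookup
    by_entity: dict[str, set[str]] = defaultdict(set)
    for memory_id, entities in MEMORY_ENTITIES.items():
        for entity in entities:
            by_entity[entity].add(memory_id)
    # Expand the query one hop along entity relations
    expanded = set(query_entities)
    for entity in query_entities:
        expanded |= RELATED.get(entity, set())
    hits: set[str] = set()
    for entity in expanded:
        hits |= by_entity.get(entity, set())
    return sorted(hits)


print(graph_recall({"event loop"}))  # ['m1', 'm2']
```

&lt;p&gt;&lt;code&gt;m2&lt;/code&gt;, the missing-&lt;code&gt;await&lt;/code&gt; memory, surfaces even though it shares no entity with the query; the one-hop relation is what connects it.&lt;/p&gt;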

&lt;p&gt;The temporal dimension matters too. Error patterns from last week are more relevant than errors from two months ago—not because the older ones are wrong, but because recency signals active struggle versus resolved understanding. The retrieval layer weights this without me having to build it.&lt;/p&gt;
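&lt;p&gt;One simple way to express "recent struggles count more, old ones still count" is exponential decay with a half-life, multiplied into a relevance score. This is an illustrative assumption, not a description of how Hindsight's temporal strategy weights results, and the 30-day half-life is arbitrary.&lt;/p&gt;

```python
from datetime import datetime, timezone


def recency_weight(event_time: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """1.0 for an event happening now, 0.5 after one half-life, 0.25 after two."""
    age_days = (now - event_time).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)


def rescore(relevance: float, event_time: datetime, now: datetime) -> float:
    # Blend content relevance with recency instead of sorting by date alone
    return relevance * recency_weight(event_time, now)


now = datetime(2026, 3, 21, tzinfo=timezone.utc)
one_month_ago = datetime(2026, 2, 19, tzinfo=timezone.utc)   # 30 days earlier
two_months_ago = datetime(2026, 1, 20, tzinfo=timezone.utc)  # 60 days earlier

print(recency_weight(one_month_ago, now))   # 0.5
print(recency_weight(two_months_ago, now))  # 0.25
# An old but highly relevant memory still beats a fresh, barely relevant one
print(rescore(0.9, two_months_ago, now) > rescore(0.2, now, now))  # True
```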




&lt;h2&gt;
  
  
  Building the Prompt with Memory
&lt;/h2&gt;

&lt;p&gt;Once recall returns results, the agent loop injects them into the system prompt before calling the LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# mentor.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a coding mentor helping a developer debug their code. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You have access to this user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s past mistakes and patterns. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use this history to avoid repeating suggestions that didn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t work, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;and to flag patterns you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ve seen before.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s past error patterns:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Be direct about what the problem is. Reference past patterns if relevant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Code:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;```
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
python&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
```&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Problem: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;problem_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key design decision here is that the memory context is injected into the &lt;strong&gt;system prompt&lt;/strong&gt;, not the user turn. I tried it in the user turn first—the model treated it as part of the question rather than as background context it should reason from. Moving it to the system prompt made a concrete difference in how the agent weighted that history.&lt;/p&gt;

&lt;p&gt;After the LLM responds, the interaction is retained:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# mentor.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_mentor_turn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recall_relevant_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="n"&gt;error_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# simple heuristic classifier
&lt;/span&gt;    &lt;span class="nf"&gt;retain_mistake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;classify_error()&lt;/code&gt; is a lightweight function—currently just keyword matching against a set of error categories (async, type errors, scope issues, import errors, etc.). It's intentionally dumb because the sophistication lives in Hindsight's consolidation, not in my classification logic. I don't need a perfect categorization upfront; I need enough signal for Hindsight to consolidate correctly over time.&lt;/p&gt;
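&lt;p&gt;The post describes &lt;code&gt;classify_error()&lt;/code&gt; only as keyword matching against a set of categories, so the version below is a guess at that shape. The category names and keyword lists are hypothetical, and the first matching category wins.&lt;/p&gt;

```python
# Hypothetical categories and trigger keywords; order matters (first match wins)
ERROR_KEYWORDS: dict[str, tuple[str, ...]] = {
    "async": ("await", "asyncio", "coroutine", "event loop"),
    "import": ("importerror", "modulenotfounderror", "no module named", "circular import"),
    "type": ("typeerror", "not subscriptable", "unsupported operand"),
    "scope": ("nameerror", "unboundlocalerror", "not defined"),
}


def classify_error(description: str) -> str:
    """Return the first category whose keywords appear in the description."""
    text = description.lower()
    for category, keywords in ERROR_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


print(classify_error("RuntimeWarning: coroutine 'main' was never awaited"))  # async
print(classify_error("NameError: name 'total' is not defined"))  # scope
print(classify_error("My loop prints the wrong number"))  # other
```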




&lt;h2&gt;
  
  
  What Changes After Memory Accumulates
&lt;/h2&gt;

&lt;p&gt;The behavioral difference is clearest after three or four sessions with the same user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before memory:&lt;/strong&gt; A user submits code with a missing &lt;code&gt;await&lt;/code&gt;. The agent explains the issue generically—here's what &lt;code&gt;await&lt;/code&gt; does, here's the fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After four sessions of async mistakes:&lt;/strong&gt; The recall returns an observation that Hindsight has synthesized: &lt;em&gt;"This user has repeatedly made async-related mistakes across 4 sessions, specifically around event loop management and missing await keywords."&lt;/em&gt; The agent's response changes tone. Instead of explaining &lt;code&gt;await&lt;/code&gt; from scratch, it flags the pattern: &lt;em&gt;"This is the fourth time we've seen an async issue in your code. I want to highlight the underlying mental model here, not just fix this specific instance..."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's not prompt engineering. That's the agent reasoning from actual accumulated evidence about a specific user.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://vectorize.io/features/agent-memory" rel="noopener noreferrer"&gt;agent memory architecture on Vectorize&lt;/a&gt; describes this well: the goal is not just storage but continuous refinement—observations evolve as new evidence arrives, so the agent's model of a user sharpens over time rather than staying static.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Raw fact storage doesn't scale.&lt;/strong&gt; Dumping interaction records into a database and retrieving the last N is not memory—it's a log. Memory requires synthesis, and building your own synthesis layer is a significant amount of work. Hindsight's observation consolidation did in one API call what would have taken me weeks to approximate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval strategy matters more than storage strategy.&lt;/strong&gt; I spent too long thinking about how to structure what I stored, and not enough thinking about how it would be retrieved. The gap between "semantically similar" and "actually relevant given this user's specific history" is large. Multi-strategy retrieval closes that gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where memory context goes in the prompt is non-trivial.&lt;/strong&gt; The position of retrieved memories affects how the model reasons about them. System prompt injection signals background context; user turn injection signals question content. They produce different behaviors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classification can be shallow if consolidation is deep.&lt;/strong&gt; I was worried my simple &lt;code&gt;classify_error()&lt;/code&gt; heuristic would produce noisy data. In practice, Hindsight's consolidation smooths over the noise—if three slightly different descriptions all point at the same underlying mistake, the observation captures the pattern regardless of how they were labeled on the way in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One memory bank per user is the right abstraction.&lt;/strong&gt; Giving each user their own bank (&lt;code&gt;bank_id=f"user-{user_id}"&lt;/code&gt;) meant I got isolation for free: no user's history bleeds into another's retrieval. The &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight documentation&lt;/a&gt; covers memory banks in detail, and it's worth reading before you start designing your retrieval logic.&lt;/p&gt;




&lt;p&gt;The system isn't finished. Error classification needs to be smarter. The session tracker's behavioral signals (did the user revise the agent's answer? did they submit the same error again 10 minutes later?) aren't fully wired into what gets retained yet. There's more to build.&lt;/p&gt;

&lt;p&gt;But the agent no longer repeats the same mistake to the same user twice. That was the goal, and it's working.&lt;/p&gt;

&lt;p&gt;Project repository: &lt;a href="https://github.com/geethaktgeethakt51-cmd/my-project/tree/main" rel="noopener noreferrer"&gt;https://github.com/geethaktgeethakt51-cmd/my-project/tree/main&lt;/a&gt;&lt;/p&gt;

</description>
      <category>myarticle</category>
    </item>
  </channel>
</rss>
