<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sanjeetha</title>
    <description>The latest articles on DEV Community by Sanjeetha (@sanjeetha_167).</description>
    <link>https://dev.to/sanjeetha_167</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886787%2Fa5d433c6-ac35-4907-8516-dda2df318b9b.png</url>
      <title>DEV Community: Sanjeetha</title>
      <link>https://dev.to/sanjeetha_167</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sanjeetha_167"/>
    <language>en</language>
    <item>
      <title>Fixing blind spots in code reviews with Hindsight memory</title>
      <dc:creator>Sanjeetha</dc:creator>
      <pubDate>Sun, 19 Apr 2026 03:06:18 +0000</pubDate>
      <link>https://dev.to/sanjeetha_167/fixing-blind-spots-in-code-reviews-with-hindsight-memory-gl2</link>
      <guid>https://dev.to/sanjeetha_167/fixing-blind-spots-in-code-reviews-with-hindsight-memory-gl2</guid>
      <description>&lt;h2&gt;
  
  
  The problem: code review agents don’t really “learn”
&lt;/h2&gt;

&lt;p&gt;Most AI code review tools feel impressive for about five minutes.&lt;/p&gt;

&lt;p&gt;They catch syntax issues, suggest formatting improvements, and maybe flag obvious bugs.&lt;/p&gt;

&lt;p&gt;But they don’t remember past mistakes.&lt;/p&gt;

&lt;p&gt;If your codebase keeps repeating the same issue, the agent gives the same generic advice every time.&lt;/p&gt;

&lt;p&gt;We didn’t want a smarter reviewer.&lt;/p&gt;

&lt;p&gt;We wanted a reviewer that gets better over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we built
&lt;/h2&gt;

&lt;p&gt;We built a code review agent with persistent memory using an LLM and Hindsight.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every review should make the next review better.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What pushed us to build this
&lt;/h2&gt;

&lt;p&gt;While building this, we noticed our agent kept missing repeated issues across multiple pull requests.&lt;/p&gt;

&lt;p&gt;For example, it would flag a missing null check in one PR, then miss the same problem in the next one.&lt;/p&gt;

&lt;p&gt;Even worse, it kept repeating the same generic suggestion, with no improvement from one review to the next.&lt;/p&gt;

&lt;p&gt;That’s when we realized the problem wasn’t intelligence.&lt;/p&gt;

&lt;p&gt;It was memory.&lt;/p&gt;


&lt;h2&gt;
  
  
  System overview
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Developer pushes code
&lt;/li&gt;
&lt;li&gt;Agent reviews code
&lt;/li&gt;
&lt;li&gt;Issues found in the review are stored in memory
&lt;/li&gt;
&lt;li&gt;Future reviews use that memory
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This turns stateless reviews into evolving feedback.&lt;/p&gt;


&lt;h2&gt;
  
  
  Core idea: feedback loop
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Step 1: Analyze code
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyze_with_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 2: Store memory
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retain&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fix&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
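
&lt;p&gt;To make the stored record concrete, here is a minimal sketch of what an issue object could look like. The &lt;code&gt;Issue&lt;/code&gt; class, the 1 to 5 severity scale, and the &lt;code&gt;to_memory_text&lt;/code&gt; helper are assumptions for illustration; only the field names &lt;code&gt;pattern&lt;/code&gt;, &lt;code&gt;fix&lt;/code&gt;, and &lt;code&gt;severity&lt;/code&gt; come from the snippets in this post.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """One review finding; the field names mirror the snippets in this post."""
    pattern: str   # short label for the recurring problem
    fix: str       # the suggested remedy
    severity: int  # assumed scale: 1 (nit) to 5 (blocker)


def to_memory_text(issue: Issue) -> str:
    """Flatten an issue into one line of text suitable for a memory store."""
    return f"Pattern: {issue.pattern}. Suggested fix: {issue.fix}. Severity: {issue.severity}/5."


# Hypothetical example record.
example = Issue(pattern="missing null check", fix="validate inputs at the boundary", severity=4)
print(to_memory_text(example))
```

&lt;p&gt;Storing a short, self-describing sentence rather than a raw diff keeps each memory readable when it is later placed in a prompt.&lt;/p&gt;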

&lt;h3&gt;
  
  
  Step 3: Recall past issues
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
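
&lt;p&gt;This post does not describe how recall ranks memories internally. Purely to illustrate the idea of querying memory with the code under review, the sketch below uses a toy word-overlap score. The &lt;code&gt;toy_recall&lt;/code&gt; function is a hypothetical stand-in, not the retrieval a real memory system performs.&lt;/p&gt;

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for real semantic matching."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def toy_recall(query: str, memories: list[str], top_k: int = 3) -> list[str]:
    """Return up to top_k memories that share the most words with the query."""
    query_tokens = tokens(query)
    scored = []
    for memory in memories:
        overlap = len(query_tokens.intersection(tokens(memory)))
        if overlap > 0:
            scored.append((overlap, memory))
    # Highest overlap first; list.sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in scored[:top_k]]


# Hypothetical stored memories and a query taken from a diff.
stored = [
    "missing null check on user input",
    "unused import in utils",
    "null check skipped before dereferencing config",
]
print(toy_recall("if user is None: return  # null check", stored, top_k=2))
```

&lt;p&gt;Unrelated memories score zero and are never returned, which is the property that matters once the store grows.&lt;/p&gt;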

&lt;h3&gt;
  
  
  Step 4: Improve output
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enhanced_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;past&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;analyze_with_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;past&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
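
&lt;p&gt;The body of &lt;code&gt;analyze_with_context&lt;/code&gt; is not shown here. One plausible approach, sketched under that assumption, is to fold the recalled issues into the prompt before calling the LLM. The &lt;code&gt;build_prompt&lt;/code&gt; name, its wording, and the &lt;code&gt;max_items&lt;/code&gt; cap are all hypothetical.&lt;/p&gt;

```python
def build_prompt(diff: str, past_issues: list[str], max_items: int = 5) -> str:
    """Combine the diff with recalled issues into one review prompt."""
    sections = ["You are reviewing a pull request."]
    if past_issues:
        # Cap the list so a large memory cannot crowd out the diff itself.
        recalled = "\n".join(f"- {item}" for item in past_issues[:max_items])
        sections.append("Issues seen in earlier reviews of this codebase:\n" + recalled)
        sections.append("If the diff repeats one of these, say so and suggest a consistent fix.")
    sections.append("Diff to review:\n" + diff)
    return "\n\n".join(sections)


# Hypothetical inputs.
print(build_prompt("+ name = user.profile.name", ["missing null check on user input"]))
```

&lt;p&gt;With no recalled issues the prompt falls back to a plain review request, so the first review of a new codebase still works.&lt;/p&gt;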



&lt;h2&gt;
  
  
  Before vs After
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generic feedback
&lt;/li&gt;
&lt;li&gt;Same suggestions repeated
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Personalized feedback
&lt;/li&gt;
&lt;li&gt;Pattern recognition
&lt;/li&gt;
&lt;li&gt;Continuous improvement
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One clear change we observed:&lt;/p&gt;

&lt;p&gt;Earlier, the agent would say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Handle null values properly"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After adding memory, it started saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You’ve had similar null-check issues in previous PRs. Consider centralizing validation."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shift made the feedback actually useful.&lt;/p&gt;


&lt;h2&gt;
  
  
  DevOps integration
&lt;/h2&gt;

&lt;p&gt;This agent runs inside a CI pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Triggered on pull requests
&lt;/li&gt;
&lt;li&gt;Reviews code automatically
&lt;/li&gt;
&lt;li&gt;Posts comments
&lt;/li&gt;
&lt;li&gt;Stores learning after each run
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of being a one-time tool, it becomes part of the development workflow.&lt;/p&gt;
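
&lt;p&gt;The four bullets above could be wired together roughly like this. It is a sketch with injected callables rather than a real integration: &lt;code&gt;run_ci_review&lt;/code&gt; and its parameters are hypothetical, and the usage example replaces the code host and the memory service with plain lists.&lt;/p&gt;

```python
from typing import Callable


def run_ci_review(
    diff: str,
    review: Callable[[str], list[str]],
    post_comment: Callable[[str], None],
    store: Callable[[str], None],
) -> int:
    """Review one pull request diff, comment on it, and persist the findings.

    Returns the number of findings so the CI job can decide how to report.
    """
    findings = review(diff)
    for finding in findings:
        post_comment(finding)  # stands in for a call to the code host
    for finding in findings:
        store(finding)         # persist only after the comments were posted
    return len(findings)


# Usage with in-memory stand-ins instead of a real code host or memory service.
comments: list[str] = []
memory: list[str] = []
count = run_ci_review(
    diff="+ name = user.profile.name",
    review=lambda d: ["possible missing null check on user.profile"] if "user.profile" in d else [],
    post_comment=comments.append,
    store=memory.append,
)
print(count, comments, memory)
```

&lt;p&gt;Passing the steps in as functions keeps the pipeline testable without network access, which is convenient inside CI.&lt;/p&gt;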


&lt;h2&gt;
  
  
  Where things broke (and what we learned)
&lt;/h2&gt;

&lt;p&gt;At one point, we made a mistake.&lt;/p&gt;

&lt;p&gt;We stored &lt;strong&gt;too many low-quality issues&lt;/strong&gt; in memory.&lt;/p&gt;

&lt;p&gt;The result?&lt;/p&gt;

&lt;p&gt;The agent started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recalling irrelevant issues
&lt;/li&gt;
&lt;li&gt;giving noisy suggestions
&lt;/li&gt;
&lt;li&gt;becoming less accurate
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It actually got worse.&lt;/p&gt;


&lt;h2&gt;
  
  
  Fixing memory quality
&lt;/h2&gt;

&lt;p&gt;We fixed this by filtering what gets stored:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;store_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prioritizing recent issues
&lt;/li&gt;
&lt;li&gt;ignoring low-impact suggestions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This made the memory system much more reliable.&lt;/p&gt;
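
&lt;p&gt;Filtering by severity and prioritizing recent issues can be combined in one selection step. The sketch below is one way to do it, not the exact logic we run: the &lt;code&gt;StoredIssue&lt;/code&gt; class, the 30-day half-life, and the default threshold are illustrative assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StoredIssue:
    pattern: str
    severity: int        # assumed scale: 1 (nit) to 5 (blocker)
    seen_at: datetime    # when the issue was last observed


def select_memories(issues, now, threshold=2, half_life_days=30.0, top_k=3):
    """Drop low-severity issues, then rank the rest by recency-weighted severity."""
    def score(issue):
        age_days = (now - issue.seen_at).total_seconds() / 86400
        # Exponential decay: an issue loses half its weight every half_life_days.
        return issue.severity * 0.5 ** (age_days / half_life_days)

    kept = [issue for issue in issues if issue.severity > threshold]
    return sorted(kept, key=score, reverse=True)[:top_k]


# Hypothetical stored issues.
now = datetime(2026, 4, 19)
issues = [
    StoredIssue("missing null check", 4, now - timedelta(days=3)),
    StoredIssue("trailing whitespace", 1, now - timedelta(days=1)),
    StoredIssue("SQL built by string concatenation", 5, now - timedelta(days=120)),
]
for issue in select_memories(issues, now):
    print(issue.pattern)
```

&lt;p&gt;In this example the low-severity whitespace issue is dropped entirely, and the recent null-check issue outranks the older but more severe one.&lt;/p&gt;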




&lt;h2&gt;
  
  
  What surprised us
&lt;/h2&gt;

&lt;p&gt;We expected better reviews.&lt;/p&gt;

&lt;p&gt;We didn’t expect the agent’s behavior to change.&lt;/p&gt;

&lt;p&gt;The agent started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;referencing past mistakes
&lt;/li&gt;
&lt;li&gt;recognizing patterns
&lt;/li&gt;
&lt;li&gt;suggesting consistent fixes
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At some point, it stopped feeling like a tool.&lt;/p&gt;

&lt;p&gt;It felt like a junior engineer that learns over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Stateless AI has limits
&lt;/li&gt;
&lt;li&gt;For recurring issues, memory helped more than prompt tuning
&lt;/li&gt;
&lt;li&gt;Not all feedback should be remembered
&lt;/li&gt;
&lt;li&gt;Feedback loops create real improvement
&lt;/li&gt;
&lt;li&gt;DevOps integration is essential
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Most AI tools react.&lt;/p&gt;

&lt;p&gt;This one remembers.&lt;/p&gt;

&lt;p&gt;And that changes everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Hindsight GitHub: &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;https://github.com/vectorize-io/hindsight&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;https://hindsight.vectorize.io/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Agent memory: &lt;a href="https://vectorize.io/what-is-agent-memory" rel="noopener noreferrer"&gt;https://vectorize.io/what-is-agent-memory&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
