<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alekhya Bonamukkala</title>
    <description>The latest articles on DEV Community by Alekhya Bonamukkala (@alekhya_bonamukkala_).</description>
    <link>https://dev.to/alekhya_bonamukkala_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840547%2F7dea5f5e-5814-43e6-8aa6-95ba5769bb05.png</url>
      <title>DEV Community: Alekhya Bonamukkala</title>
      <link>https://dev.to/alekhya_bonamukkala_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alekhya_bonamukkala_"/>
    <language>en</language>
    <item>
      <title>Hindsight caught repeated AST traversal bugs</title>
      <dc:creator>Alekhya Bonamukkala</dc:creator>
      <pubDate>Mon, 23 Mar 2026 18:13:18 +0000</pubDate>
      <link>https://dev.to/alekhya_bonamukkala_/hindsight-caught-repeated-ast-traversal-bugs-4be6</link>
      <guid>https://dev.to/alekhya_bonamukkala_/hindsight-caught-repeated-ast-traversal-bugs-4be6</guid>
      <description>&lt;h1&gt;
  
  
  Hindsight caught repeated AST traversal bugs
&lt;/h1&gt;

&lt;p&gt;“Why is it flagging recursion again?” I checked the logs, and the agent wasn’t looping—it was using Hindsight to call out the same AST traversal mistake I’d made three commits ago.&lt;/p&gt;

&lt;p&gt;Last night I assumed my AST walker was fixed; by morning, Hindsight had surfaced the exact same traversal bug across three different submissions I thought were unrelated.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually built
&lt;/h2&gt;

&lt;p&gt;I’ve been working on Codemind, a coding practice platform that doesn’t just run user code—it tries to understand how someone is solving a problem and guide them while they’re doing it.&lt;/p&gt;

&lt;p&gt;At a high level, the system looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A frontend where users write and submit code&lt;/li&gt;
&lt;li&gt;A Python backend that parses and analyzes submissions&lt;/li&gt;
&lt;li&gt;A sandboxed execution layer (Firecracker-style isolation) to safely run code&lt;/li&gt;
&lt;li&gt;An analysis pipeline that walks ASTs and applies rules + taint tracking&lt;/li&gt;
&lt;li&gt;A memory layer powered by Hindsight that stores patterns across attempts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interesting part isn’t execution—it’s the analysis loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse code into an AST&lt;/li&gt;
&lt;li&gt;Traverse it to detect patterns (recursion misuse, unsafe flows, etc.)&lt;/li&gt;
&lt;li&gt;Generate feedback&lt;/li&gt;
&lt;li&gt;Store what happened&lt;/li&gt;
&lt;li&gt;Use that history to influence future feedback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step is where things got weird—in a good way.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem I thought I had
&lt;/h2&gt;

&lt;p&gt;Originally, I thought my biggest challenge would be writing good static analysis rules.&lt;/p&gt;

&lt;p&gt;Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detect recursion without base case&lt;/li&gt;
&lt;li&gt;Identify unsafe variable flows (taint analysis)&lt;/li&gt;
&lt;li&gt;Catch incorrect loop termination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built a fairly standard AST walker. Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FunctionDef&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;check_recursion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_child_nodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pretty normal. Walk the tree, inspect nodes, apply rules.&lt;/p&gt;
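
&lt;p&gt;For a variant that runs end to end, here is a minimal sketch using only Python’s standard &lt;code&gt;ast&lt;/code&gt; module. The helper name &lt;code&gt;find_function_names&lt;/code&gt; and the sample source are illustrative assumptions, not Codemind code, and the rule is swapped for simply recording each function name.&lt;/p&gt;

```python
import ast

def find_function_names(source):
    """Collect the name of every function definition found in the source."""
    names = []

    def visit(node):
        # Same shape as the walker: inspect the node, then recurse into children.
        if isinstance(node, ast.FunctionDef):
            names.append(node.name)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return names

# Hypothetical submission with one nested function.
sample = "def outer():\n    def inner():\n        pass\n"
print(find_function_names(sample))
```

&lt;p&gt;The nested &lt;code&gt;inner&lt;/code&gt; function is reached only because the walker keeps descending after it handles &lt;code&gt;outer&lt;/code&gt;.&lt;/p&gt;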

&lt;p&gt;And it mostly worked.&lt;/p&gt;

&lt;p&gt;But then I started noticing something annoying:&lt;br&gt;
I kept fixing the same class of bugs over and over again.&lt;/p&gt;

&lt;p&gt;Not in the codebase—in the &lt;em&gt;submissions&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Users (including me, while testing) would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write a recursive function&lt;/li&gt;
&lt;li&gt;Forget the base condition&lt;/li&gt;
&lt;li&gt;Or place it incorrectly&lt;/li&gt;
&lt;li&gt;Or structure traversal in a subtly broken way&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And every time, the analyzer would treat it as a fresh problem.&lt;/p&gt;

&lt;p&gt;Stateless. No memory.&lt;/p&gt;

&lt;p&gt;That’s when I added Hindsight.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I expected Hindsight to do
&lt;/h2&gt;

&lt;p&gt;I expected Hindsight to act like a simple log:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store past attempts&lt;/li&gt;
&lt;li&gt;Maybe surface similar ones&lt;/li&gt;
&lt;li&gt;Help generate slightly better hints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Basically, better context.&lt;/p&gt;

&lt;p&gt;Instead, it started behaving like pattern recognition across time.&lt;/p&gt;

&lt;p&gt;I integrated it using the official repository and documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://vectorize.io/features/agent-memory" rel="noopener noreferrer"&gt;agent memory overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea was simple: every analysis run produces an “event”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;detected_issues&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ast_patterns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;extracted_patterns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nf"&gt;store_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, before generating feedback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;similar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_similar_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_patterns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;similar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;adjust_feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similar&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it. No magic.&lt;/p&gt;

&lt;p&gt;But the behavior changed dramatically.&lt;/p&gt;
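
&lt;p&gt;To make the store-then-query flow concrete, here is a minimal in-memory sketch. The &lt;code&gt;InMemoryHistory&lt;/code&gt; class is a hypothetical stand-in for the memory layer and is not Hindsight’s API; the body of &lt;code&gt;find_similar_patterns&lt;/code&gt; is also an assumption (exact match on the stored patterns).&lt;/p&gt;

```python
class InMemoryHistory:
    """Hypothetical stand-in for the memory layer; this is not Hindsight's API."""

    def __init__(self):
        self._events = {}

    def store_event(self, event):
        # Keep events per user, in the order they were stored.
        self._events.setdefault(event["user_id"], []).append(event)

    def query(self, user_id):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._events.get(user_id, []))

def find_similar_patterns(history, current_patterns):
    """Assumed matching rule: an earlier event with identical patterns."""
    return [e for e in history if e["ast_patterns"] == current_patterns]

memory = InMemoryHistory()
current = {"has_recursion": True, "base_case_position": "missing"}
memory.store_event({
    "user_id": "u1",
    "issues": ["Missing base case in recursion"],
    "ast_patterns": current,
})
similar = find_similar_patterns(memory.query("u1"), current)
print(len(similar))
```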




&lt;h2&gt;
  
  
  The bug that wouldn’t stay fixed
&lt;/h2&gt;

&lt;p&gt;The first time I noticed something different was with a recursion check.&lt;/p&gt;

&lt;p&gt;My rule looked something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_recursion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func_node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_recursive_calls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;has_base_case&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func_node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing base case in recursion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean. Straightforward.&lt;/p&gt;

&lt;p&gt;But users kept triggering false positives or weird edge cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Base case present but unreachable&lt;/li&gt;
&lt;li&gt;Base case after recursive call&lt;/li&gt;
&lt;li&gt;Nested conditions that break termination&lt;/li&gt;
&lt;/ul&gt;
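
&lt;p&gt;A small sketch of why a presence-only check struggles with these cases. The bodies of &lt;code&gt;calls_self&lt;/code&gt; and &lt;code&gt;has_base_case&lt;/code&gt; below are assumptions for illustration (the real helpers are not shown in this post), and &lt;code&gt;countdown&lt;/code&gt; is a made-up submission.&lt;/p&gt;

```python
import ast

def calls_self(node, name):
    """True if any direct call to `name` appears somewhere inside `node`."""
    return any(
        isinstance(n, ast.Call)
        and isinstance(n.func, ast.Name)
        and n.func.id == name
        for n in ast.walk(node)
    )

def has_base_case(func_node):
    """Presence-only heuristic: some return statement does not recurse."""
    return any(
        isinstance(n, ast.Return) and not calls_self(n, func_node.name)
        for n in ast.walk(func_node)
    )

# Hypothetical submission: the base case sits after the recursive call.
source = "def countdown(n):\n    countdown(n - 1)\n    if n == 0:\n        return 0\n"
func = ast.parse(source).body[0]
print(calls_self(func, func.name))
print(has_base_case(func))
```

&lt;p&gt;Both checks come back true, so a rule of the form “recursive and no base case” stays silent even though the function recurses before it ever tests &lt;code&gt;n&lt;/code&gt;.&lt;/p&gt;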

&lt;p&gt;Individually, each case looked different.&lt;/p&gt;

&lt;p&gt;Stateless analysis treated them as separate.&lt;/p&gt;

&lt;p&gt;Hindsight didn’t.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hindsight started grouping my mistakes
&lt;/h2&gt;

&lt;p&gt;Once I started storing AST-derived patterns, I began extracting simple features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;has_recursion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base_case_position&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after_recursive_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;branching_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;compute_depth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing fancy. Just structural hints.&lt;/p&gt;
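
&lt;p&gt;Here is one way such features could be derived from a function node, as a rough sketch. The line-number comparison is a heuristic I am assuming for illustration, the values &lt;code&gt;"missing"&lt;/code&gt;, &lt;code&gt;"before_recursive_call"&lt;/code&gt; and &lt;code&gt;"not_applicable"&lt;/code&gt; are hypothetical labels, and &lt;code&gt;branching_depth&lt;/code&gt; is left out to keep it short.&lt;/p&gt;

```python
import ast

def is_self_call(node, name):
    """True if node is a direct call to the function called `name`."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )

def extract_patterns(func_node):
    """Distill one function definition into a few comparable features."""
    name = func_node.name
    call_lines = [n.lineno for n in ast.walk(func_node) if is_self_call(n, name)]
    # Heuristic: a return that contains no self-call counts as a base case.
    base_lines = [
        n.lineno
        for n in ast.walk(func_node)
        if isinstance(n, ast.Return)
        and not any(is_self_call(c, name) for c in ast.walk(n))
    ]
    if not call_lines:
        position = "not_applicable"
    elif not base_lines:
        position = "missing"
    elif min(base_lines) > min(call_lines):
        position = "after_recursive_call"
    else:
        position = "before_recursive_call"
    return {"has_recursion": bool(call_lines), "base_case_position": position}

# Two hypothetical submissions with different surface structure.
late_base = "def countdown(n):\n    countdown(n - 1)\n    if n == 0:\n        return 0\n"
no_base = "def factorial(n):\n    return factorial(n - 1) * n\n"
for source in (late_base, no_base):
    print(extract_patterns(ast.parse(source).body[0]))
```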

&lt;p&gt;But when I queried history, I started seeing clusters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same incorrect ordering&lt;/li&gt;
&lt;li&gt;Same missing condition structure&lt;/li&gt;
&lt;li&gt;Same flawed traversal shape&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even across different users.&lt;/p&gt;

&lt;p&gt;That’s when the feedback changed from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Missing base case”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Your base case is placed after the recursive call, which matches a previous failing pattern.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a completely different experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  The moment it clicked
&lt;/h2&gt;

&lt;p&gt;“This node ordering looks familiar.”&lt;/p&gt;

&lt;p&gt;That line came from a debug print I had added while comparing ASTs.&lt;/p&gt;

&lt;p&gt;I realized Hindsight wasn’t just storing failures—it was letting me &lt;em&gt;compare structure over time&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I added a crude similarity function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_similar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;has_recursion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;has_recursion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt;
        &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base_case_position&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base_case_position&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different submissions mapped to the same pattern&lt;/li&gt;
&lt;li&gt;Feedback became consistent&lt;/li&gt;
&lt;li&gt;I stopped chasing individual bugs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I was debugging &lt;em&gt;classes of mistakes&lt;/em&gt;, not instances.&lt;/p&gt;
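
&lt;p&gt;The grouping step can be sketched by turning the two compared fields into a dictionary key. The &lt;code&gt;submission_id&lt;/code&gt; field and the helper names here are assumptions for illustration, not part of the stored events described in this post.&lt;/p&gt;

```python
from collections import defaultdict

def pattern_key(patterns):
    """Use the same two fields the similarity check compares."""
    return (patterns["has_recursion"], patterns["base_case_position"])

def group_by_pattern(events):
    """Map each pattern key to the submissions that share it."""
    groups = defaultdict(list)
    for event in events:
        groups[pattern_key(event["patterns"])].append(event["submission_id"])
    return dict(groups)

# Hypothetical history: three submissions, two distinct mistakes.
events = [
    {"submission_id": "s1", "patterns": {"has_recursion": True, "base_case_position": "after_recursive_call"}},
    {"submission_id": "s2", "patterns": {"has_recursion": True, "base_case_position": "after_recursive_call"}},
    {"submission_id": "s3", "patterns": {"has_recursion": True, "base_case_position": "missing"}},
]
for key, submissions in group_by_pattern(events).items():
    print(key, submissions)
```

&lt;p&gt;Each key stands for one class of mistake, so a rule fix can be checked against every submission in that group at once.&lt;/p&gt;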




&lt;h2&gt;
  
  
  Where my design broke
&lt;/h2&gt;

&lt;p&gt;The first version of this system had a big flaw:&lt;/p&gt;

&lt;p&gt;I stored &lt;em&gt;too much raw data&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Entire ASTs, full code blobs, verbose logs.&lt;/p&gt;

&lt;p&gt;It made retrieval slow and noisy.&lt;/p&gt;

&lt;p&gt;Worse, similarity comparisons became meaningless, because two full ASTs or code blobs almost never match even when they contain the same mistake.&lt;/p&gt;

&lt;p&gt;I refactored to store only distilled features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patterns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extract_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ast_tree&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This made two things easier:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fast comparison&lt;/li&gt;
&lt;li&gt;Clear grouping of mistakes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tradeoff: I lost some context.&lt;/p&gt;

&lt;p&gt;But for feedback generation, structure mattered more than raw code.&lt;/p&gt;




&lt;h2&gt;
  
  
  How feedback actually changed
&lt;/h2&gt;

&lt;p&gt;Before Hindsight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each submission analyzed in isolation&lt;/li&gt;
&lt;li&gt;Generic messages&lt;/li&gt;
&lt;li&gt;No sense of progression&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After Hindsight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feedback references past mistakes&lt;/li&gt;
&lt;li&gt;Patterns influence hint priority&lt;/li&gt;
&lt;li&gt;Repeated issues get stricter guidance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Check your recursion logic.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“You’ve repeated a pattern where the base case is evaluated after recursion. Try moving it before the recursive call.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not just better—it’s &lt;em&gt;targeted&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  One real scenario
&lt;/h2&gt;

&lt;p&gt;A user writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks fine.&lt;/p&gt;

&lt;p&gt;Then they modify it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No base case.&lt;/p&gt;

&lt;p&gt;First submission:&lt;br&gt;
→ flagged as missing base case&lt;/p&gt;

&lt;p&gt;Second submission (after hint):&lt;br&gt;
→ same mistake, slightly different structure&lt;/p&gt;

&lt;p&gt;Without memory: same feedback again.&lt;/p&gt;

&lt;p&gt;With Hindsight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recognizes repeated pattern&lt;/li&gt;
&lt;li&gt;Escalates feedback&lt;/li&gt;
&lt;li&gt;Suggests exact structural fix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That escalation turned out to be key.&lt;/p&gt;
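
&lt;p&gt;A rough sketch of what escalation could look like, assuming the history is a list of earlier pattern dictionaries for the same user. The three levels, the thresholds, and the message wording are hypothetical choices for illustration.&lt;/p&gt;

```python
def escalate_feedback(past_patterns, current):
    """Choose a hint based on how many earlier attempts share the same pattern."""
    repeats = sum(1 for past in past_patterns if past == current)
    if repeats == 0:
        # First time this shape shows up: keep the hint generic.
        return "Missing base case in recursion."
    if repeats == 1:
        return "This matches an earlier attempt: the recursive call still has no base case."
    # Seen at least twice before: spell out the structural fix.
    return "Repeated pattern: add a return that does not recurse before the recursive call."

missing = {"has_recursion": True, "base_case_position": "missing"}
history = []
for attempt in range(3):
    print(escalate_feedback(history, missing))
    history.append(missing)
```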




&lt;h2&gt;
  
  
  What surprised me most
&lt;/h2&gt;

&lt;p&gt;I didn’t expect Hindsight to help me debug &lt;em&gt;my own analyzer&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But it did.&lt;/p&gt;

&lt;p&gt;By clustering mistakes, I could see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where my rules were too broad&lt;/li&gt;
&lt;li&gt;Where they were missing edge cases&lt;/li&gt;
&lt;li&gt;Which patterns kept slipping through&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a way, user mistakes became test cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Stateless analysis hides patterns&lt;/strong&gt;&lt;br&gt;
If you only look at one submission at a time, you miss the bigger picture. Many of the bugs I saw repeated in slightly different forms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Store structure, not raw data&lt;/strong&gt;&lt;br&gt;
AST features worked better than full trees. Smaller, comparable representations made everything easier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Similarity doesn’t need to be fancy&lt;/strong&gt;&lt;br&gt;
Simple comparisons (position, presence, ordering) were enough to group meaningful patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Feedback quality depends on memory&lt;/strong&gt;&lt;br&gt;
Better rules didn’t improve feedback nearly as much as remembering past mistakes did.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Your system will learn things you didn’t design for&lt;/strong&gt;&lt;br&gt;
I added Hindsight to improve hints. It ended up exposing flaws in my analyzer and changing how I debug.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I’d do differently
&lt;/h2&gt;

&lt;p&gt;If I rebuilt this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I’d design pattern extraction first, not last&lt;/li&gt;
&lt;li&gt;I’d define similarity metrics upfront&lt;/li&gt;
&lt;li&gt;I’d treat memory as a core system, not an add-on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now, Hindsight sits alongside the analyzer.&lt;/p&gt;

&lt;p&gt;It probably belongs inside it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;I went in thinking I was building a better static analyzer.&lt;/p&gt;

&lt;p&gt;What I ended up building was a system that remembers how people fail—and uses that to guide them forward.&lt;/p&gt;

&lt;p&gt;And somewhere along the way, it started catching my own repeated AST traversal bugs before I even noticed them.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>devops</category>
      <category>devchallenge</category>
    </item>
  </channel>
</rss>
