<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: PARDHU CHIPINAPI</title>
    <description>The latest articles on DEV Community by PARDHU CHIPINAPI (@vsaipardhu).</description>
    <link>https://dev.to/vsaipardhu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840557%2F700df3d7-f817-44a4-a991-6913c39ea8e2.png</url>
      <title>DEV Community: PARDHU CHIPINAPI</title>
      <link>https://dev.to/vsaipardhu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vsaipardhu"/>
    <language>en</language>
    <item>
      <title>I Built Adaptive Hints Using Hindsight</title>
      <dc:creator>PARDHU CHIPINAPI</dc:creator>
      <pubDate>Mon, 23 Mar 2026 17:51:53 +0000</pubDate>
      <link>https://dev.to/vsaipardhu/i-built-adaptive-hints-using-hindsight-26md</link>
      <guid>https://dev.to/vsaipardhu/i-built-adaptive-hints-using-hindsight-26md</guid>
      <description>&lt;h1&gt;
  
  
  I Built Adaptive Hints Using Hindsight
&lt;/h1&gt;

&lt;p&gt;“Why is it suddenly explaining recursion?” I hadn’t changed the prompt—Hindsight had quietly picked up the user’s past mistakes and rewired the hint mid-session.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually built
&lt;/h2&gt;

&lt;p&gt;I’ve been working on Codemind, a coding practice platform that tries to behave less like a judge and more like a mentor. Instead of just saying “wrong answer,” it watches &lt;em&gt;how&lt;/em&gt; you fail and adjusts guidance over time.&lt;/p&gt;

&lt;p&gt;At a high level, the system looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: users solve problems, submit code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend (Python)&lt;/strong&gt;: handles submissions and feedback orchestration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution layer&lt;/strong&gt;: runs code safely (Firecracker-style sandbox)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI layer&lt;/strong&gt;: analyzes code and generates hints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory layer (Hindsight)&lt;/strong&gt;: stores attempts and extracts patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interesting part isn’t the execution or the model; it’s the memory. I wired in &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt; to track user attempts over time and use that history to shape hints.&lt;/p&gt;

&lt;p&gt;If you’ve never used it, the &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight documentation&lt;/a&gt; explains the idea well: instead of treating each interaction as isolated, you store and retrieve past events to influence future behavior. It’s basically a structured memory layer for agents.&lt;/p&gt;
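&lt;p&gt;The core idea fits in a few lines. Here’s a deliberately minimal in-memory stand-in (my own sketch; these names are not Hindsight’s actual API) showing the store-events, retrieve-patterns loop:&lt;/p&gt;

```python
# Minimal in-memory stand-in for an event-memory layer.
# Illustrative only: these names are mine, not Hindsight's API.
from collections import Counter

class EventMemory:
    def __init__(self):
        self.events = []

    def log_event(self, user_id, event_type, metadata):
        # Store one structured event per interaction.
        self.events.append({"user_id": user_id,
                            "event_type": event_type, **metadata})

    def error_frequency(self, user_id, problem_id):
        # Retrieve past events and aggregate them into a pattern.
        return Counter(e["error_type"] for e in self.events
                       if e["user_id"] == user_id
                       and e.get("problem_id") == problem_id)

memory = EventMemory()
memory.log_event("u1", "submission",
                 {"problem_id": "p7", "error_type": "recursion_error",
                  "passed": False})
memory.log_event("u1", "submission",
                 {"problem_id": "p7", "error_type": "recursion_error",
                  "passed": False})
patterns = memory.error_frequency("u1", "p7")
# patterns["recursion_error"] == 2
```

&lt;p&gt;Everything that follows is this loop with better plumbing: typed events in, aggregated patterns out.&lt;/p&gt;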

&lt;p&gt;That sounds obvious. It’s not.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem: stateless feedback is useless
&lt;/h2&gt;

&lt;p&gt;Originally, my feedback loop looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_submission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Correct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;hint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_hint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hint&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every submission was evaluated independently. Same wrong logic? Same hint.&lt;/p&gt;

&lt;p&gt;This works fine until you watch a real user:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attempt 1 → wrong base case&lt;/li&gt;
&lt;li&gt;Attempt 2 → still wrong base case&lt;/li&gt;
&lt;li&gt;Attempt 3 → copy-paste variation, same mistake&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the system keeps saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Check your recursion logic.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not helpful. That’s just noise.&lt;/p&gt;

&lt;p&gt;What I needed was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“You’ve made this mistake &lt;em&gt;three times&lt;/em&gt;. Let’s change strategy.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  My first attempt at “memory” (and why it failed)
&lt;/h2&gt;

&lt;p&gt;My initial solution was embarrassingly naive: just store past submissions in a database and pass them into the prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user_attempts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
User history:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Current code:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Give a hint.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This technically worked. But in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts became huge&lt;/li&gt;
&lt;li&gt;The model ignored most of the history&lt;/li&gt;
&lt;li&gt;No consistent behavior across attempts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It wasn’t &lt;em&gt;structured&lt;/em&gt;. It was just dumping text and hoping for magic.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changed: introducing Hindsight
&lt;/h2&gt;

&lt;p&gt;Instead of shoving raw history into prompts, I switched to using Hindsight as a proper memory layer.&lt;/p&gt;

&lt;p&gt;The core idea: store &lt;strong&gt;events&lt;/strong&gt;, not blobs of text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;submission&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;classify_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;passed&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every submission becomes a structured record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which problem&lt;/li&gt;
&lt;li&gt;What kind of mistake&lt;/li&gt;
&lt;li&gt;Whether it passed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where things got interesting.&lt;/p&gt;

&lt;p&gt;Instead of asking “what did the user do?”, I could ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What patterns are emerging?”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Turning events into adaptive hints
&lt;/h2&gt;

&lt;p&gt;Once events were stored, I started querying them before generating hints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;aggregation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_frequency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I used those patterns to decide &lt;em&gt;how&lt;/em&gt; to hint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_adaptive_hint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recursion_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;explain_recursion_basics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;off_by_one&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;highlight_edge_cases&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;generic_hint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is simple logic. No fancy ML. But it changed behavior dramatically.&lt;/p&gt;

&lt;p&gt;The system stopped reacting to &lt;em&gt;one&lt;/em&gt; submission and started reacting to &lt;em&gt;trends&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The unexpected part: sequence mattered more than frequency
&lt;/h2&gt;

&lt;p&gt;At first, I thought counting errors was enough.&lt;/p&gt;

&lt;p&gt;It wasn’t.&lt;/p&gt;

&lt;p&gt;Two identical mistakes in a row mean something very different from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mistake → fix → new mistake&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I added sequence awareness.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_recent_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recursion_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;deep_recursion_hint&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s when the system started feeling… intentional.&lt;/p&gt;

&lt;p&gt;Not just “you’re wrong,” but:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“You’re stuck in a loop. Let’s break it.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What the system does now (concretely)
&lt;/h2&gt;

&lt;p&gt;Here’s a real scenario:&lt;/p&gt;

&lt;h3&gt;
  
  
  Before (stateless)
&lt;/h3&gt;

&lt;p&gt;User submits broken recursion:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;“Check your recursion logic.”&lt;/li&gt;
&lt;li&gt;“Check your recursion logic.”&lt;/li&gt;
&lt;li&gt;“Check your recursion logic.”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Nothing changes.&lt;/p&gt;




&lt;h3&gt;
  
  
  After (with Hindsight)
&lt;/h3&gt;

&lt;p&gt;Same user:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;“Check your recursion logic.”&lt;/li&gt;
&lt;li&gt;“Your base case might be incorrect.”&lt;/li&gt;
&lt;li&gt;“Let’s walk through a simple recursion example step by step.”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The system escalates guidance based on &lt;em&gt;observed struggle&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;No prompt changes. No retraining. Just memory.&lt;/p&gt;
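&lt;p&gt;That escalation can be sketched as a ladder keyed on the streak of identical errors. The tiers below are hypothetical strings, not Codemind’s production logic:&lt;/p&gt;

```python
# Sketch of escalating guidance based on a streak of identical errors.
# The hint tiers are hypothetical examples, not the real system's output.
HINT_LADDER = [
    "Check your recursion logic.",
    "Your base case might be incorrect.",
    "Let's walk through a simple recursion example step by step.",
]

def streak(events, error_type):
    # Count how many of the most recent events repeat the same error.
    n = 0
    for e in reversed(events):
        if e["error_type"] != error_type:
            break
        n += 1
    return n

def escalated_hint(events, error_type):
    # Longer streak, deeper guidance; the ladder caps at its last rung.
    level = min(streak(events, error_type), len(HINT_LADDER))
    return HINT_LADDER[max(level - 1, 0)]

attempts = [{"error_type": "recursion_error"}] * 3
```

&lt;p&gt;One attempt gets the generic nudge; three in a row lands on the walkthrough.&lt;/p&gt;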




&lt;h2&gt;
  
  
  Where this shows up in the code
&lt;/h2&gt;

&lt;p&gt;The architecture ended up separating concerns pretty cleanly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;submission_handler.py&lt;/code&gt; → orchestrates flow&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;execution_engine.py&lt;/code&gt; → runs code safely&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;hindsight_client.py&lt;/code&gt; → logs + queries memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;hint_engine.py&lt;/code&gt; → decides hint strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key boundary is here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hindsight_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;hint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hint_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That separation matters.&lt;/p&gt;

&lt;p&gt;Hindsight doesn’t generate hints. It just tells you what’s been happening.&lt;/p&gt;

&lt;p&gt;The hint engine decides what to do with that information.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I didn’t let the model “figure it out”
&lt;/h2&gt;

&lt;p&gt;It’s tempting to push everything into the LLM:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Here’s history, figure out what to say.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I tried that. It’s inconsistent.&lt;/p&gt;

&lt;p&gt;Instead, I used Hindsight for &lt;strong&gt;state&lt;/strong&gt;, and kept &lt;strong&gt;decision logic deterministic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This gave me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predictable behavior&lt;/li&gt;
&lt;li&gt;Easier debugging&lt;/li&gt;
&lt;li&gt;Lower token usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And honestly, it feels more like engineering than prompting.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I’d do differently
&lt;/h2&gt;

&lt;p&gt;There are still rough edges.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Error classification is brittle
&lt;/h3&gt;

&lt;p&gt;Right now, I’m using simple heuristics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recursion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recursion_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is obviously weak. A better approach would use AST analysis or execution traces.&lt;/p&gt;
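&lt;p&gt;For Python submissions, an AST-based classifier could look something like this sketch, built on the stdlib &lt;code&gt;ast&lt;/code&gt; module. The categories and heuristics are illustrative:&lt;/p&gt;

```python
# Sketch of AST-based error classification using Python's stdlib `ast`.
# The categories are illustrative; a real classifier would also use
# test results and execution traces.
import ast

def classify_error_ast(code):
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return "syntax_error"
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # A function that calls itself by name is recursive.
            calls_self = any(
                isinstance(n, ast.Call)
                and isinstance(n.func, ast.Name)
                and n.func.id == node.name
                for n in ast.walk(node)
            )
            # Treat a guarded early return as a (rough) base-case check.
            has_base_case = any(
                isinstance(stmt, ast.If)
                and any(isinstance(s, ast.Return) for s in ast.walk(stmt))
                for stmt in node.body
            )
            if calls_self and not has_base_case:
                return "recursion_error"
    return "unknown"
```

&lt;p&gt;Even this crude version beats substring matching: it flags recursion without a base case, not code that merely mentions the word.&lt;/p&gt;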




&lt;h3&gt;
  
  
  2. Patterns are too rigid
&lt;/h3&gt;

&lt;p&gt;Threshold-based logic (&lt;code&gt;&amp;gt;= 2&lt;/code&gt;) works, but it’s crude.&lt;/p&gt;

&lt;p&gt;Some users need help faster. Others don’t.&lt;/p&gt;

&lt;p&gt;I’d like to make this adaptive per user.&lt;/p&gt;
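&lt;p&gt;One possible direction, sketched with a hypothetical success-rate signal: derive the escalation threshold from how often the user has passed recently, instead of hardcoding 2.&lt;/p&gt;

```python
# Sketch: adapt the escalation threshold per user instead of a fixed 2.
# The success-rate signal and the cutoffs are hypothetical.
def escalation_threshold(history):
    # history: list of booleans, True when an attempt passed.
    if not history:
        return 2                  # default for new users
    success_rate = sum(history) / len(history)
    if success_rate >= 0.7:
        return 3                  # strong users: give them more rope
    if success_rate >= 0.4:
        return 2
    return 1                      # struggling users: intervene sooner
```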




&lt;h3&gt;
  
  
  3. No cross-problem learning yet
&lt;/h3&gt;

&lt;p&gt;Hindsight stores everything, but I’m only querying per problem.&lt;/p&gt;

&lt;p&gt;Missed opportunity.&lt;/p&gt;

&lt;p&gt;If someone struggles with recursion in one problem, I should anticipate it in the next.&lt;/p&gt;
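&lt;p&gt;A cross-problem query could simply drop the &lt;code&gt;problem_id&lt;/code&gt; filter and aggregate failures by concept instead. Here’s a sketch; the concept map and helper names are hypothetical:&lt;/p&gt;

```python
# Sketch of cross-problem learning: aggregate failures per concept
# across all problems. The problem-to-concept map is hypothetical.
from collections import Counter

PROBLEM_CONCEPTS = {"p7": "recursion", "p9": "recursion", "p12": "arrays"}

def weak_concepts(events, user_id, min_failures=2):
    fails = Counter(
        PROBLEM_CONCEPTS.get(e["problem_id"], "general")
        for e in events
        if e["user_id"] == user_id and not e["passed"]
    )
    # Concepts the user has failed repeatedly, most frequent first.
    return [c for c, n in fails.most_common() if n >= min_failures]

events = [
    {"user_id": "u1", "problem_id": "p7", "passed": False},
    {"user_id": "u1", "problem_id": "p9", "passed": False},
    {"user_id": "u1", "problem_id": "p12", "passed": True},
]
```

&lt;p&gt;With that in place, a new recursion problem could start at a warmer hint tier for this user instead of from scratch.&lt;/p&gt;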




&lt;h2&gt;
  
  
  Why Hindsight worked here
&lt;/h2&gt;

&lt;p&gt;The key thing Hindsight gave me wasn’t storage—it was &lt;em&gt;structure&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dumping logs&lt;/li&gt;
&lt;li&gt;embedding conversations&lt;/li&gt;
&lt;li&gt;hoping retrieval works&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;typed events&lt;/li&gt;
&lt;li&gt;queryable patterns&lt;/li&gt;
&lt;li&gt;explicit control over behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re curious, the idea is similar to what’s described on the &lt;a href="https://vectorize.io/features/agent-memory" rel="noopener noreferrer"&gt;agent memory page on Vectorize&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It’s not about making the model smarter.&lt;/p&gt;

&lt;p&gt;It’s about giving your system &lt;em&gt;memory you can reason about&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons I’d keep
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stateless systems plateau fast&lt;/strong&gt;&lt;br&gt;
If behavior doesn’t change across attempts, users stop learning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory should be structured, not appended&lt;/strong&gt;&lt;br&gt;
Events &amp;gt; raw text history.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sequence beats frequency&lt;/strong&gt;&lt;br&gt;
Repeated mistakes in a row signal “stuck,” not just “wrong.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep decision logic outside the model&lt;/strong&gt;&lt;br&gt;
Deterministic systems are easier to debug and improve.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start simple, but make it observable&lt;/strong&gt;&lt;br&gt;
Logging patterns mattered more than clever algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;I didn’t set out to build an “adaptive system.” I just wanted to stop repeating the same useless hint.&lt;/p&gt;

&lt;p&gt;Hindsight gave me a way to &lt;em&gt;notice patterns&lt;/em&gt;, and once I had that, adapting behavior became straightforward.&lt;/p&gt;

&lt;p&gt;The surprising part wasn’t that it worked.&lt;/p&gt;

&lt;p&gt;It’s that something this simple made the system feel like it was actually paying attention.&lt;/p&gt;

</description>
      <category>agentmemory</category>
      <category>llmengineering</category>
      <category>developertools</category>
      <category>edtech</category>
    </item>
  </channel>
</rss>
