<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: srikar</title>
    <description>The latest articles on DEV Community by srikar (@srikar_43eae3034c49ebce90).</description>
    <link>https://dev.to/srikar_43eae3034c49ebce90</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837442%2F0f98b6b4-52d1-46a4-b304-4d453e65aa78.png</url>
      <title>DEV Community: srikar</title>
      <link>https://dev.to/srikar_43eae3034c49ebce90</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/srikar_43eae3034c49ebce90"/>
    <language>en</language>
    <item>
      <title>Lynt code.learn.improve.repeat</title>
      <dc:creator>srikar</dc:creator>
      <pubDate>Sat, 21 Mar 2026 19:56:14 +0000</pubDate>
      <link>https://dev.to/srikar_43eae3034c49ebce90/lynt-codelearnimproverepeat-406k</link>
      <guid>https://dev.to/srikar_43eae3034c49ebce90/lynt-codelearnimproverepeat-406k</guid>
      <description>&lt;h1&gt;
  
  
  I Didn’t Need a Smarter Tutor. I Needed One That Could Remember Why You Failed Last Time.
&lt;/h1&gt;

&lt;p&gt;Most coding tutors are stateless in the exact place they shouldn’t be. They can tell you that your current submission is wrong, but they forget that you made the same boundary mistake on the previous two attempts.&lt;/p&gt;

&lt;p&gt;That was the problem I ended up solving in this project. The repository I started from is a small FastAPI backend, but the interesting part sits just behind it: a lightweight analysis engine that turns raw code submissions into repeated-pattern detection, mentor-style suggestions, and a tiny hindsight memory layer. The whole thing is much simpler than the average “agent” demo, and that’s why I like it. You can read it in one sitting and understand where it helps, where it cheats, and where it will break.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the system actually does
&lt;/h2&gt;

&lt;p&gt;At the API layer, this project is straightforward. &lt;code&gt;backend/app/main.py&lt;/code&gt; wires up four routes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /run-code&lt;/code&gt; executes Python, JavaScript, Java, or C++ in a temporary directory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /submit-code&lt;/code&gt; sends a judged submission plus user history into the analysis engine&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /get-history/{user_id}&lt;/code&gt; returns stored submissions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /get-suggestions/{user_id}&lt;/code&gt; returns the latest suggestions and learning path&lt;/li&gt;
&lt;/ul&gt;
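&lt;p&gt;For orientation, here is what a request to &lt;code&gt;POST /submit-code&lt;/code&gt; might look like. The field names below are assumptions inferred from how the engine consumes submissions, not a copy of the repo's &lt;code&gt;SubmissionRequest&lt;/code&gt; schema:&lt;/p&gt;

```python
# Hypothetical payload for POST /submit-code. Every field name here is an
# assumption based on what the engine reads (user_id, topic, result,
# error_message), not the repo's actual Pydantic model.
submit_payload = {
    "user_id": "user-123",          # history and suggestions are keyed on this
    "language": "python",
    "topic": "arrays",              # feeds (topic, mistake) pattern detection
    "result": "wrong_answer",       # judge verdict consumed by the classifier
    "error_message": "IndexError: list index out of range",
    "code": "print(arr[len(arr)])",
}

# The two read endpoints are parameterized by user id alone.
history_url = f"/get-history/{submit_payload['user_id']}"
suggestions_url = f"/get-suggestions/{submit_payload['user_id']}"
```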

&lt;p&gt;The backend-side flow is almost comically thin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/submit-code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;submit_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SubmissionRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_user_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ai_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_code_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;stored_submission&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;stored_submission&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;save_submission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stored_submission&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ai_response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;history = get_user_history(...)&lt;/code&gt; line is the entire reason the system is interesting. Without it, this is just another one-shot evaluator. With it, the engine can stop treating mistakes as isolated events and start treating them as a behavior.&lt;/p&gt;

&lt;p&gt;The architecture is basically this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    A["/run-code"] --&amp;gt; B["temporary execution sandbox"]
    C["/submit-code"] --&amp;gt; D["per-user history store"]
    D --&amp;gt; E["ai_engine.process_submission"]
    E --&amp;gt; F["mistake classifier"]
    E --&amp;gt; G["pattern detector"]
    E --&amp;gt; H["hindsight memory"]
    G --&amp;gt; I["suggestions + learning path"]
    H --&amp;gt; J["current insight + past similar insights"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code runner in &lt;code&gt;backend/app/routes/run_code.py&lt;/code&gt; is practical rather than fancy. It writes the source into a temp folder, shells out to the language runtime or compiler, and kills anything that runs longer than three seconds. It’s exactly the kind of thing I’d build first for an internal product: enough isolation to be useful, nowhere near enough isolation to call “secure.”&lt;/p&gt;
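&lt;p&gt;The pattern is simple enough to sketch in a few lines. This is a minimal illustration of the same idea for Python only, not the repo's actual &lt;code&gt;run_code.py&lt;/code&gt;, which also handles JavaScript, Java, and C++:&lt;/p&gt;

```python
# A minimal sketch of the run-code idea: write source into a temp directory,
# shell out to the interpreter, and enforce a wall-clock timeout. The
# three-second default mirrors the description above; the function shape is
# illustrative, not the repo's implementation.
import subprocess
import sys
import tempfile
from pathlib import Path

def run_python_snippet(source: str, timeout_seconds: float = 3.0) -> dict:
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "main.py"
        script.write_text(source)
        try:
            completed = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_seconds,  # kill anything that runs too long
            )
        except subprocess.TimeoutExpired:
            return {"result": "timeout", "stdout": "", "stderr": ""}
        return {
            "result": "ok" if completed.returncode == 0 else "runtime_error",
            "stdout": completed.stdout,
            "stderr": completed.stderr,
        }
```

&lt;p&gt;The &lt;code&gt;tempfile.TemporaryDirectory&lt;/code&gt; context cleans up after itself, and the timeout is wall-clock, matching the three-second kill described above. None of this protects against hostile code; it only keeps accidents contained.&lt;/p&gt;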

&lt;h2&gt;
  
  
  The story here is really about memory
&lt;/h2&gt;

&lt;p&gt;The most opinionated design choice in this codebase is that feedback isn’t generated directly from the latest submission. It’s generated from the latest submission plus accumulated history, and then turned into a hindsight-style insight object.&lt;/p&gt;

&lt;p&gt;The engine orchestration in &lt;code&gt;../Lynt/ai_engine/engine.py&lt;/code&gt; makes that clear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_submission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;normalized_submission&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;normalize_submission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;normalized_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_normalize_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;classified_submission&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_classify_submission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized_submission&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;submissions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;normalized_history&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;classified_submission&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;submissions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;suggestions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_suggestions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;learning_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_learning_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;insight_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_INSIGHT_MEMORY_CACHE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;new_history_submissions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_new_history_submissions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;normalized_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;insight_memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;insight_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_build_insight_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;insight_memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_history_submissions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;current_insight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_resolve_submission_insight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classified_submission&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;past_similar_insights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_relevant_insights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classified_submission&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;insight_memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the whole pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Normalize the new event.&lt;/li&gt;
&lt;li&gt;Classify the mistake.&lt;/li&gt;
&lt;li&gt;Fold it into historical pattern detection.&lt;/li&gt;
&lt;li&gt;Resolve a current “insight.”&lt;/li&gt;
&lt;li&gt;Retrieve related past insights from memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I like this shape because it keeps the judgment steps separate. &lt;code&gt;mistake_classifier.py&lt;/code&gt; handles coarse labels like &lt;code&gt;off_by_one&lt;/code&gt; and &lt;code&gt;logic_error&lt;/code&gt;. &lt;code&gt;pattern_detector.py&lt;/code&gt; promotes repeated &lt;code&gt;(topic, mistake)&lt;/code&gt; pairs into weak areas after three occurrences. &lt;code&gt;suggestion_generator.py&lt;/code&gt; turns those weak areas into advice. And &lt;code&gt;hindsight_memory.py&lt;/code&gt; stores the reusable lesson.&lt;/p&gt;
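&lt;p&gt;The promotion step is worth seeing concretely. This is a condensed sketch of the three-occurrence threshold, assuming a simple dict shape for submissions; it is not the exact &lt;code&gt;detect_patterns&lt;/code&gt; signature from the repo:&lt;/p&gt;

```python
# Count (topic, mistake) pairs across all submissions and promote any pair
# seen three or more times into a weak area. The threshold of three comes
# from the article; everything else is an illustrative assumption.
from collections import Counter

WEAK_AREA_THRESHOLD = 3

def detect_patterns(submissions: list) -> list:
    counts = Counter(
        (s.get("topic"), s.get("mistake_type"))
        for s in submissions
        if s.get("topic") and s.get("mistake_type")
    )
    return [
        {"topic": topic, "mistake": mistake, "count": count}
        for (topic, mistake), count in counts.items()
        if count >= WEAK_AREA_THRESHOLD
    ]
```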

&lt;p&gt;That last part is the piece I kept coming back to. If a learner keeps getting array bounds wrong, I don’t want the system to just say “wrong answer” again. I want it to remember the &lt;em&gt;kind&lt;/em&gt; of failure and say something like: you keep missing boundary conditions in arrays, so start there.&lt;/p&gt;

&lt;p&gt;That’s exactly what the hindsight module does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;INSIGHT_RULES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;off_by_one&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message_template&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You often miss boundary conditions in {topic}.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix_suggestion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review edge cases and check index boundaries carefully.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logic_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message_template&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You tend to make logic mistakes in {topic}.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix_suggestion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Walk through sample inputs step by step to verify the logic.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then it stores only the newest version of an insight for a given topic and mistake pair:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_insight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;insight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;existing_insight&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;existing_insight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistake_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;insight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistake_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;existing_insight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;insight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;_get_recency_sort_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;insight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_recency_sort_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;existing_insight&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;memory_list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;insight&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;memory_list&lt;/span&gt;

    &lt;span class="n"&gt;memory_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;insight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;memory_list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a very specific tradeoff. I’m not building a full event log here. I’m building a compressed memory of “what lesson should still matter.”&lt;/p&gt;
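&lt;p&gt;The compression is easy to demonstrate. Here is a simplified re-implementation of the same dedupe rule (the real &lt;code&gt;store_insight&lt;/code&gt; also compares recency keys before overwriting) together with the behavior it produces:&lt;/p&gt;

```python
# "Newest lesson wins": storing a second insight for the same
# (topic, mistake_type) pair overwrites the first instead of appending.
# Simplified re-implementation for illustration only.

def store_insight(insight: dict, memory_list: list) -> list:
    for index, existing in enumerate(memory_list):
        if (existing.get("mistake_type") == insight.get("mistake_type")
                and existing.get("topic") == insight.get("topic")):
            memory_list[index] = insight  # overwrite in place, no duplicate
            return memory_list
    memory_list.append(insight)
    return memory_list

memory = []
store_insight({"mistake_type": "off_by_one", "topic": "arrays",
               "insight_message": "v1"}, memory)
store_insight({"mistake_type": "off_by_one", "topic": "arrays",
               "insight_message": "v2"}, memory)
# memory now holds exactly one arrays/off_by_one insight, the newest one
```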

&lt;p&gt;If you’ve looked at the &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;open source Hindsight memory framework on GitHub&lt;/a&gt;, the &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight documentation for agent memory design&lt;/a&gt;, or the broader &lt;a href="https://vectorize.io/features/agent-memory" rel="noopener noreferrer"&gt;Vectorize agent memory architecture overview&lt;/a&gt;, the core idea will feel familiar: don’t just store chat history or raw events, store distilled lessons that can change future behavior. This repo doesn’t use the full external package, but &lt;code&gt;ai_engine/hindsight/hindsight_memory.py&lt;/code&gt; is very clearly built in that direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I thought would work, and what actually mattered
&lt;/h2&gt;

&lt;p&gt;The naive version of this system is obvious: take the judge result, map it to a mistake type, and return some canned advice. In fact, part of this repo still looks like that on purpose.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mistake_classifier.py&lt;/code&gt; starts with a tiny mapping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;RESULT_TO_MISTAKE_TYPE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;runtime_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;syntax_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inefficient_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wrong_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logic_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_mistake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;normalized_error_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error_message&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;normalized_error_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;off_by_one&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;RESULT_TO_MISTAKE_TYPE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On paper, that looks almost too dumb to be useful. In practice, it’s a decent first pass because most learning products don’t need perfect classification to be helpful. They need stable categories that line up with actionable advice.&lt;/p&gt;
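&lt;p&gt;The precedence in that snippet matters more than it looks: the index-substring check runs before the verdict mapping, so a wrong answer with an index-related error lands in &lt;code&gt;off_by_one&lt;/code&gt; rather than the generic &lt;code&gt;logic_error&lt;/code&gt;. Re-running the same logic makes that concrete:&lt;/p&gt;

```python
# Same mapping and precedence as the snippet above, reproduced so the
# branch order can be exercised directly.
RESULT_TO_MISTAKE_TYPE = {
    "runtime_error": "syntax_error",
    "timeout": "inefficient_code",
    "wrong_answer": "logic_error",
}

def classify_mistake(result, error_message):
    normalized = str(error_message or "").lower()
    if "index" in normalized:
        # the error-message heuristic beats the verdict mapping
        return "off_by_one"
    return RESULT_TO_MISTAKE_TYPE.get((result or "").strip().lower(), "unknown")

# wrong_answer plus an IndexError is off_by_one, not logic_error
classify_mistake("wrong_answer", "IndexError: list index out of range")
# wrong_answer with no error message falls through to logic_error
classify_mistake("wrong_answer", None)
```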

&lt;p&gt;Where this got more interesting is in the language-specific branches. Java and C++ have custom analysis paths in &lt;code&gt;engine.py&lt;/code&gt;, because generic &lt;code&gt;runtime_error&lt;/code&gt; was too lossy. A Java &lt;code&gt;NullPointerException&lt;/code&gt; and a missing &lt;code&gt;main&lt;/code&gt; signature are both “runtime-ish” problems, but they teach very different lessons. Same for C++: “segmentation fault” deserves different feedback than “missing include.”&lt;/p&gt;

&lt;p&gt;That’s why the engine now branches like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Java gets checks for &lt;code&gt;NullPointerException&lt;/code&gt;, &lt;code&gt;ArrayIndexOutOfBoundsException&lt;/code&gt;, missing &lt;code&gt;main&lt;/code&gt;, and syntax heuristics&lt;/li&gt;
&lt;li&gt;C++ gets checks for segmentation faults, out-of-range access, missing headers, and syntax heuristics&lt;/li&gt;
&lt;li&gt;Everything else falls back to the generic classifier&lt;/li&gt;
&lt;/ul&gt;
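&lt;p&gt;The branch structure can be sketched as ordered substring checks on stderr before falling back to the generic classifier. The heuristic pairs mirror the bullets above, but the label strings and function name here are illustrative assumptions, not the repo's exact code:&lt;/p&gt;

```python
# Ordered (needle, label) heuristics for Java stderr. Label names are
# assumptions for illustration; the repo's engine.py branches may differ.
JAVA_ERROR_HEURISTICS = [
    ("nullpointerexception", "null_reference"),
    ("arrayindexoutofboundsexception", "off_by_one"),
]

def classify_java_error(stderr: str) -> str:
    lowered = (stderr or "").lower()
    for needle, label in JAVA_ERROR_HEURISTICS:
        if needle in lowered:
            return label
    return "runtime_error"  # fall back to the generic category
```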

&lt;p&gt;That’s a good example of a design that got more opinionated over time instead of more abstract. I didn’t need a grand unified analysis model. I needed a few language-specific escape hatches where the generic labels were clearly not good enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before and after: when memory actually changes the output
&lt;/h2&gt;

&lt;p&gt;The included test script in &lt;code&gt;../Lynt/ai_engine/test_ai_engine.py&lt;/code&gt; is useful because it shows the system behaving differently once history accumulates.&lt;/p&gt;

&lt;p&gt;With a sample history containing repeated array mistakes, a new wrong-answer submission with an index-related error produces this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mistake_type: off_by_one
patterns: [{'topic': 'arrays', 'mistake': 'off_by_one', 'count': 3}]
suggestions: ['You frequently make boundary errors in arrays. Focus on edge cases and double-check your start and end positions.']
learning_path: ['arrays']
insight: {
  'mistake_type': 'off_by_one',
  'topic': 'arrays',
  'insight_message': 'You often miss boundary conditions in arrays.',
  'fix_suggestion': 'Review edge cases and check index boundaries carefully.'
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the behavioral jump I cared about.&lt;/p&gt;

&lt;p&gt;Before memory, the system can tell me “this attempt looks like an off-by-one.”&lt;br&gt;&lt;br&gt;
After memory, it can tell me “you keep making boundary mistakes in arrays, and that pattern is now strong enough that I’m going to prioritize arrays in your learning path.”&lt;/p&gt;

&lt;p&gt;That’s a much more credible tutoring loop.&lt;/p&gt;

&lt;p&gt;I also verified one practical edge case that the repo doesn’t hide very well: the hindsight cache is global process state.&lt;/p&gt;

&lt;p&gt;The backend history store is per-user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;user_submissions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;user_submissions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_submission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_submissions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setdefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;submission&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the engine memory is not. &lt;code&gt;engine.py&lt;/code&gt; keeps these at module scope:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_INSIGHT_MEMORY_CACHE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;_PROCESSED_HISTORY_KEYS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SubmissionKey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means if user A creates an &lt;code&gt;off_by_one&lt;/code&gt; insight for arrays, user B can see a “past similar insight” even with an empty personal history, as long as they trigger the same topic and mistake shape. I reproduced exactly that.&lt;/p&gt;

&lt;p&gt;For a demo, this is fine. For a real tutoring system, it’s a bug.&lt;/p&gt;
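&lt;p&gt;The leak is easy to reproduce in miniature. The function below is a hypothetical stand-in, but the module-scope cache mirrors &lt;code&gt;_INSIGHT_MEMORY_CACHE&lt;/code&gt; in &lt;code&gt;engine.py&lt;/code&gt;:&lt;/p&gt;

```python
# The cache lives at module scope, so it is shared by every caller
# regardless of user_id. Hypothetical function, real failure mode.
_INSIGHT_MEMORY_CACHE = []  # one list for the whole process, not per user

def process_submission(user_id: str, insight: dict) -> list:
    _INSIGHT_MEMORY_CACHE.append(insight)
    return list(_INSIGHT_MEMORY_CACHE)  # every user reads the same cache

process_submission("user_a", {"topic": "arrays",
                              "mistake_type": "off_by_one"})
seen_by_b = process_submission("user_b", {"topic": "strings",
                                          "mistake_type": "logic_error"})
# user_b's view now includes user_a's arrays/off_by_one insight
```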

&lt;p&gt;And honestly, that bug tells the real story of this codebase better than any polished diagram could. Building “memory” is easy if you mean “append some stuff to a list.” Building memory that is scoped correctly, persisted, deduplicated, queryable, and safe across users is where the real engineering starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part I’d change first
&lt;/h2&gt;

&lt;p&gt;There’s even a comment in &lt;code&gt;engine.py&lt;/code&gt; pointing in the right direction:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In a production system, this local filtering would be replaced by a database query against a persistent memory store such as MongoDB or a vector database.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the right next step. The current backend has a placeholder &lt;code&gt;backend/app/config/db.py&lt;/code&gt; for future persistence, but right now both layers are demo-grade:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user history disappears on process restart&lt;/li&gt;
&lt;li&gt;hindsight memory is process-wide, not per user&lt;/li&gt;
&lt;li&gt;retrieval is simple filtering by topic or mistake type&lt;/li&gt;
&lt;li&gt;there’s no notion of privacy, tenancy, or long-term memory aging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I were taking this past demo territory, I’d keep the current shape of the engine and replace the storage model underneath it. The distilled insight object is the part worth preserving. The in-memory container is not.&lt;/p&gt;

&lt;p&gt;That’s also where a production-grade hindsight system would make sense. The thing I want to persist is not raw submission blobs forever. It’s a curated memory record: topic, mistake type, lesson, timestamp, user scope, and maybe a confidence or decay strategy.&lt;/p&gt;
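&lt;p&gt;One possible shape for that record, sketched as a dataclass. The field set follows the list above; the half-life decay is one assumption about how memory aging could work, not something the repo implements:&lt;/p&gt;

```python
# A design sketch of a persistent, user-scoped insight record with a simple
# confidence-decay strategy. Not code from the repo.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class InsightRecord:
    user_id: str                      # scope memory per user, fixing the leak
    topic: str
    mistake_type: str
    insight_message: str
    fix_suggestion: str
    confidence: float = 1.0
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def decayed_confidence(self, half_life_days: float = 30.0) -> float:
        """Old lessons matter less: halve confidence every half_life_days."""
        age_days = (datetime.now(timezone.utc) - self.created_at).days
        return self.confidence * 0.5 ** (age_days / half_life_days)
```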

&lt;h2&gt;
  
  
  What I learned building it
&lt;/h2&gt;

&lt;p&gt;The main lesson here is that memory is only useful when it changes behavior. Storing history is trivial. Turning history into a reusable lesson is the actual product.&lt;/p&gt;

&lt;p&gt;A few takeaways I’d reuse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple mistake taxonomies are better than clever ones. &lt;code&gt;off_by_one&lt;/code&gt;, &lt;code&gt;logic_error&lt;/code&gt;, &lt;code&gt;memory_error&lt;/code&gt;, and &lt;code&gt;null_reference&lt;/code&gt; are crude, but they map cleanly to advice.&lt;/li&gt;
&lt;li&gt;Language-specific escape hatches are worth it. The Java and C++ branches in &lt;code&gt;engine.py&lt;/code&gt; are more maintainable than pretending one generic classifier can explain every runtime failure.&lt;/li&gt;
&lt;li&gt;Distilled memory beats raw logs for tutoring. The &lt;code&gt;insight&lt;/code&gt; object is more reusable than a pile of past error strings.&lt;/li&gt;
&lt;li&gt;Global in-process memory is a trap. It feels convenient until you realize you’ve built cross-user leakage and restart amnesia into the system.&lt;/li&gt;
&lt;li&gt;Pattern thresholds matter. Requiring three repeated failures before declaring a weak area in &lt;code&gt;pattern_detector.py&lt;/code&gt; is a small but good guardrail against overreacting to noise.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I like about this repo is that it doesn’t pretend to be more sophisticated than it is. It’s a thin backend, a practical code runner, and an analysis engine trying to answer a very specific question: how do I make feedback feel cumulative instead of stateless?&lt;/p&gt;

&lt;p&gt;For this kind of system, that’s the right question. Everything else is implementation detail.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudvmn1csuf4hr4ho8vyv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudvmn1csuf4hr4ho8vyv.jpeg" alt=" " width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>learning</category>
      <category>python</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
