<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Oscar Rieken</title>
    <description>The latest articles on DEV Community by Oscar Rieken (@orieken).</description>
    <link>https://dev.to/orieken</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F409515%2F4b302a29-5754-4f45-a9ad-a8b955d04751.jpeg</url>
      <title>DEV Community: Oscar Rieken</title>
      <link>https://dev.to/orieken</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/orieken"/>
    <language>en</language>
    <item>
      <title>Prompt engineering for teacher insights with Claude — structured JSON and graceful fallbacks</title>
      <dc:creator>Oscar Rieken</dc:creator>
      <pubDate>Thu, 21 May 2026 04:08:11 +0000</pubDate>
      <link>https://dev.to/orieken/prompt-engineering-for-teacher-insights-with-claude-structured-json-and-graceful-fallbacks-331j</link>
      <guid>https://dev.to/orieken/prompt-engineering-for-teacher-insights-with-claude-structured-json-and-graceful-fallbacks-331j</guid>
      <description>&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;NumPath now generates a teacher insight on demand: click "Generate insight" on any student's panel and the system reads that student's KC mastery states plus their 10 most recent attempts, sends them to Claude, and returns a structured response: an actionable observation (&lt;code&gt;text&lt;/code&gt;), a severity signal (&lt;code&gt;type&lt;/code&gt;: warn/good/info), and a traceable &lt;code&gt;evidence&lt;/code&gt; block pointing to the specific KC, p_mastery value, and mistake count that drove the insight. The &lt;code&gt;evidence&lt;/code&gt; isn't generated by the LLM — it's assembled server-side from DB reads. Teachers get an interpretation layer backed by auditable data.&lt;/p&gt;

&lt;p&gt;The backend piece is &lt;code&gt;GenerateInsightUseCase&lt;/code&gt; — three DB queries, one LLM call, one JSON parse, one server-side evidence assembly. This post is about the schema design, the prompt engineering, and everything that can go wrong with the parse step.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Design Decision
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Structured JSON vs. free-form text
&lt;/h3&gt;

&lt;p&gt;The original &lt;code&gt;insight.txt&lt;/code&gt; prompt asked for a single sentence. Simple, but not structured enough for a UI that needs to display a summary and a suggested action as separate elements.&lt;/p&gt;

&lt;p&gt;We rewrote the prompt — but here's the key design decision: &lt;strong&gt;the LLM only generates two fields&lt;/strong&gt;. The &lt;code&gt;evidence&lt;/code&gt; block is assembled server-side from DB reads before the call, so it's auditable regardless of what the model says.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a specialist math learning advisor for primary school teachers.
Given a student's Knowledge Component mastery data and recent attempt history,
generate a JSON response with exactly two fields:
- "text": one actionable sentence (max 30 words) citing a specific KC or mistake type
- "type": one of "warn", "good", or "info" based on urgency

Respond with only the JSON object. No explanation, no markdown, no code fences.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last line, &lt;strong&gt;"no markdown, no code fences"&lt;/strong&gt;, is the most important one. Without it, Claude wraps the JSON in a Markdown code fence tagged &lt;code&gt;json&lt;/code&gt;, and &lt;code&gt;json.loads()&lt;/code&gt; raises &lt;code&gt;JSONDecodeError&lt;/code&gt; on the leading backticks. Every time.&lt;/p&gt;
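
&lt;p&gt;Prompt instructions can drift after a model update, so it may be worth tolerating a fenced reply as well as forbidding it. The sketch below is an assumption, not part of the NumPath code shown in this post: &lt;code&gt;strip_code_fence&lt;/code&gt; is a hypothetical helper that removes one surrounding Markdown fence before the text reaches &lt;code&gt;json.loads()&lt;/code&gt;.&lt;/p&gt;

```python
import json

# Three backticks, built programmatically so the example is easy to embed.
FENCE = "`" * 3


def strip_code_fence(raw: str) -> str:
    """Return raw with one surrounding Markdown code fence removed, if present."""
    text = raw.strip()
    if not text.startswith(FENCE):
        return text
    # Drop the opening fence line, which may carry a language tag such as "json".
    lines = text.splitlines()[1:]
    # Drop the closing fence line if the model emitted one.
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines).strip()


# Hypothetical fenced reply, shaped like the failure described above.
fenced_reply = FENCE + 'json\n{"text": "Practise regrouping.", "type": "warn"}\n' + FENCE
parsed = json.loads(strip_code_fence(fenced_reply))
print(parsed["type"])
```

&lt;p&gt;The helper only handles a fence that wraps the whole reply on separate lines. Any other input comes back with just its surrounding whitespace trimmed, so malformed replies still reach the fallback path.&lt;/p&gt;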

&lt;h3&gt;
  
  
  Defensive fallback, never a 500
&lt;/h3&gt;

&lt;p&gt;LLM output can't be trusted unconditionally. We introduced a module-level fallback constant and a pure parse function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_FALLBACK_INSIGHT_TEXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InsightResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Insight temporarily unavailable.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;evidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;InsightEvidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_mastery&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mistake_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mistake_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_parse_llm_fields&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Parse only the LLM-generated fields (text and type); return (None, None) on failure.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;KeyError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;insight_parse_failed_using_fallback raw=%s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;evidence&lt;/code&gt; block is never parsed from the LLM — it's built from the same DB data that was sent to the model. If parsing fails, the fallback fires. The endpoint always returns 200 with a clean &lt;code&gt;InsightResponse&lt;/code&gt;. The warning log is the signal to watch for prompt drift after model updates.&lt;/p&gt;
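
&lt;p&gt;To make the split between model output and server-side evidence concrete, here is a minimal sketch under stated assumptions. &lt;code&gt;Evidence&lt;/code&gt;, &lt;code&gt;Insight&lt;/code&gt;, &lt;code&gt;build_insight&lt;/code&gt;, &lt;code&gt;ALLOWED_TYPES&lt;/code&gt; and &lt;code&gt;FALLBACK_TEXT&lt;/code&gt; are hypothetical stand-ins, not the real &lt;code&gt;InsightEvidence&lt;/code&gt;, &lt;code&gt;InsightResponse&lt;/code&gt; or &lt;code&gt;GenerateInsightUseCase&lt;/code&gt;. The sketch also goes one step beyond the parser above by rejecting an unknown &lt;code&gt;type&lt;/code&gt; value, and it keeps the database evidence on the fallback path, which differs from the blank evidence in the constant shown earlier.&lt;/p&gt;

```python
import json
from dataclasses import dataclass

ALLOWED_TYPES = {"warn", "good", "info"}
FALLBACK_TEXT = "Insight temporarily unavailable."


@dataclass(frozen=True)
class Evidence:  # hypothetical stand-in for InsightEvidence
    kc: str
    p_mastery: float
    mistake_count: int


@dataclass(frozen=True)
class Insight:  # hypothetical stand-in for InsightResponse
    text: str
    type: str
    evidence: Evidence


def build_insight(raw: str, evidence: Evidence) -> Insight:
    """Combine the two LLM-generated fields with evidence read from the database."""
    try:
        data = json.loads(raw)
        text, kind = data["text"], data["type"]
    except (json.JSONDecodeError, KeyError, TypeError):
        text, kind = None, None
    # Wrong value types and unknown severity labels count as parse failures too.
    valid = isinstance(text, str) and isinstance(kind, str) and kind in ALLOWED_TYPES
    if not valid:
        return Insight(text=FALLBACK_TEXT, type="info", evidence=evidence)
    return Insight(text=text, type=kind, evidence=evidence)


# The evidence never passes through the model, so it is identical on both paths.
db_evidence = Evidence(kc="SUB_BORROW", p_mastery=0.18, mistake_count=9)
good = build_insight('{"text": "Practise regrouping.", "type": "warn"}', db_evidence)
bad = build_insight("Sure! Here is the JSON you asked for.", db_evidence)
print(good.type, bad.type)
```

&lt;p&gt;Whether a fallback response should carry the real evidence or a blank one is a product decision; the sketch only shows that the choice is independent of what the model returned.&lt;/p&gt;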

&lt;h3&gt;
  
  
  StubProvider updated for the new schema
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;StubProvider&lt;/code&gt; had to be updated to match — the old stub returned a plain string, which would now fail parsing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StubProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Student is consistently skipping the borrowing step — targeted regrouping practice recommended.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
            &lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The stub returns only &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;type&lt;/code&gt; — the &lt;code&gt;evidence&lt;/code&gt; block is assembled by the use case from DB data, not by the provider. All tests run offline with &lt;code&gt;LLM_PROVIDER=stub&lt;/code&gt; — no API key needed, no network dependency, deterministic results. The provider abstraction (&lt;a href="//../adrs/ADR-003-llm-abstraction.md"&gt;ADR-003&lt;/a&gt;) pays off here.&lt;/p&gt;
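
&lt;p&gt;One way to keep a stub honest is a small contract check that runs with the rest of the offline tests. The sketch below is illustrative only: it copies the stub with a shortened message, and &lt;code&gt;check_stub_contract&lt;/code&gt;, &lt;code&gt;EXPECTED_KEYS&lt;/code&gt; and &lt;code&gt;ALLOWED_TYPES&lt;/code&gt; are hypothetical names rather than part of the NumPath test suite.&lt;/p&gt;

```python
import asyncio
import json

EXPECTED_KEYS = {"text", "type"}
ALLOWED_TYPES = {"warn", "good", "info"}


class StubProvider:
    """Copy of the stub shown above, with a shortened message."""

    async def complete(self, system: str, user: str, max_tokens: int = 256) -> str:
        return '{"text": "Targeted regrouping practice recommended.", "type": "warn"}'


def check_stub_contract() -> dict:
    """Fail loudly if the stub no longer matches the two-field schema."""
    raw = asyncio.run(StubProvider().complete(system="", user=""))
    payload = json.loads(raw)
    assert set(payload) == EXPECTED_KEYS, "stub drifted from the two-field schema"
    assert payload["type"] in ALLOWED_TYPES, "stub returned an unknown severity"
    return payload


payload = check_stub_contract()
print(sorted(payload))
```

&lt;p&gt;If the prompt later gains or loses a field, updating &lt;code&gt;EXPECTED_KEYS&lt;/code&gt; forces the stub to change in the same commit.&lt;/p&gt;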

&lt;h2&gt;
  
  
  Why It Matters for the Research
&lt;/h2&gt;

&lt;p&gt;The MacLellan NTI Framework (2018) requires that every AI insight be explainable and traceable — not just "this student needs more practice" but "this student has &lt;code&gt;p_mastery=0.18&lt;/code&gt; on &lt;code&gt;SUB_BORROW&lt;/code&gt; with &lt;code&gt;BORROW_SKIP&lt;/code&gt; classified in 9 of the last 11 attempts." The &lt;code&gt;evidence&lt;/code&gt; field makes that traceability structural, not aspirational.&lt;/p&gt;

&lt;p&gt;The Teacher-in-the-Loop principle is now fully implemented across three phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Phase 1&lt;/strong&gt;: KC mastery bars — &lt;em&gt;where&lt;/em&gt; is the student stuck?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 2&lt;/strong&gt;: Attempt history — &lt;em&gt;what exactly did they do wrong?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 3&lt;/strong&gt;: LLM insight — &lt;em&gt;what should the teacher do about it?&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The distinction between Phase 2 and Phase 3 matters for the RCT. Phase 2 gives teachers raw evidence — they can verify it, disagree with it, or draw their own conclusions. Phase 3 adds an interpretation. The research question becomes: do teachers who see the LLM interpretation intervene differently from those who only see the raw data? We don't know yet. But we'll be able to measure it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;The hardest part wasn't the LLM call — it was making the output reliable enough to drive UI state. A streaming response or a multi-turn conversation would require a different interface entirely (&lt;a href="//../adrs/ADR-003-llm-abstraction.md"&gt;ADR-003&lt;/a&gt; explicitly deferred that). For a single completion call, the pattern is: instruct precisely, parse defensively, fall back gracefully.&lt;/p&gt;

&lt;p&gt;The insight quality degrades predictably for students with few attempts — a student who's done 3 problems gets a much vaguer insight than one who's done 30. That's expected and honest, but it's worth surfacing to teachers: "This insight is based on 3 attempts" would make the confidence level explicit. That's a Phase 4 enhancement.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The LLM provider abstraction supports multiple models — the next natural step is A/B testing Claude Sonnet vs. Haiku on insight quality vs. cost. That's not planned for MVP, but the infrastructure already supports it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"No markdown, no code fences" is load-bearing prompt text&lt;/strong&gt;: without it, Claude wraps JSON in backtick blocks and &lt;code&gt;json.loads()&lt;/code&gt; fails; always explicitly instruct raw output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The fallback constant pattern beats exception propagation&lt;/strong&gt;: a named &lt;code&gt;_FALLBACK_INSIGHT_TEXT&lt;/code&gt; at the module level makes the graceful degradation path explicit, testable, and readable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The StubProvider must match the current schema&lt;/strong&gt;: when you change what the LLM is expected to return, update the stub immediately or you'll have tests that pass against outdated fixtures&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>numpath</category>
      <category>adaptivelearning</category>
      <category>python</category>
      <category>llm</category>
    </item>
    <item>
      <title>Attempt history in the teacher dashboard — the scalar subquery pattern</title>
      <dc:creator>Oscar Rieken</dc:creator>
      <pubDate>Thu, 21 May 2026 04:01:29 +0000</pubDate>
      <link>https://dev.to/orieken/attempt-history-in-the-teacher-dashboard-the-scalar-subquery-pattern-4an</link>
      <guid>https://dev.to/orieken/attempt-history-in-the-teacher-dashboard-the-scalar-subquery-pattern-4an</guid>
      <description>&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="//./02-teacher-kc-dashboard.md"&gt;Phase 1 of the KC dashboard&lt;/a&gt; we gave teachers per-skill mastery states for each student — colour-coded bars showing Novice / Developing / Mastered. That answered "where is this student stuck?" Phase 2 answers "what exactly did they do wrong?"&lt;/p&gt;

&lt;p&gt;We added a paginated attempt history table below the KC panel. For the selected student, teachers see every attempt in reverse-chronological order: the problem question, the student's answer, whether it was correct, the timestamp, which Knowledge Component was being practised, and — when the classifier fired — the mistake code (e.g. &lt;code&gt;BORROW_SKIP&lt;/code&gt;, &lt;code&gt;DIGIT_REVERSAL&lt;/code&gt;). The backend is a single endpoint, &lt;code&gt;GET /teacher/students/{id}/attempts&lt;/code&gt;, teacher-only, with OFFSET pagination.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Design Decision
&lt;/h2&gt;

&lt;p&gt;The tricky part was joining &lt;code&gt;MistakeEvent.mistake_code&lt;/code&gt; onto &lt;code&gt;Attempt&lt;/code&gt; rows. Not every attempt produces a mistake event — the classifier only fires when an incorrect answer matches a known pattern. So the join is optional (LEFT JOIN territory), and the relationship is one-to-one in intent but not enforced that way in the schema.&lt;/p&gt;

&lt;p&gt;We considered two approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: LEFT JOIN&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;me&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mistake_code&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;attempts&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;mistake_events&lt;/span&gt; &lt;span class="n"&gt;me&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;me&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attempt_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;OFFSET&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works now. But if a future bug or edge case produces two &lt;code&gt;MistakeEvent&lt;/code&gt; rows for one attempt, every teacher session silently doubles those rows. The bug would be invisible — no error, just wrong data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B — Scalar correlated subquery&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mistake_code_subq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MistakeEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mistake_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MistakeEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attempt_id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Attempt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scalar_subquery&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Attempt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Problem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Attempt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer_given&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Attempt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_correct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Attempt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;mistake_code_subq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistake_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Problem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Attempt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;problem_id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Problem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Problem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skill_id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Attempt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Attempt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;page_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We chose Option B. The &lt;code&gt;LIMIT 1&lt;/code&gt; inside the subquery is a hard guarantee: regardless of how many &lt;code&gt;MistakeEvent&lt;/code&gt; rows exist for an attempt, the result is always one row per attempt. The database enforces the invariant, not application logic. One caveat: without an &lt;code&gt;ORDER BY&lt;/code&gt; inside the subquery, which of several duplicate rows is returned is unspecified.&lt;/p&gt;
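
&lt;p&gt;The difference between the two options can be reproduced with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. This is a sketch under assumptions: the two-table layout is a simplified stand-in for the NumPath schema, the rows are invented, and the &lt;code&gt;ORDER BY me.id&lt;/code&gt; inside the subquery is an addition that makes the chosen row deterministic.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE attempts (id INTEGER PRIMARY KEY, answer_given TEXT);
    CREATE TABLE mistake_events (
        id INTEGER PRIMARY KEY,
        attempt_id INTEGER REFERENCES attempts(id),
        mistake_code TEXT
    );
    INSERT INTO attempts (id, answer_given) VALUES (1, '177'), (2, '167');
    -- Attempt 1 has a duplicated mistake event; attempt 2 has none.
    INSERT INTO mistake_events (attempt_id, mistake_code)
    VALUES (1, 'BORROW_SKIP'), (1, 'BORROW_SKIP');
    """
)

# Option A: the duplicate event produces a duplicate attempt row.
left_join_rows = conn.execute(
    """
    SELECT a.id, me.mistake_code
    FROM attempts a
    LEFT JOIN mistake_events me ON me.attempt_id = a.id
    ORDER BY a.id
    """
).fetchall()

# Option B: the scalar subquery yields exactly one value per attempt.
subquery_rows = conn.execute(
    """
    SELECT a.id,
           (SELECT me.mistake_code
            FROM mistake_events me
            WHERE me.attempt_id = a.id
            ORDER BY me.id
            LIMIT 1) AS mistake_code
    FROM attempts a
    ORDER BY a.id
    """
).fetchall()

print(len(left_join_rows))  # 3 rows for 2 attempts
print(subquery_rows)        # [(1, 'BORROW_SKIP'), (2, None)]
```

&lt;p&gt;Attempts without a mistake event still appear in both results, with &lt;code&gt;None&lt;/code&gt; as the mistake code, so the subquery keeps the optional-join behaviour while dropping the duplication risk.&lt;/p&gt;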

&lt;p&gt;The trade-off is a correlated subquery executing once per result row — potentially slower than a join on large datasets. At Phase 1 volumes (&amp;lt; 10k attempts per student), the composite index on &lt;code&gt;(student_id, created_at)&lt;/code&gt; keeps p99 well under 150ms. Cursor-based pagination and a flattened join are on the Phase 3 backlog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Matters for the Research
&lt;/h2&gt;

&lt;p&gt;The MacLellan framework's Teacher-in-the-Loop principle requires teachers to have information they can act on. Phase 1 gave teachers &lt;em&gt;where&lt;/em&gt; students are stuck (KC mastery states). Phase 2 gives them &lt;em&gt;evidence&lt;/em&gt; — the actual attempt record.&lt;/p&gt;

&lt;p&gt;A teacher who sees "Emma: SUB_BORROW at 12% Novice" now has a follow-up panel: "Emma attempted Q.47 (345 − 178), wrote 177, &lt;strong&gt;BORROW_SKIP&lt;/strong&gt; classified." That's a different kind of information — it's not a model output, it's an event record. Teachers can corroborate or question the classifier's judgement. That auditability matters for our RCT design: we're not studying whether the model is right, we're studying whether teachers who can see this data intervene differently.&lt;/p&gt;

&lt;p&gt;Phase 3 will layer LLM-generated insights on top of this history. But the raw record had to come first — insights without evidence are just confident assertions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;OFFSET pagination has a well-known flaw: page 2 of an ordered result can drift if new rows are inserted between requests. For attempt history in a classroom tool this is acceptable — a teacher paging through a student's history isn't doing real-time analytics. We documented the cursor-based upgrade path in the architecture notes rather than over-engineering for a problem we don't have yet.&lt;/p&gt;
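
&lt;p&gt;The drift is easy to reproduce. The sketch below uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module with invented data and integer timestamps; &lt;code&gt;offset_page&lt;/code&gt; and &lt;code&gt;keyset_page&lt;/code&gt; are hypothetical helpers, and the keyset query is one common shape of cursor-based pagination rather than the upgrade path from the architecture notes.&lt;/p&gt;

```python
import sqlite3

PAGE_SIZE = 2

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attempts (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO attempts (id, created_at) VALUES (?, ?)",
    [(i, i) for i in range(1, 6)],  # ids 1..5; a larger created_at means newer
)


def offset_page(page: int) -> list[int]:
    """Return attempt ids for a 1-based page using LIMIT/OFFSET."""
    rows = conn.execute(
        "SELECT id FROM attempts ORDER BY created_at DESC LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page - 1) * PAGE_SIZE),
    ).fetchall()
    return [row[0] for row in rows]


def keyset_page(last_seen_created_at: int) -> list[int]:
    """Return the next page of rows strictly older than the last row already shown."""
    rows = conn.execute(
        "SELECT id FROM attempts WHERE ? > created_at "
        "ORDER BY created_at DESC LIMIT ?",
        (last_seen_created_at, PAGE_SIZE),
    ).fetchall()
    return [row[0] for row in rows]


page1 = offset_page(1)  # [5, 4]
# A new attempt arrives between the two page requests.
conn.execute("INSERT INTO attempts (id, created_at) VALUES (6, 6)")
page2_offset = offset_page(2)  # [4, 3]: attempt 4 is shown a second time
page2_keyset = keyset_page(4)  # [3, 2]: continues after the last row seen
print(page1, page2_offset, page2_keyset)
```

&lt;p&gt;With real timestamps, two attempts can share a &lt;code&gt;created_at&lt;/code&gt; value, so a production cursor would normally compare on a pair such as &lt;code&gt;(created_at, id)&lt;/code&gt; to keep the ordering total.&lt;/p&gt;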

&lt;p&gt;The access control pattern here is simpler than Phase 1's dual-role endpoint. This route is teacher-only, so &lt;code&gt;require_teacher&lt;/code&gt; covers it cleanly — no route-level role check needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/students/{student_id}/attempts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AttemptsPageResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_student_attempts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;page_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;le&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AsyncSession&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;require_teacher&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AttemptsPageResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;require_authenticated&lt;/code&gt; + manual role check — the student version of this endpoint would be a separate route with its own access control. Keeping them apart is cleaner than a single endpoint that branches on role.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Phase 3 adds a &lt;code&gt;GenerateInsightUseCase&lt;/code&gt; — an LLM call that reads a student's attempt history and KC states, then produces a natural-language summary for the teacher. The attempt history table we built here is the input to that call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalar correlated subquery &amp;gt; LEFT JOIN for nullable one-to-one&lt;/strong&gt; — &lt;code&gt;LIMIT 1&lt;/code&gt; inside the subquery is a hard row-count guarantee that survives future schema changes; a LEFT JOIN can silently multiply rows if the cardinality assumption ever breaks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raw evidence before generated insight&lt;/strong&gt; — giving teachers the attempt record (what the student actually did) before generating LLM summaries (what the model thinks it means) keeps the teacher as the interpreter, not the model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OFFSET pagination is fine until it isn't&lt;/strong&gt; — document the cursor-based upgrade path in architecture notes and move on; don't prematurely optimise for data volumes that don't exist yet&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>numpath</category>
      <category>adaptivelearning</category>
      <category>python</category>
      <category>vue</category>
    </item>
    <item>
      <title>Why teachers need explainable AI, not just accurate AI — building the KC dashboard</title>
      <dc:creator>Oscar Rieken</dc:creator>
      <pubDate>Thu, 21 May 2026 03:51:51 +0000</pubDate>
      <link>https://dev.to/orieken/why-teachers-need-explainable-ai-not-just-accurate-ai-building-the-kc-dashboard-4iga</link>
      <guid>https://dev.to/orieken/why-teachers-need-explainable-ai-not-just-accurate-ai-building-the-kc-dashboard-4iga</guid>
      <description>&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;NumPath's teacher dashboard previously showed one number per student: 7-day accuracy. A teacher looking at "Emma — 43%" has no idea whether Emma is struggling with borrowing, place value, number sense, or all three. The number is technically correct and completely unactionable.&lt;/p&gt;

&lt;p&gt;In this post I'll walk through how we added a Knowledge Component (KC) mastery panel to the dashboard — colour-coded progress bars per skill that expand to show p_mastery %, mastery level label, and opportunity count. The backend piece is a single endpoint backed by a left-join use case. The research reason it matters is more interesting than the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Design Decision
&lt;/h2&gt;

&lt;p&gt;The core choice was: &lt;strong&gt;what data does a teacher actually need?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We had three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy-only&lt;/strong&gt; (what we had): fast to compute, no additional queries, but unactionable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raw BKT parameters&lt;/strong&gt;: show &lt;code&gt;p_mastery&lt;/code&gt;, &lt;code&gt;p_learn&lt;/code&gt;, &lt;code&gt;p_guess&lt;/code&gt;, &lt;code&gt;p_slip&lt;/code&gt; — complete but overwhelming for a classroom teacher&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KC mastery levels&lt;/strong&gt;: translate &lt;code&gt;p_mastery&lt;/code&gt; into a three-tier label (Novice / Developing / Mastered) with colour coding, keeping the raw number available on expand&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We chose option 3. The mastery level thresholds are defined as named constants in &lt;code&gt;get_kc_states.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_MASTERY_DEVELOPING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.40&lt;/span&gt;
&lt;span class="n"&gt;_MASTERY_MASTERED&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.80&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_mastery_level&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_mastery&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_mastery&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;_MASTERY_MASTERED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mastered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_mastery&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;_MASTERY_DEVELOPING&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Developing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Novice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One deliberate UX choice: a student with no attempts at all still shows all 5 skills at 0% / Novice. There's no "no data yet" placeholder. The teacher sees the full KC grid from day one: an empty bar is information ("this student hasn't encountered this skill yet"), not an error.&lt;/p&gt;

&lt;p&gt;The access control pattern is worth noting too. We added a &lt;code&gt;require_authenticated&lt;/code&gt; dependency — any valid JWT — and enforced role logic in the route handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/{student_id}/kc-states&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;KCStatesResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_kc_states&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AsyncSession&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;require_authenticated&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;KCStatesResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;student&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sub&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HTTP_403_FORBIDDEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Access denied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Students see their own KC states. Teachers see any student's. The rule lives in one place — the route — rather than being split across two separate dependency functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Matters for the Research
&lt;/h2&gt;

&lt;p&gt;The MacLellan ITS framework's "Teacher-in-the-Loop" principle isn't just about giving teachers a screen. It's about giving them &lt;strong&gt;information they can act on&lt;/strong&gt;. A 43% accuracy number tells a teacher "this student is struggling." A KC panel that shows SUB_BORROW at 12% (Novice, 8 attempts) while PLACE_VALUE is at 67% (Developing, 14 attempts) tells a teacher "this student needs targeted borrowing practice — and they've already tried eight times, so hints aren't landing."&lt;/p&gt;

&lt;p&gt;That's the difference between a reporting tool and a teaching tool. The RCT we're designing in Phase 4 will measure whether teachers who have KC-level visibility actually intervene differently than those who see accuracy alone. This dashboard is the instrument we're studying, not just a convenience feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;The left-join strategy (two separate queries plus a dict lookup) turned out to be cleaner than an ORM &lt;code&gt;outerjoin()&lt;/code&gt;. SQLAlchemy async &lt;code&gt;outerjoin()&lt;/code&gt; with nullable columns requires explicit handling of None values in ways that are easy to get wrong. Two queries and a dict lookup that falls back to 0.0 for missing skills are more readable and easier to test with mocks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;kc_by_skill_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skill_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;kc_records&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;KCStateSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;skill_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;p_mastery&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kc_by_skill_id&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;p_mastery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;kc_by_skill_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_skills&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nine unit tests covering the use case ran in 0.03s with no live database. That's the payoff for keeping the domain logic in a use case rather than inline in the route.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Phase 2 of the KC dashboard adds recent attempt history to the student detail panel — the specific problems a student got wrong, with their classified mistake codes, so a teacher can see patterns as they form.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy is output, KC mastery is signal&lt;/strong&gt; — a single accuracy number is not enough for a teacher to act; per-KC mastery state is the minimum viable explainability for an ITS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Empty is informative, not broken&lt;/strong&gt; — showing all KCs at 0% for a new student tells a teacher "this skill hasn't been practised yet"; hiding it implies the data is missing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two queries + dict &amp;gt; one complex join&lt;/strong&gt; — for small, static reference data (5 skills), two simple queries and a dict lookup are more readable, testable, and maintainable than an ORM outer join with nullable column handling&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>numpath</category>
      <category>adaptivelearning</category>
      <category>vue</category>
      <category>python</category>
    </item>
    <item>
      <title>Closing the feedback loop: how mistake classification drives adaptive problem selection in NumPath</title>
      <dc:creator>Oscar Rieken</dc:creator>
      <pubDate>Thu, 21 May 2026 03:50:28 +0000</pubDate>
      <link>https://dev.to/orieken/closing-the-feedback-loop-how-mistake-classification-drives-adaptive-problem-selection-in-numpath-5ce9</link>
      <guid>https://dev.to/orieken/closing-the-feedback-loop-how-mistake-classification-drives-adaptive-problem-selection-in-numpath-5ce9</guid>
      <description>&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;NumPath is an AI math tutor for children with dyscalculia. At its core is an adaptive engine that picks the next problem for each student based on their Bayesian Knowledge Tracing (BKT) mastery estimate. In this post I'll walk through a problem we had — and solved — in the rule-based phase: classified mistakes were being logged but completely ignored by the selection engine.&lt;/p&gt;

&lt;p&gt;The fix was a 60-line change across two files. The research implication is significant.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: a diagnostic signal going nowhere
&lt;/h2&gt;

&lt;p&gt;Our &lt;code&gt;MistakeClassifier&lt;/code&gt; already tagged every wrong answer with a structured code: &lt;code&gt;BORROW_SKIP&lt;/code&gt; when a student skips the borrow step in a subtraction, &lt;code&gt;DIGIT_REVERSAL&lt;/code&gt; when they write 51 for 15, &lt;code&gt;MAGNITUDE_MISJUDGE&lt;/code&gt; when they pick the smaller number as larger. These &lt;code&gt;MistakeEvent&lt;/code&gt; records were hitting the database on every incorrect attempt.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;GetNextProblemUseCase&lt;/code&gt; — the code that decides what problem a student gets next — never read them. The engine was selecting problems purely on BKT &lt;code&gt;p_mastery&lt;/code&gt;. A student could hit &lt;code&gt;BORROW_SKIP&lt;/code&gt; three sessions in a row and still receive problems at the same difficulty, on the same skill, with zero response to the pattern.&lt;/p&gt;

&lt;p&gt;This violates what MacLellan et al. call the "Error as Diagnostic Signal" principle: &lt;strong&gt;mistakes should trigger targeted remediation, not generic retry&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Design Decision
&lt;/h2&gt;

&lt;p&gt;The core question was: &lt;em&gt;when&lt;/em&gt; should a mistake pattern trigger a response, and &lt;em&gt;what&lt;/em&gt; should that response be?&lt;/p&gt;

&lt;p&gt;We settled on three rules, each encoded as a named constant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MISTAKE_WINDOW&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;        &lt;span class="c1"&gt;# look back this many MistakeEvents
# threshold = ceil(MISTAKE_WINDOW / 2) = 2 — dominant code must appear ≥ 2× in window
&lt;/span&gt;
&lt;span class="n"&gt;MISTAKE_KC_MAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DIGIT_REVERSAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLACE_VALUE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BORROW_SKIP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUB_BORROW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAGNITUDE_MISJUDGE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLACE_VALUE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLACE_VALUE_CONFUSION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLACE_VALUE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPERATION_CONFUSION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPERATION_SIGN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;_detect_mistake_signal()&lt;/code&gt; fires, two things happen:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Skill override&lt;/strong&gt; — the engine targets the KC linked to that mistake code, even if another KC has lower &lt;code&gt;p_mastery&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Difficulty drop&lt;/strong&gt; — one &lt;code&gt;DIFFICULTY_STEP&lt;/code&gt; (0.2) down, floored at &lt;code&gt;ENTRY_DIFFICULTY&lt;/code&gt; (0.3) to prevent over-scaffolding students who are already at entry level.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What we explicitly rejected: resetting difficulty to zero (too harsh for students who've been making progress), and weighting by mistake severity (too complex for Phase 1 with no real data to calibrate against).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;reason&lt;/code&gt; field on every &lt;code&gt;NextProblemResponse&lt;/code&gt; now names the triggering pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Remediation: BORROW_SKIP detected 2× on SUB_BORROW (p_mastery=0.41)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the explainability requirement. A teacher looking at this in the dashboard can understand exactly why the system chose what it did.&lt;/p&gt;
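
&lt;p&gt;Putting the window, threshold, KC map, difficulty drop and &lt;code&gt;reason&lt;/code&gt; string together, the rule can be sketched roughly as follows. This is a simplified illustration, not the production code: the function names, the returned dict shape, and the choice to ignore unmapped codes are assumptions, while the constants and the reason format come from the description above.&lt;/p&gt;

```python
import math
from collections import Counter

MISTAKE_WINDOW = 3
MISTAKE_THRESHOLD = math.ceil(MISTAKE_WINDOW / 2)  # 2 of the last 3 events
DIFFICULTY_STEP = 0.2
ENTRY_DIFFICULTY = 0.3

MISTAKE_KC_MAP = {
    "DIGIT_REVERSAL": "PLACE_VALUE",
    "BORROW_SKIP": "SUB_BORROW",
    "MAGNITUDE_MISJUDGE": "PLACE_VALUE",
    "PLACE_VALUE_CONFUSION": "PLACE_VALUE",
    "OPERATION_CONFUSION": "OPERATION_SIGN",
}


def detect_mistake_signal(recent_codes):
    """Return (code, count) for the dominant mapped code in the window, or None.

    recent_codes is ordered newest first.
    """
    window = [c for c in recent_codes[:MISTAKE_WINDOW] if c in MISTAKE_KC_MAP]
    if not window:
        return None
    code, count = Counter(window).most_common(1)[0]
    return (code, count) if count >= MISTAKE_THRESHOLD else None


def remediate(recent_codes, current_difficulty, p_mastery_by_kc):
    """Apply the skill override and difficulty drop; None means no signal."""
    signal = detect_mistake_signal(recent_codes)
    if signal is None:
        return None
    code, count = signal
    kc = MISTAKE_KC_MAP[code]
    # One step down, never below the entry-level floor.
    difficulty = max(ENTRY_DIFFICULTY, round(current_difficulty - DIFFICULTY_STEP, 2))
    reason = (
        f"Remediation: {code} detected {count}× on {kc} "
        f"(p_mastery={p_mastery_by_kc.get(kc, 0.0):.2f})"
    )
    return {"skill_code": kc, "difficulty": difficulty, "reason": reason}


result = remediate(
    ["BORROW_SKIP", "DIGIT_REVERSAL", "BORROW_SKIP"], 0.7, {"SUB_BORROW": 0.41}
)
print(result["reason"])
```

&lt;p&gt;A student already at difficulty 0.4 would land on the 0.3 floor rather than 0.2, and a window with three different codes produces no signal at all, so normal BKT-driven selection continues.&lt;/p&gt;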

&lt;h2&gt;
  
  
  Why It Matters for the Research
&lt;/h2&gt;

&lt;p&gt;The central claim of NumPath's RCT will be that adaptive, mistake-aware tutoring produces better outcomes than static worksheets for dyscalculic learners. Before this change, we had a system that adapted difficulty based on streaks but ignored the &lt;em&gt;type&lt;/em&gt; of error a student was making. That's not meaningfully different from a worksheet that repeats problems when you get them wrong.&lt;/p&gt;

&lt;p&gt;Closing this loop — mistake code → KC target → difficulty adjustment → &lt;code&gt;reason&lt;/code&gt; field — is what makes the system an Intelligent Tutoring System rather than a difficulty slider. Every &lt;code&gt;MistakeEvent&lt;/code&gt; record is now a longitudinal data point that shapes the student's next experience, and that chain of causality is fully traceable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;The implementation was straightforward. The harder question was the threshold: why 2 of 3, not 3 of 3? Three-of-three is too strict — a student who makes &lt;code&gt;BORROW_SKIP&lt;/code&gt;, then &lt;code&gt;DIGIT_REVERSAL&lt;/code&gt;, then &lt;code&gt;BORROW_SKIP&lt;/code&gt; again has a clear pattern but the strict threshold misses it. Two-of-three catches the pattern earlier at the cost of occasional false positives. We don't yet have real student data to validate this choice — it's a hypothesis. We've logged it as a research note for Phase 4.&lt;/p&gt;

&lt;p&gt;The one thing I'd do differently: add the &lt;code&gt;MistakeEvent&lt;/code&gt; index to the model on day one. It was missing and only caught during the performance review pass. A composite index on &lt;code&gt;(student_id, created_at)&lt;/code&gt; is obvious in hindsight for any table you're going to query with &lt;code&gt;ORDER BY created_at DESC LIMIT N&lt;/code&gt;.&lt;/p&gt;
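
&lt;p&gt;A quick way to check that such an index is actually picked up is to ask the database for its query plan. The sketch below uses SQLite from the standard library as a stand-in (the real database may plan differently), and the table and index names are hypothetical.&lt;/p&gt;

```python
import sqlite3

# SQLite stands in for the real database; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mistake_events (
        id INTEGER PRIMARY KEY,
        student_id TEXT NOT NULL,
        code TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    CREATE INDEX ix_mistake_events_student_created
        ON mistake_events (student_id, created_at);
""")

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT code FROM mistake_events
    WHERE student_id = ?
    ORDER BY created_at DESC
    LIMIT 3
    """,
    ("emma",),
).fetchall()

# The last column of each plan row is a human-readable detail string.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

&lt;p&gt;If the index name shows up in the plan, the lookup for the most recent events is an index search rather than a full table scan.&lt;/p&gt;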

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Next up: wiring the KC states into the teacher dashboard so educators can see p_mastery per student, not just 7-day accuracy — the final piece of the MacLellan "Teacher-in-the-Loop" principle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logging errors is not the same as learning from them&lt;/strong&gt; — a diagnostic signal only matters if it changes what happens next; wiring &lt;code&gt;MistakeEvent&lt;/code&gt; into &lt;code&gt;select_next_problem()&lt;/code&gt; is a 60-line change with a meaningful research impact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;reason&lt;/code&gt; field is not a nice-to-have&lt;/strong&gt; — every adaptive decision must be explainable to a teacher; string-formatted rationale on each &lt;code&gt;NextProblemResponse&lt;/code&gt; is the minimum viable explainability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Named constants beat magic numbers&lt;/strong&gt; — &lt;code&gt;MISTAKE_WINDOW&lt;/code&gt;, &lt;code&gt;FRUSTRATION_WINDOW&lt;/code&gt;, &lt;code&gt;MASTERY_WINDOW&lt;/code&gt; sit side by side; when we have real data to calibrate thresholds, we change one line each&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>numpath</category>
      <category>adaptivelearning</category>
      <category>python</category>
      <category>bayesian</category>
    </item>
    <item>
      <title>Why I'm building an AI math tutor for dyscalculia — and grounding it in 30 years of ITS research</title>
      <dc:creator>Oscar Rieken</dc:creator>
      <pubDate>Thu, 21 May 2026 03:28:14 +0000</pubDate>
      <link>https://dev.to/orieken/why-im-building-an-ai-math-tutor-for-dyscalculia-and-grounding-it-in-30-years-of-its-research-cgb</link>
      <guid>https://dev.to/orieken/why-im-building-an-ai-math-tutor-for-dyscalculia-and-grounding-it-in-30-years-of-its-research-cgb</guid>
      <description>&lt;h2&gt;
  
  
  The problem I kept seeing
&lt;/h2&gt;

&lt;p&gt;When I started this research I expected dyscalculia to be a niche condition — something a handful of children had, easily addressed with extra practice. The numbers don't support that. Roughly 5–7% of school-age children have dyscalculia: a specific learning difficulty with number sense that is neurological in origin, persistent across development, and largely invisible in standard classroom assessments.&lt;/p&gt;

&lt;p&gt;It's not "bad at maths." A child with dyscalculia might have strong reading comprehension and solid spatial reasoning, yet consistently fail to grasp that 9 comes before 10. The difficulty is specific, categorical, and resistant to the kind of general maths instruction classrooms provide. Standard adaptive apps (the ones with stars and progress bars) don't help much because they adapt difficulty without adapting &lt;em&gt;to the mechanism of the difficulty&lt;/em&gt;. A child who confuses 51 with 15 (digit reversal) needs something different from a child who skips borrowing. Treating both as "got it wrong, try again" misses the point.&lt;/p&gt;

&lt;p&gt;This is the gap NumPath is designed to study. Not to solve — to study, rigorously, in a randomised controlled trial.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ITS research, not a product
&lt;/h2&gt;

&lt;p&gt;Intelligent Tutoring Systems have a 30-year research literature, and the core insights from it are not widely implemented in consumer software. That's partly because the research is paywalled, partly because it's written for academics rather than engineers, and partly because implementing it properly requires more architecture than a typical edtech MVP.&lt;/p&gt;

&lt;p&gt;NumPath is built on four distinct research contributions. I want to be precise about attribution here because I got it wrong early on:&lt;/p&gt;




&lt;h3&gt;
  
  
  1. The Knowledge Component (KC) model
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Anderson &amp;amp; Corbett (1992); Koedinger &amp;amp; Aleven (2007)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A KC is the smallest unit of knowledge required to complete one step in a problem. Not "subtraction" — that's too coarse. Not "subtract 178 from 345 by borrowing from the tens column, carrying 1, and writing the result in the ones column" — that's too specific. A KC is something like &lt;code&gt;SUB_BORROW&lt;/code&gt;: the procedure of regrouping when subtracting multi-digit numbers.&lt;/p&gt;

&lt;p&gt;In NumPath every skill in the database is a KC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SKILLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUB_BORROW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subtraction with borrowing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLACE_VALUE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Place value understanding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NUMBER_LINE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Number line navigation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NUMBER_SENSE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Basic number magnitude and ordering&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPERATION_SIGN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reading and applying operation signs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each student has a separate mastery estimate for each KC, and the estimates are tracked independently: progress on &lt;code&gt;SUB_BORROW&lt;/code&gt; doesn't raise the estimate for &lt;code&gt;PLACE_VALUE&lt;/code&gt;. A student doesn't "advance past subtraction"; they master individual KCs, one at a time.&lt;/p&gt;
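
&lt;p&gt;As a minimal sketch of what "one estimate per student per KC" can look like in code, the snippet below keeps the estimates in nested dictionaries. The &lt;code&gt;mastery&lt;/code&gt; structure and the &lt;code&gt;DEFAULT_P_MASTERY&lt;/code&gt; value are assumptions made for illustration, not NumPath's actual storage layer.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical starting estimate for a KC the student has never attempted.
DEFAULT_P_MASTERY = 0.1

# mastery[student_id][kc_code] holds one estimate per (student, KC) pair.
mastery = defaultdict(lambda: defaultdict(lambda: DEFAULT_P_MASTERY))

# Updating one KC leaves every other KC for that student untouched.
mastery["aiden"]["SUB_BORROW"] = 0.18

print(mastery["aiden"]["SUB_BORROW"])
print(mastery["aiden"]["PLACE_VALUE"])
```

&lt;p&gt;Writing to &lt;code&gt;SUB_BORROW&lt;/code&gt; changes nothing for &lt;code&gt;PLACE_VALUE&lt;/code&gt;, which still reads back the default.&lt;/p&gt;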




&lt;h3&gt;
  
  
  2. Bayesian Knowledge Tracing (BKT)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Corbett &amp;amp; Anderson (1995)&lt;/strong&gt;&lt;/p&gt;

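&lt;p&gt;BKT is the probabilistic model that estimates whether a student has mastered a KC, updated on every attempt. It tracks four values per student per KC, one running estimate (&lt;code&gt;p_mastery&lt;/code&gt;) and three parameters that govern how that estimate moves:&lt;/p&gt;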

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;p_mastery&lt;/code&gt; — P(student has learned this KC)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;p_learn&lt;/code&gt; — P(learning occurs on this attempt, given not yet learned)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;p_guess&lt;/code&gt; — P(correct answer given KC not learned)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;p_slip&lt;/code&gt; — P(incorrect answer given KC is learned)&lt;/li&gt;
&lt;/ul&gt;

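&lt;p&gt;Update equations after observing an answer, where &lt;code&gt;p&lt;/code&gt; is the current &lt;code&gt;p_mastery&lt;/code&gt; and &lt;code&gt;slip&lt;/code&gt; and &lt;code&gt;guess&lt;/code&gt; stand for &lt;code&gt;p_slip&lt;/code&gt; and &lt;code&gt;p_guess&lt;/code&gt;:&lt;br&gt;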
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# After a correct answer:
&lt;/span&gt;&lt;span class="n"&gt;posterior&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;slip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;slip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;guess&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# After an incorrect answer:
&lt;/span&gt;&lt;span class="n"&gt;posterior&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;slip&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;slip&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;guess&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# Learning update (applied after either):
&lt;/span&gt;&lt;span class="n"&gt;p_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;posterior&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;posterior&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;p_learn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
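
&lt;p&gt;The update equations above can be wrapped in a single function. This is a sketch rather than NumPath's actual implementation: the function name &lt;code&gt;bkt_update&lt;/code&gt; and the parameter values in the usage lines are illustrative assumptions.&lt;/p&gt;

```python
def bkt_update(p_mastery, p_learn, p_guess, p_slip, correct):
    """Return the updated mastery estimate after one observed attempt."""
    if correct:
        # A correct answer is either knowledge without a slip or a lucky guess.
        numerator = p_mastery * (1 - p_slip)
        denominator = numerator + (1 - p_mastery) * p_guess
    else:
        # An incorrect answer is either a slip or a failed guess.
        numerator = p_mastery * p_slip
        denominator = numerator + (1 - p_mastery) * (1 - p_guess)
    posterior = numerator / denominator
    # Learning update: the student may also have learned the KC on this attempt.
    return posterior + (1 - posterior) * p_learn


# Illustrative parameter values only, not NumPath's fitted values.
p = 0.18
for correct in [True, True, False]:
    p = bkt_update(p, p_learn=0.1, p_guess=0.2, p_slip=0.1, correct=correct)
    print(round(p, 3))
```

&lt;p&gt;The sketch does not guard against a zero denominator, which can only occur at degenerate values such as &lt;code&gt;p_mastery = 0&lt;/code&gt; combined with &lt;code&gt;p_guess = 0&lt;/code&gt;.&lt;/p&gt;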



&lt;p&gt;This is not MacLellan's work — it predates his research by ~20 years. I got the attribution wrong in early drafts and I've since corrected it throughout the codebase. BKT is Corbett &amp;amp; Anderson.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. The Apprentice Learner (AL) Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;MacLellan, Harpstead, Patel &amp;amp; Koedinger — EDM 2016 (Exemplary Paper Award)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BKT predicts &lt;em&gt;whether&lt;/em&gt; a student knows something. The AL Architecture models &lt;em&gt;how&lt;/em&gt; they acquire it. MacLellan and his co-authors distinguish three learning mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;How-learning&lt;/strong&gt; — generalising the procedure from examples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where-learning&lt;/strong&gt; — learning which contexts a skill applies to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When-learning&lt;/strong&gt; — learning the conditions that trigger the skill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the conceptual basis for NumPath's mistake classifier. &lt;code&gt;BORROW_SKIP&lt;/code&gt; (student applies subtraction but skips the regrouping step) is a how-learning failure — the student hasn't generalised the full procedure. &lt;code&gt;OPERATION_CONFUSION&lt;/code&gt; (student adds when the problem requires subtraction) is a when-learning failure — the student fires the wrong skill given the sign.&lt;/p&gt;

&lt;p&gt;Classifying mistakes this way means the tutor can respond differently: how-learning failures need procedural scaffolding; when-learning failures need context discrimination practice.&lt;/p&gt;
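
&lt;p&gt;A rough sketch of that routing, assuming the classifier emits the mistake codes named above. The dictionary names and the &lt;code&gt;tutor_response&lt;/code&gt; helper are hypothetical; only the two mistake codes and the two response styles come from the text.&lt;/p&gt;

```python
# Mistake codes mapped to the learning mechanism that failed, as described above.
MECHANISM_BY_MISTAKE = {
    "BORROW_SKIP": "how",
    "OPERATION_CONFUSION": "when",
}

# Tutor response per mechanism, paraphrasing the paragraph above.
RESPONSE_BY_MECHANISM = {
    "how": "procedural scaffolding",
    "when": "context discrimination practice",
}


def tutor_response(mistake_code):
    """Return the response for a mistake code, or None if it is unclassified."""
    mechanism = MECHANISM_BY_MISTAKE.get(mistake_code)
    return RESPONSE_BY_MECHANISM.get(mechanism)


print(tutor_response("BORROW_SKIP"))
print(tutor_response("OPERATION_CONFUSION"))
```

&lt;p&gt;Keeping the mechanism as an explicit middle step means a new mistake code only needs one entry in the first table to inherit the right response.&lt;/p&gt;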




&lt;h3&gt;
  
  
  4. The Natural Training Interaction (NTI) Framework
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;MacLellan, Harpstead &amp;amp; Koedinger — AAAI 2018 Spring Symposium&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is MacLellan's most directly applicable contribution to NumPath's teacher dashboard. The NTI Framework treats teachers as &lt;em&gt;first-class participants&lt;/em&gt; in the adaptive loop, not passive viewers of a report. Concretely, it means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every AI-generated insight must cite specific evidence: a KC code, a p_mastery value, a mistake count. A generic "this student needs more practice" doesn't qualify.&lt;/li&gt;
&lt;li&gt;Teachers must be able to confirm or override AI judgments; their feedback improves the system over time.&lt;/li&gt;
&lt;li&gt;The system must support natural conversation between AI output and teacher judgment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In NumPath this shows up in the insight response shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Aiden skips borrowing in 9 of 11 recent subtraction attempts."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"warn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"evidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"kc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SUB_BORROW"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"p_mastery"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mistake_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BORROW_SKIP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mistake_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"window"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last 11 attempts"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;evidence&lt;/code&gt; block isn't generated by the LLM — it's assembled from DB reads before the prompt is sent. Explainability is structural, not hoped for.&lt;/p&gt;
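
&lt;p&gt;One way to picture that split, as a sketch under assumptions: the helper names &lt;code&gt;build_evidence&lt;/code&gt; and &lt;code&gt;build_insight&lt;/code&gt; and the shape of the attempt rows are hypothetical, and the list of dictionaries stands in for real DB reads. The evidence is computed entirely from stored data, and only &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;type&lt;/code&gt; are taken from the model's output.&lt;/p&gt;

```python
from collections import Counter


def build_evidence(kc, p_mastery, attempts):
    """Assemble the evidence block from stored data, with no LLM involvement."""
    # Count only classified mistakes; correct attempts carry mistake_type None.
    counts = Counter(a["mistake_type"] for a in attempts if a["mistake_type"])
    mistake_type, mistake_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "kc": kc,
        "p_mastery": p_mastery,
        "mistake_type": mistake_type,
        "mistake_count": mistake_count,
        "window": f"last {len(attempts)} attempts",
    }


def build_insight(llm_fields, evidence):
    """Keep only text and type from the model, then attach server-side evidence."""
    return {"text": llm_fields["text"], "type": llm_fields["type"], "evidence": evidence}


# Hypothetical rows standing in for DB reads.
attempts = [{"mistake_type": "BORROW_SKIP"}] * 9 + [{"mistake_type": None}] * 2
evidence = build_evidence("SUB_BORROW", 0.18, attempts)
print(evidence)
```

&lt;p&gt;Because &lt;code&gt;build_insight&lt;/code&gt; copies only two fields from the model, any evidence-like content the model might produce is simply dropped.&lt;/p&gt;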




&lt;h2&gt;
  
  
  What NumPath actually is
&lt;/h2&gt;

&lt;p&gt;NumPath is a FastAPI + Vue 3 web application deployed with Docker Compose. Students practice math problems; the BKT model updates on every attempt; the adaptive engine selects the next problem based on KC mastery states and recent mistake patterns. Teachers see a dashboard showing per-student KC progress, attempt history, and LLM-generated insights.&lt;/p&gt;

&lt;p&gt;The four-phase roadmap:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;MVP: BKT + adaptive engine + teacher dashboard&lt;/td&gt;
&lt;td&gt;In progress&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Mistake classifier v2 + DKT model&lt;/td&gt;
&lt;td&gt;Planned&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;RCT instrumentation&lt;/td&gt;
&lt;td&gt;Planned&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Randomised controlled trial&lt;/td&gt;
&lt;td&gt;Planned&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The RCT will compare learning outcomes for students using NumPath against a control group using standard worksheet practice. The teacher dashboard is an instrument in that experiment, not just a feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this series covers
&lt;/h2&gt;

&lt;p&gt;Four posts are already written, each covering a specific engineering decision:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="//./01-closing-the-feedback-loop.md"&gt;Closing the feedback loop&lt;/a&gt;&lt;/strong&gt; — how we wired mistake classification into adaptive problem selection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="//./02-teacher-kc-dashboard.md"&gt;Why teachers need explainable AI&lt;/a&gt;&lt;/strong&gt; — building the KC mastery dashboard (Phase 1)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="//./03-attempt-history-scalar-subquery.md"&gt;Attempt history and the scalar subquery pattern&lt;/a&gt;&lt;/strong&gt; — paginated attempt history and a safe LEFT JOIN alternative&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="//./04-prompt-engineering-teacher-insights.md"&gt;Prompt engineering for teacher insights&lt;/a&gt;&lt;/strong&gt; — structured JSON output and graceful fallbacks with Claude&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All posts are first-person engineering notes. The code is real, the trade-offs are real, and I'll be honest when something didn't work as expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Phase 1 is nearly feature-complete. The next milestone is running the full test suite on live data with a small pilot group before the RCT design begins.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dyscalculia is specific, not general&lt;/strong&gt; — "more practice" is the wrong intervention; KC-level mastery tracking is the minimum viable response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The ITS research literature has four distinct pillars, not one&lt;/strong&gt; — KC model (1992), BKT (1995), AL Architecture (2016), NTI Framework (2018) — each does a different job and attribution matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explainability must be structural&lt;/strong&gt; — an insight that cites a specific KC and mistake count is auditable; one that says "this student is struggling" is not&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>numpath</category>
      <category>adaptivelearning</category>
      <category>dyscalculia</category>
      <category>python</category>
    </item>
  </channel>
</rss>
