Oscar Rieken

Posted on May 21

Why I'm building an AI math tutor for dyscalculia — and grounding it in 30 years of ITS research

#numpath #adaptivelearning #dyscalculia #python

The problem I kept seeing

When I started this research I expected dyscalculia to be a niche condition — something a handful of children had, easily addressed with extra practice. The numbers don't support that. Roughly 5–7% of school-age children have dyscalculia: a specific learning difficulty with number sense that is neurological in origin, persistent across development, and largely invisible in standard classroom assessments.

It's not "bad at maths." A child with dyscalculia might have strong reading comprehension and solid spatial reasoning, yet consistently fail to grasp that 9 comes before 10. The difficulty is specific, categorical, and resistant to the kind of general maths instruction classrooms provide. Standard adaptive apps (the ones with stars and progress bars) don't help much because they adapt difficulty without adapting to the mechanism of the difficulty. A child who confuses 51 with 15 (digit reversal) needs something different from a child who skips borrowing. Treating both as "got it wrong, try again" misses the point.

This is the gap NumPath is designed to study. Not to solve — to study, rigorously, in a randomised controlled trial.

Why ITS research, not a product

Intelligent Tutoring Systems have a 30-year research literature, and the core insights from it are not widely implemented in consumer software. That's partly because the research is paywalled, partly because it's written for academics rather than engineers, and partly because implementing it properly requires more architecture than a typical edtech MVP.

NumPath is built on four distinct research contributions. I want to be precise about attribution here because I got it wrong early on:

1. The Knowledge Component (KC) model

Anderson & Corbett (1992); Koedinger & Aleven (2007)

A KC is the smallest unit of knowledge required to complete one step in a problem. Not "subtraction" — that's too coarse. Not "subtract 178 from 345 by borrowing from the tens column, carrying 1, and writing the result in the ones column" — that's too specific. A KC is something like SUB_BORROW: the procedure of regrouping when subtracting multi-digit numbers.

In NumPath every skill in the database is a KC:

SKILLS = [
    {"code": "SUB_BORROW",    "name": "Subtraction with borrowing"},
    {"code": "PLACE_VALUE",   "name": "Place value understanding"},
    {"code": "NUMBER_LINE",   "name": "Number line navigation"},
    {"code": "NUMBER_SENSE",  "name": "Basic number magnitude and ordering"},
    {"code": "OPERATION_SIGN","name": "Reading and applying operation signs"},
]

Each student has an independent mastery estimate for each KC. Progress on SUB_BORROW doesn't transfer to PLACE_VALUE. A student doesn't "advance past subtraction" — they master individual KCs, one at a time.
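A minimal sketch of what "independent mastery per KC" can look like in code. The `MasteryStore` class, the 0.1 starting prior, and the student id are assumptions made for illustration, not NumPath's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical prior for a KC the student has never attempted (an assumption).
DEFAULT_PRIOR = 0.1


@dataclass
class MasteryStore:
    """Illustrative in-memory store: one mastery estimate per (student, KC) pair."""
    estimates: dict = field(default_factory=dict)

    def get(self, student_id: str, kc_code: str) -> float:
        return self.estimates.get((student_id, kc_code), DEFAULT_PRIOR)

    def set(self, student_id: str, kc_code: str, p_mastery: float) -> None:
        self.estimates[(student_id, kc_code)] = p_mastery


store = MasteryStore()
store.set("student-1", "SUB_BORROW", 0.62)

# Updating SUB_BORROW leaves PLACE_VALUE at its prior: no transfer between KCs.
print(store.get("student-1", "SUB_BORROW"))   # 0.62
print(store.get("student-1", "PLACE_VALUE"))  # 0.1
```

Keying on the (student, KC) pair is what makes "advance past subtraction" impossible to express: there is no single subtraction score to advance past.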

2. Bayesian Knowledge Tracing (BKT)

Corbett & Anderson (1995)

BKT is the probabilistic model that estimates whether a student has mastered a KC, updated on every attempt. Four parameters per student per KC:

p_mastery — P(student has learned this KC)
p_learn — P(learning occurs on this attempt, given not yet learned)
p_guess — P(correct answer given KC not learned)
p_slip — P(incorrect answer given KC is learned)

Update equations after observing an answer:

# Here p is p_mastery, slip is p_slip, and guess is p_guess from the list above.
# After a correct answer:
posterior = p * (1 - slip) / (p * (1 - slip) + (1 - p) * guess)
# After an incorrect answer:
posterior = p * slip / (p * slip + (1 - p) * (1 - guess))
# Learning update (applied after either):
p_new = posterior + (1 - posterior) * p_learn

This is not MacLellan's work — it predates his research by ~20 years. I got the attribution wrong in early drafts and I've since corrected it throughout the codebase. BKT is Corbett & Anderson.
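The BKT update equations above can be wrapped in a single function. This is a sketch rather than NumPath's implementation: the function name `bkt_update` and the parameter values in the example are illustrative assumptions, not values fitted to real data.

```python
def bkt_update(p_mastery: float, p_learn: float, p_guess: float,
               p_slip: float, correct: bool) -> float:
    """Return the updated P(mastered) after one observed attempt."""
    if correct:
        evidence = p_mastery * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * p_guess)
    else:
        evidence = p_mastery * p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - p_guess))
    # Learning transition: an unlearned KC may become learned on this attempt.
    return posterior + (1 - posterior) * p_learn


# Illustrative parameter values only (assumptions, not fitted to real data).
params = dict(p_learn=0.1, p_guess=0.2, p_slip=0.1)
after_correct = bkt_update(0.3, correct=True, **params)
after_wrong = bkt_update(0.3, correct=False, **params)
print(round(after_correct, 3), round(after_wrong, 3))  # 0.693 0.146
```

Starting from 0.3, one correct answer lifts the estimate to about 0.69 and one incorrect answer drops it to about 0.15. The estimate never falls to zero after a mistake because the learning transition is applied after either outcome.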

3. The Apprentice Learner (AL) Architecture

MacLellan, Harpstead, Patel & Koedinger — EDM 2016 (Exemplary Paper Award)

BKT predicts whether a student knows something. The AL Architecture models how they acquire it. MacLellan and colleagues distinguish three learning mechanisms:

How-learning — generalising the procedure from examples
Where-learning — learning which contexts a skill applies to
When-learning — learning the conditions that trigger the skill

This is the conceptual basis for NumPath's mistake classifier. BORROW_SKIP (student applies subtraction but skips the regrouping step) is a how-learning failure — the student hasn't generalised the full procedure. OPERATION_CONFUSION (student adds when the problem requires subtraction) is a when-learning failure — the student fires the wrong skill given the sign.

Classifying mistakes this way means the tutor can respond differently: how-learning failures need procedural scaffolding; when-learning failures need context discrimination practice.
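One way to express that routing in code is a pair of lookup tables. The mistake codes come from this post; the `Mechanism` enum, the response labels, and the `retry_with_hint` fallback are hypothetical names chosen for the sketch:

```python
from enum import Enum


class Mechanism(Enum):
    HOW = "how-learning"
    WHERE = "where-learning"
    WHEN = "when-learning"


# Mistake codes from the post, mapped to the mechanism they indicate.
MISTAKE_MECHANISM = {
    "BORROW_SKIP": Mechanism.HOW,
    "OPERATION_CONFUSION": Mechanism.WHEN,
}

# Hypothetical response labels; the post names the strategies, not these identifiers.
RESPONSE = {
    Mechanism.HOW: "procedural_scaffolding",
    Mechanism.WHEN: "context_discrimination_practice",
}


def choose_response(mistake_type: str) -> str:
    """Pick a tutoring response for a classified mistake, with a generic fallback."""
    mechanism = MISTAKE_MECHANISM.get(mistake_type)
    return RESPONSE.get(mechanism, "retry_with_hint")


print(choose_response("BORROW_SKIP"))          # procedural_scaffolding
print(choose_response("OPERATION_CONFUSION"))  # context_discrimination_practice
```

Keeping the two mappings separate means a new mistake type only needs a mechanism assigned to it; the response follows from the mechanism.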

4. The Natural Training Interaction (NTI) Framework

MacLellan, Harpstead & Koedinger — AAAI 2018 Spring Symposium

This is MacLellan's most directly applicable contribution to NumPath's teacher dashboard. The NTI Framework treats teachers as first-class participants in the adaptive loop, not passive viewers of a report. Concretely, it means:

Every AI-generated insight must cite specific evidence — a KC code, a p_mastery value, a mistake count. No generic "this student needs more practice."
Teachers must be able to confirm or override AI judgments (their feedback improves the system over time).
The system must support natural conversation between AI output and teacher judgment.

In NumPath this shows up in the insight response shape:

{
  "text": "Aiden skips borrowing in 9 of 11 recent subtraction attempts.",
  "type": "warn",
  "evidence": {
    "kc": "SUB_BORROW",
    "p_mastery": 0.18,
    "mistake_type": "BORROW_SKIP",
    "mistake_count": 9,
    "window": "last 11 attempts"
  }
}

The evidence block isn't generated by the LLM — it's assembled from DB reads before the prompt is sent. Explainability is structural, not hoped for.
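Here is a rough sketch of assembling that evidence block from attempt rows before any prompt is built. The `build_evidence` function and the row shape (a dict with a `mistake_type` key, `None` for a correct answer) are assumptions; a plain list stands in for the database read:

```python
from collections import Counter


def build_evidence(kc, p_mastery, recent_attempts):
    """Assemble the evidence block from stored attempt rows (no LLM involved).

    `recent_attempts` is a list of dicts with a `mistake_type` key
    (None for a correct answer); this row shape is an assumption.
    """
    counts = Counter(a["mistake_type"] for a in recent_attempts if a["mistake_type"])
    if not counts:
        return None  # no mistakes in the window, so there is nothing to cite
    mistake_type, mistake_count = counts.most_common(1)[0]
    return {
        "kc": kc,
        "p_mastery": p_mastery,
        "mistake_type": mistake_type,
        "mistake_count": mistake_count,
        "window": f"last {len(recent_attempts)} attempts",
    }


# Stand-in for rows read from the database: 9 borrow skips in 11 attempts.
attempts = [{"mistake_type": "BORROW_SKIP"}] * 9 + [{"mistake_type": None}] * 2
evidence = build_evidence("SUB_BORROW", 0.18, attempts)
print(evidence)
```

Because the dictionary is computed from stored rows, the numbers a teacher sees can be checked against the attempt history regardless of what the generated text says.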

What NumPath actually is

A FastAPI + Vue 3 web application, deployed with Docker Compose. Students practice math problems; the BKT model updates on every attempt; the adaptive engine selects the next problem based on KC mastery states and recent mistake patterns. Teachers see a dashboard showing per-student KC progress, attempt history, and LLM-generated insights.
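To make "selects the next problem based on KC mastery states and recent mistake patterns" concrete, here is one possible selection rule. It is a guess at the shape, not NumPath's engine: the 0.95 threshold, the `pick_next_kc` name, and the tie-breaking order are all assumptions.

```python
MASTERY_THRESHOLD = 0.95  # assumed cut-off; not stated in the post


def pick_next_kc(mastery, recent_mistakes):
    """Choose the KC to practise next.

    mastery: dict of KC code -> p_mastery for one student.
    recent_mistakes: dict of KC code -> mistake count in a recent window.
    Unmastered KCs with the most recent mistakes come first; ties go to
    the lowest mastery estimate. Returns None when every KC is mastered.
    """
    candidates = [kc for kc, p in mastery.items() if p < MASTERY_THRESHOLD]
    if not candidates:
        return None
    return min(candidates, key=lambda kc: (-recent_mistakes.get(kc, 0), mastery[kc]))


mastery = {"SUB_BORROW": 0.18, "PLACE_VALUE": 0.40, "NUMBER_LINE": 0.97}
recent_mistakes = {"SUB_BORROW": 9, "PLACE_VALUE": 1}
print(pick_next_kc(mastery, recent_mistakes))  # SUB_BORROW
```

Once a KC is chosen, a problem exercising it would be drawn from the problem bank; that step is omitted here.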

The four-phase roadmap:

Phase	Focus	Status
1	MVP: BKT + adaptive engine + teacher dashboard	In progress
2	Mistake classifier v2 + DKT model	Planned
3	RCT instrumentation	Planned
4	Randomised controlled trial	Planned

The RCT will compare learning outcomes for students using NumPath against a control group using standard worksheet practice. The teacher dashboard is an instrument in that experiment, not just a feature.

What this series covers

Four posts are already written, each covering a specific engineering decision:

Closing the feedback loop — how we wired mistake classification into adaptive problem selection
Why teachers need explainable AI — building the KC mastery dashboard (Phase 1)
Attempt history and the scalar subquery pattern — paginated attempt history and a safe LEFT JOIN alternative
Prompt engineering for teacher insights — structured JSON output and graceful fallbacks with Claude

All posts are first-person engineering notes. The code is real, the trade-offs are real, and I'll be honest when something didn't work as expected.

What's next

Phase 1 is nearly feature-complete. The next milestone is running the full test suite on live data with a small pilot group before the RCT design begins.

Key takeaways

Dyscalculia is specific, not general — "more practice" is the wrong intervention; KC-level mastery tracking is the minimum viable response
NumPath rests on four distinct pillars of ITS research, not one: the KC model (1992), BKT (1995), the AL Architecture (2016), and the NTI Framework (2018). Each does a different job, and attribution matters
Explainability must be structural — an insight that cites a specific KC and mistake count is auditable; one that says "this student is struggling" is not
