Attempt history in the teacher dashboard — the scalar subquery pattern

#numpath #adaptivelearning #python #vue

What We Built

In Phase 1 of the KC dashboard we gave teachers per-skill mastery states for each student — colour-coded bars showing Novice / Developing / Mastered. That answered "where is this student stuck?" Phase 2 answers "what exactly did they do wrong?"

We added a paginated attempt history table below the KC panel. For the selected student, teachers see every attempt in reverse-chronological order: the problem question, the student's answer, whether it was correct, the timestamp, which Knowledge Component was being practised, and — when the classifier fired — the mistake code (e.g. BORROW_SKIP, DIGIT_REVERSAL). The backend is a single endpoint, GET /teacher/students/{id}/attempts, teacher-only, with OFFSET pagination.

The Design Decision

The tricky part was joining MistakeEvent.mistake_code onto Attempt rows. Not every attempt produces a mistake event — the classifier only fires when an incorrect answer matches a known pattern. So the join is optional (LEFT JOIN territory), and the relationship is one-to-one in intent but not enforced that way in the schema.

We considered two approaches:

Option A — LEFT JOIN with GROUP BY:

SELECT a.*, me.mistake_code
FROM attempts a
LEFT JOIN mistake_events me ON me.attempt_id = a.id
ORDER BY a.created_at DESC
LIMIT 10 OFFSET 0

This works now. But if a future bug or edge case produces two MistakeEvent rows for one attempt, every teacher session silently doubles those rows. The bug would be invisible — no error, just wrong data.

Option B — Scalar correlated subquery:

mistake_code_subq = (
    select(MistakeEvent.mistake_code)
    .where(MistakeEvent.attempt_id == Attempt.id)
    .limit(1)
    .scalar_subquery()
)

stmt = (
    select(
        Attempt.id,
        Problem.content,
        Attempt.answer_given,
        Attempt.is_correct,
        Attempt.created_at,
        Skill.code.label("skill_code"),
        mistake_code_subq.label("mistake_code"),
    )
    .join(Problem, Attempt.problem_id == Problem.id)
    .join(Skill, Problem.skill_id == Skill.id)
    .where(Attempt.student_id == student_id)
    .order_by(Attempt.created_at.desc())
    .limit(page_size)
    .offset((page - 1) * page_size)
)

We chose Option B. The LIMIT 1 inside the subquery is a hard guarantee: regardless of how many MistakeEvent rows exist for an attempt, the result is always one row per attempt. The database enforces the invariant, not application logic.

The trade-off is a correlated subquery executing once per result row — potentially slower than a join on large datasets. At Phase 1 volumes (< 10k attempts per student), the composite index on (student_id, created_at) keeps p99 well under 150ms. Cursor-based pagination and a flattened join are on the Phase 3 backlog.

Why It Matters for the Research

The MacLellan framework's Teacher-in-the-Loop principle requires teachers to have information they can act on. Phase 1 gave teachers where students are stuck (KC mastery states). Phase 2 gives them evidence — the actual attempt record.

A teacher who sees "Emma: SUB_BORROW at 12% Novice" now has a follow-up panel: "Emma attempted Q.47 (345 − 178), wrote 177, BORROW_SKIP classified." That's a different kind of information — it's not a model output, it's an event record. Teachers can corroborate or question the classifier's judgement. That auditability matters for our RCT design: we're not studying whether the model is right, we're studying whether teachers who can see this data intervene differently.

Phase 3 will layer LLM-generated insights on top of this history. But the raw record had to come first — insights without evidence are just confident assertions.

What We Learned

OFFSET pagination has a well-known flaw: page 2 of an ordered result can drift if new rows are inserted between requests. For attempt history in a classroom tool this is acceptable — a teacher paging through a student's history isn't doing real-time analytics. We documented the cursor-based upgrade path in the architecture notes rather than over-engineering for a problem we don't have yet.

The access control pattern here is simpler than Phase 1's dual-role endpoint. This route is teacher-only, so require_teacher covers it cleanly — no route-level role check needed:

@router.get("/students/{student_id}/attempts", response_model=AttemptsPageResponse)
async def get_student_attempts(
    student_id: uuid.UUID,
    page: int = Query(default=1, ge=1),
    page_size: int = Query(default=10, ge=1, le=100),
    db: AsyncSession = Depends(get_db),
    _: dict = Depends(require_teacher),
) -> AttemptsPageResponse:
    ...

No require_authenticated + manual role check — the student version of this endpoint would be a separate route with its own access control. Keeping them apart is cleaner than a single endpoint that branches on role.

What's Next

Phase 3 adds a GenerateInsightUseCase — an LLM call that reads a student's attempt history and KC states, then produces a natural-language summary for the teacher. The attempt history table we built here is the input to that call.

Key Takeaways

Scalar correlated subquery > LEFT JOIN for nullable one-to-one — LIMIT 1 inside the subquery is a hard row-count guarantee that survives future schema changes; a LEFT JOIN can silently multiply rows if the cardinality assumption ever breaks
Raw evidence before generated insight — giving teachers the attempt record (what the student actually did) before generating LLM summaries (what the model thinks it means) keeps the teacher as the interpreter, not the model
OFFSET pagination is fine until it isn't — document the cursor-based upgrade path in architecture notes and move on; don't prematurely optimise for data volumes that don't exist yet