DEV Community

Cover image for Attempt history in the teacher dashboard — the scalar subquery pattern
Oscar Rieken
Oscar Rieken

Posted on

Attempt history in the teacher dashboard — the scalar subquery pattern

What We Built

In Phase 1 of the KC dashboard we gave teachers per-skill mastery states for each student — colour-coded bars showing Novice / Developing / Mastered. That answered "where is this student stuck?" Phase 2 answers "what exactly did they do wrong?"

We added a paginated attempt history table below the KC panel. For the selected student, teachers see every attempt in reverse-chronological order: the problem question, the student's answer, whether it was correct, the timestamp, which Knowledge Component was being practised, and — when the classifier fired — the mistake code (e.g. BORROW_SKIP, DIGIT_REVERSAL). The backend is a single endpoint, GET /teacher/students/{id}/attempts, teacher-only, with OFFSET pagination.

The Design Decision

The tricky part was joining MistakeEvent.mistake_code onto Attempt rows. Not every attempt produces a mistake event — the classifier only fires when an incorrect answer matches a known pattern. So the join is optional (LEFT JOIN territory), and the relationship is one-to-one in intent but not enforced that way in the schema.

We considered two approaches:

Option A — LEFT JOIN with GROUP BY:

SELECT a.*, me.mistake_code
FROM attempts a
LEFT JOIN mistake_events me ON me.attempt_id = a.id
ORDER BY a.created_at DESC
LIMIT 10 OFFSET 0
Enter fullscreen mode Exit fullscreen mode

This works now. But if a future bug or edge case produces two MistakeEvent rows for one attempt, every teacher session silently doubles those rows. The bug would be invisible — no error, just wrong data.

Option B — Scalar correlated subquery:

mistake_code_subq = (
    select(MistakeEvent.mistake_code)
    .where(MistakeEvent.attempt_id == Attempt.id)
    .limit(1)
    .scalar_subquery()
)

stmt = (
    select(
        Attempt.id,
        Problem.content,
        Attempt.answer_given,
        Attempt.is_correct,
        Attempt.created_at,
        Skill.code.label("skill_code"),
        mistake_code_subq.label("mistake_code"),
    )
    .join(Problem, Attempt.problem_id == Problem.id)
    .join(Skill, Problem.skill_id == Skill.id)
    .where(Attempt.student_id == student_id)
    .order_by(Attempt.created_at.desc())
    .limit(page_size)
    .offset((page - 1) * page_size)
)
Enter fullscreen mode Exit fullscreen mode

We chose Option B. The LIMIT 1 inside the subquery is a hard guarantee: regardless of how many MistakeEvent rows exist for an attempt, the result is always one row per attempt. The database enforces the invariant, not application logic.

The trade-off is a correlated subquery executing once per result row — potentially slower than a join on large datasets. At Phase 1 volumes (< 10k attempts per student), the composite index on (student_id, created_at) keeps p99 well under 150ms. Cursor-based pagination and a flattened join are on the Phase 3 backlog.

Why It Matters for the Research

The MacLellan framework's Teacher-in-the-Loop principle requires teachers to have information they can act on. Phase 1 gave teachers where students are stuck (KC mastery states). Phase 2 gives them evidence — the actual attempt record.

A teacher who sees "Emma: SUB_BORROW at 12% Novice" now has a follow-up panel: "Emma attempted Q.47 (345 − 178), wrote 177, BORROW_SKIP classified." That's a different kind of information — it's not a model output, it's an event record. Teachers can corroborate or question the classifier's judgement. That auditability matters for our RCT design: we're not studying whether the model is right, we're studying whether teachers who can see this data intervene differently.

Phase 3 will layer LLM-generated insights on top of this history. But the raw record had to come first — insights without evidence are just confident assertions.

What We Learned

OFFSET pagination has a well-known flaw: page 2 of an ordered result can drift if new rows are inserted between requests. For attempt history in a classroom tool this is acceptable — a teacher paging through a student's history isn't doing real-time analytics. We documented the cursor-based upgrade path in the architecture notes rather than over-engineering for a problem we don't have yet.

The access control pattern here is simpler than Phase 1's dual-role endpoint. This route is teacher-only, so require_teacher covers it cleanly — no route-level role check needed:

@router.get("/students/{student_id}/attempts", response_model=AttemptsPageResponse)
async def get_student_attempts(
    student_id: uuid.UUID,
    page: int = Query(default=1, ge=1),
    page_size: int = Query(default=10, ge=1, le=100),
    db: AsyncSession = Depends(get_db),
    _: dict = Depends(require_teacher),
) -> AttemptsPageResponse:
    ...
Enter fullscreen mode Exit fullscreen mode

No require_authenticated + manual role check — the student version of this endpoint would be a separate route with its own access control. Keeping them apart is cleaner than a single endpoint that branches on role.

What's Next

Phase 3 adds a GenerateInsightUseCase — an LLM call that reads a student's attempt history and KC states, then produces a natural-language summary for the teacher. The attempt history table we built here is the input to that call.

Key Takeaways

  • Scalar correlated subquery > LEFT JOIN for nullable one-to-oneLIMIT 1 inside the subquery is a hard row-count guarantee that survives future schema changes; a LEFT JOIN can silently multiply rows if the cardinality assumption ever breaks
  • Raw evidence before generated insight — giving teachers the attempt record (what the student actually did) before generating LLM summaries (what the model thinks it means) keeps the teacher as the interpreter, not the model
  • OFFSET pagination is fine until it isn't — document the cursor-based upgrade path in architecture notes and move on; don't prematurely optimise for data volumes that don't exist yet

Top comments (0)