What We Built
In Phase 1 of the KC dashboard we gave teachers per-skill mastery states for each student — colour-coded bars showing Novice / Developing / Mastered. That answered "where is this student stuck?" Phase 2 answers "what exactly did they do wrong?"
We added a paginated attempt history table below the KC panel. For the selected student, teachers see every attempt in reverse-chronological order: the problem question, the student's answer, whether it was correct, the timestamp, which Knowledge Component was being practised, and — when the classifier fired — the mistake code (e.g. BORROW_SKIP, DIGIT_REVERSAL). The backend is a single endpoint, GET /teacher/students/{id}/attempts, teacher-only, with OFFSET pagination.
The Design Decision
The tricky part was joining MistakeEvent.mistake_code onto Attempt rows. Not every attempt produces a mistake event — the classifier only fires when an incorrect answer matches a known pattern. So the join is optional (LEFT JOIN territory), and the relationship is one-to-one in intent but not enforced that way in the schema.
We considered two approaches:
Option A — LEFT JOIN with GROUP BY:
SELECT a.*, me.mistake_code
FROM attempts a
LEFT JOIN mistake_events me ON me.attempt_id = a.id
ORDER BY a.created_at DESC
LIMIT 10 OFFSET 0
This works now. But if a future bug or edge case produces two MistakeEvent rows for one attempt, every teacher session silently doubles those rows. The bug would be invisible — no error, just wrong data.
Option B — Scalar correlated subquery:
mistake_code_subq = (
select(MistakeEvent.mistake_code)
.where(MistakeEvent.attempt_id == Attempt.id)
.limit(1)
.scalar_subquery()
)
stmt = (
select(
Attempt.id,
Problem.content,
Attempt.answer_given,
Attempt.is_correct,
Attempt.created_at,
Skill.code.label("skill_code"),
mistake_code_subq.label("mistake_code"),
)
.join(Problem, Attempt.problem_id == Problem.id)
.join(Skill, Problem.skill_id == Skill.id)
.where(Attempt.student_id == student_id)
.order_by(Attempt.created_at.desc())
.limit(page_size)
.offset((page - 1) * page_size)
)
We chose Option B. The LIMIT 1 inside the subquery is a hard guarantee: regardless of how many MistakeEvent rows exist for an attempt, the result is always one row per attempt. The database enforces the invariant, not application logic.
The trade-off is a correlated subquery executing once per result row — potentially slower than a join on large datasets. At Phase 1 volumes (< 10k attempts per student), the composite index on (student_id, created_at) keeps p99 well under 150ms. Cursor-based pagination and a flattened join are on the Phase 3 backlog.
Why It Matters for the Research
The MacLellan framework's Teacher-in-the-Loop principle requires teachers to have information they can act on. Phase 1 gave teachers where students are stuck (KC mastery states). Phase 2 gives them evidence — the actual attempt record.
A teacher who sees "Emma: SUB_BORROW at 12% Novice" now has a follow-up panel: "Emma attempted Q.47 (345 − 178), wrote 177, BORROW_SKIP classified." That's a different kind of information — it's not a model output, it's an event record. Teachers can corroborate or question the classifier's judgement. That auditability matters for our RCT design: we're not studying whether the model is right, we're studying whether teachers who can see this data intervene differently.
Phase 3 will layer LLM-generated insights on top of this history. But the raw record had to come first — insights without evidence are just confident assertions.
What We Learned
OFFSET pagination has a well-known flaw: page 2 of an ordered result can drift if new rows are inserted between requests. For attempt history in a classroom tool this is acceptable — a teacher paging through a student's history isn't doing real-time analytics. We documented the cursor-based upgrade path in the architecture notes rather than over-engineering for a problem we don't have yet.
The access control pattern here is simpler than Phase 1's dual-role endpoint. This route is teacher-only, so require_teacher covers it cleanly — no route-level role check needed:
@router.get("/students/{student_id}/attempts", response_model=AttemptsPageResponse)
async def get_student_attempts(
student_id: uuid.UUID,
page: int = Query(default=1, ge=1),
page_size: int = Query(default=10, ge=1, le=100),
db: AsyncSession = Depends(get_db),
_: dict = Depends(require_teacher),
) -> AttemptsPageResponse:
...
No require_authenticated + manual role check — the student version of this endpoint would be a separate route with its own access control. Keeping them apart is cleaner than a single endpoint that branches on role.
What's Next
Phase 3 adds a GenerateInsightUseCase — an LLM call that reads a student's attempt history and KC states, then produces a natural-language summary for the teacher. The attempt history table we built here is the input to that call.
Key Takeaways
-
Scalar correlated subquery > LEFT JOIN for nullable one-to-one —
LIMIT 1inside the subquery is a hard row-count guarantee that survives future schema changes; a LEFT JOIN can silently multiply rows if the cardinality assumption ever breaks - Raw evidence before generated insight — giving teachers the attempt record (what the student actually did) before generating LLM summaries (what the model thinks it means) keeps the teacher as the interpreter, not the model
- OFFSET pagination is fine until it isn't — document the cursor-based upgrade path in architecture notes and move on; don't prematurely optimise for data volumes that don't exist yet
Top comments (0)