DEV Community

Ketis Dev

Building an AI-Powered Driving Theory Exam Platform with the FSRS Algorithm

When we set out to build KETis — an AI-powered platform that helps Lithuanians prepare for their driving theory exam (KET) — we faced a deceptively simple question: how do you teach 2,500+ exam questions across 17 license categories without burning students out?

The answer turned out to be an algorithm called FSRS (Free Spaced Repetition Scheduler). In this post, I'll walk through why we chose it over the classic SM-2 approach, what surprised us during implementation, and what 82% pass rates taught us about adaptive learning at scale.

The Problem with Traditional Study Apps

Most exam prep platforms — and almost every driving school app in Lithuania — use one of two approaches:

  1. Static question pools. Random questions, no memory of what the user knows.
  2. Linear progress. Question 1 → Question 2 → ... → Question 2,500. Brutal.

Both ignore a basic truth from cognitive science: the optimal moment to review something is right before you forget it. Review too early and you waste time. Review too late and you're effectively re-learning from scratch.

This is the core insight behind spaced repetition.

SM-2 vs FSRS: Why We Switched

The most famous spaced repetition algorithm is SM-2, introduced by SuperMemo in the late 1980s and later popularized by Anki. It works like this:

```python
def sm2(quality: int, repetitions: int, interval: int, ef: float):
    """One SM-2 step. quality is the 0-5 answer grade, ef the ease factor."""
    if quality >= 3:  # successful recall
        if repetitions == 0:
            interval = 1
        elif repetitions == 1:
            interval = 6
        else:
            interval = round(interval * ef)
        repetitions += 1
    else:  # failed recall: progress resets
        repetitions = 0
        interval = 1
    # Ease factor update, clamped to SM-2's floor of 1.3
    ef = max(1.3, ef + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)))
    return repetitions, interval, ef
```

It's elegant. It works. But it has problems:

  • Hard-coded constants were tuned by hand decades ago.
  • No personalization. It treats every user the same.
  • Binary failure mode. One wrong answer resets your progress.

FSRS, developed by Jarrett Ye starting in 2022, is fundamentally different. It models memory as a three-component system:

  • Difficulty (D) — how hard a card is for this user.
  • Stability (S) — how long the memory will last before recall probability drops.
  • Retrievability (R) — the current probability the user can recall the item.

The algorithm uses ~17 parameters that are fitted per-user from review history using gradient descent. The output is a recall probability curve:

```
R(t) = exp(ln(0.9) * t / S)
```

Where t is days since last review and S is the current stability. We schedule the next review when R drops to a target retention (we use 0.9).
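Solving R(t) = target for t gives the next review date in closed form. A minimal sketch of that scheduling step (function names are ours, not from any FSRS library):

```python
import math

def retrievability(t: float, stability: float) -> float:
    """Recall probability after t days, per R(t) = exp(ln(0.9) * t / S)."""
    return math.exp(math.log(0.9) * t / stability)

def next_interval(stability: float, target: float = 0.9) -> float:
    """Days until R decays to the target retention.
    Solving exp(ln(0.9) * t / S) = target for t gives
    t = S * ln(target) / ln(0.9)."""
    return stability * math.log(target) / math.log(0.9)
```

A nice property falls out of the definition: at a 0.9 target, the next interval is exactly S, which is why stability is often read directly as "days until the next review."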

What This Looks Like in Production

On KETis, every time a student answers a question, we update three things:

  1. The card's stability S based on whether they answered correctly.
  2. The card's difficulty D (smoothed exponential moving average).
  3. The user-specific parameter set, retrained nightly on accumulated review data.
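To make the update loop concrete, here is a deliberately simplified sketch. The real FSRS stability formulas depend on all of D, S, R and the fitted parameters; the constants and growth rule below are illustrative placeholders, not KETis's production values:

```python
import math
from dataclasses import dataclass

@dataclass
class CardState:
    difficulty: float   # D, e.g. on a 1-10 scale
    stability: float    # S, in days
    last_review: float  # timestamp, in days

def on_answer(state: CardState, correct: bool, now: float) -> None:
    """Update one card after one answer (simplified sketch)."""
    elapsed = now - state.last_review
    r = math.exp(math.log(0.9) * elapsed / state.stability)
    if correct:
        # Successful recall grows stability; harder cards grow more slowly.
        state.stability *= 1 + 0.5 * (1 - r) * (11 - state.difficulty) / 10
    else:
        # A lapse shrinks stability rather than resetting it to zero --
        # the key difference from SM-2's binary failure mode.
        state.stability *= 0.3
    # Smooth difficulty toward an observed target (exponential moving average).
    target_d = 4.0 if correct else 8.0
    state.difficulty += 0.2 * (target_d - state.difficulty)
    state.last_review = now
```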

The result is a queue that adapts to the individual. A student who breezes through right-of-way questions but struggles with priority signs will see priority signs more often — but not so often that they burn out.

The Surprising Implementation Lessons

1. Cold start is real

FSRS needs ~50 reviews per user before its parameter estimates stabilize. For a brand-new user, you're essentially running on default parameters. We solved this by seeding new accounts with a category-level parameter set derived from the population, then gradually weighting toward the user's own history.
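The blend itself can be as simple as a review-count-weighted average. A sketch of the idea (the ramp constant and names are illustrative, not our exact production weighting):

```python
def blended_params(user_params: list, population_params: list,
                   n_reviews: int, ramp: int = 50) -> list:
    """Weight toward the user's own fitted parameters as evidence accumulates.
    At n_reviews == ramp (~50, where estimates start to stabilize),
    the blend is half user, half population."""
    w = n_reviews / (n_reviews + ramp)
    return [w * u + (1 - w) * p
            for u, p in zip(user_params, population_params)]
```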

2. "Learning" ≠ "Reviewing"

FSRS is a review scheduler. It assumes the user has already encountered the material. For brand-new content, you need a separate "learning" phase with shorter intervals (we use 1m, 10m, 1d). Mixing the two states in one queue was our first big bug.
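The learning phase is a small state machine over fixed steps; only a card that clears the last step graduates into the FSRS review queue. A sketch using the steps above (the graduation handling is illustrative):

```python
# Learning steps in days: 1 minute, 10 minutes, 1 day
LEARNING_STEPS = [1 / 1440, 10 / 1440, 1.0]

def next_learning_interval(step: int, correct: bool):
    """Advance through fixed learning steps. Returns (next_step, interval),
    or (None, None) once the card graduates to the FSRS queue."""
    if not correct:
        return 0, LEARNING_STEPS[0]      # restart the learning ladder
    if step + 1 < len(LEARNING_STEPS):
        return step + 1, LEARNING_STEPS[step + 1]
    return None, None                    # graduated: hand off to FSRS
```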

3. Database design matters more than the algorithm

We store every review event, not just the latest state. Storage is cheap; the ability to retrain parameters when the algorithm improves is priceless. We learned this the hard way after FSRS-4 → FSRS-5 came out and we had to re-derive parameters for every user.
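An append-only event log is all this takes. A minimal sketch of the shape of such a table (column names are illustrative, not our actual schema), using SQLite for brevity:

```python
import sqlite3

# Every answer is an immutable event; current card state is derivable,
# and parameters can be re-fitted from scratch when the algorithm changes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE review_events (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL,
        question_id  INTEGER NOT NULL,
        answered_at  TEXT    NOT NULL,  -- ISO timestamp
        correct      INTEGER NOT NULL,  -- 0/1
        elapsed_days REAL    NOT NULL   -- time since this card's previous review
    )
""")
conn.execute(
    "INSERT INTO review_events "
    "(user_id, question_id, answered_at, correct, elapsed_days) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, 42, "2024-01-01T10:00:00", 1, 3.5),
)
```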

4. UX trumps optimality

The "theoretically optimal" review schedule sometimes wants users to wait 47 days before seeing a card again. Users hate this — it feels like the app forgot about them. We cap intervals at 14 days for the first 3 months of use, then gradually let the algorithm breathe.
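The cap is a one-liner on top of whatever the scheduler proposes; the easing schedule after the first 3 months below is an illustrative guess, not our exact curve:

```python
def capped_interval(optimal_days: float, account_age_days: float) -> float:
    """Clamp the scheduler's proposed interval to 14 days for a user's
    first 90 days, then relax the cap linearly over the next 90."""
    if account_age_days < 90:
        return min(optimal_days, 14.0)
    cap = 14.0 + (account_age_days - 90) * (optimal_days - 14.0) / 90
    return min(optimal_days, max(14.0, cap))
```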

Results

After 18 months in production:

  • 82% first-attempt pass rate on the official KET exam (national average is around 60%).
  • Median study time to exam-ready: 11 days.
  • 2,500+ questions across 17 license categories, all scheduled per-user.

We're not claiming FSRS is magic. The platform also does adaptive question generation, mistake-pattern analysis, and weak-topic surfacing. But FSRS is the backbone — the thing that keeps the queue from becoming overwhelming or boring.

Should You Use FSRS?

If you're building any kind of repetitive learning product — language apps, exam prep, professional certification, medical board prep — yes. The open-source implementations are mature (there's a Python library, a Rust port, and a JavaScript version). The main cost is engineering discipline around event logging and parameter retraining.

If you're building a one-off course or content where users encounter material once and move on, stick with traditional sequencing.


If you've shipped spaced repetition in production, I'd love to hear what tripped you up. Drop a comment below — especially if you found a better way to handle the cold-start problem.
