Russian has 12 forms per noun. My SRS tracks each one separately

Thibaut Hennau — Fri, 15 May 2026 15:16:38 +0000

I built a spaced repetition system for Russian, and the first thing I had to throw out was the assumption that an SRS item is a word.

For Russian, that doesn't hold.

Take the word for "house": дом (dom). In a sentence, you might see any of these:

дом (dom) - nominative singular
дома (doma) - genitive singular
дому (domu) - dative singular
дом (dom) - accusative singular
домом (domom) - instrumental singular
доме (dome) - prepositional singular
дома (doma) - nominative plural
домов (domov) - genitive plural
домам (domam) - dative plural
дома (doma) - accusative plural
домами (domami) - instrumental plural
домах (domakh) - prepositional plural

12 forms. Some collide (дом appears twice; дома appears three times across different cases). Some are predictable from a paradigm, some aren't because of stress shift or fleeting vowels. And the catch: a learner who can recite the singular table cold might still freeze in the wild when they need the instrumental plural to say "with friends" (с друзьями, s druzyami).

Anki gives you the card "дом → house." You answer correctly. The card schedules forward.

Then a week later you misuse the genitive in a sentence, and your "дом" card is still green because, technically, you remembered the word. The form is what failed. The form is what needs the review.

So when I was building Slova, I changed the unit of memory from "lexeme" to "form."

The data model

Most SRS implementations look like this (simplified):

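from dataclasses import dataclass
from datetime import datetime

@dataclass
class Card:
    front: str            # "дом"
    back: str             # "house"
    ease: float           # 2.5
    interval_days: int    # days until the next review
    due: datetime         # when the next review is scheduled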

Fine for languages where a word is mostly itself. English: "house," "houses," done. Spanish: a few verb conjugations, manageable.

Russian breaks this with combinatorics. Nouns alone have 6 cases × 2 numbers, and the endings in each slot vary with gender and stress-shift pattern. Verbs add aspect (perfective/imperfective pairs), prefixes, motion-verb directionality, and a separate past-tense system that agrees in gender and number. The matrix is large enough that one ease score per lexeme throws away most of the signal you have.

Here's what I ended up with:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Lexeme:
    id: int
    headword: str         # "дом"
    pos: str              # "noun"
    gender: str           # "masc"

@dataclass
class Form:
    id: int
    lexeme_id: int        # points back to Lexeme.id
    surface: str          # "домом"
    case: str             # "instrumental"
    number: str           # "singular"

@dataclass
class Review:
    user_id: int
    form_id: int          # the unit of memory
    ease: float
    interval_days: int    # days until the next review
    due: datetime
    last_outcome: str     # "correct" | "missed" | "wrong-case"

The unit of scheduling is the form, not the lexeme. Each form carries its own ease and interval. When a learner misses the instrumental of дом, that specific row gets pulled back. The nominative they nailed last week stays scheduled where it was.

This was the change that took the most rewriting and gave the biggest payoff.
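A minimal sketch of what per-form state looks like for дом, assuming an in-memory dictionary instead of a database. The names `ReviewState`, `build_states`, and `on_miss`, plus the 0.2 ease penalty and the 1.3 floor, are illustrative assumptions, not Slova's actual code. States are keyed by (case, number) rather than by surface string, because surfaces like дома collide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CASES = ["nominative", "genitive", "dative",
         "accusative", "instrumental", "prepositional"]

# Surface forms of дом, in the same case order as CASES.
DOM_FORMS = {
    "singular": ["дом", "дома", "дому", "дом", "домом", "доме"],
    "plural": ["дома", "домов", "домам", "дома", "домами", "домах"],
}


@dataclass
class ReviewState:
    """Hypothetical per-form scheduling state (one per user and form)."""
    surface: str
    ease: float = 2.5
    interval_days: int = 1
    due: datetime = datetime(2026, 1, 1)


def build_states() -> dict[tuple[str, str], ReviewState]:
    """Create one independent state per (case, number) slot."""
    return {(case, number): ReviewState(surface=surface)
            for number, surfaces in DOM_FORMS.items()
            for case, surface in zip(CASES, surfaces)}


def on_miss(state: ReviewState, now: datetime) -> None:
    """Pull one form back: lower its ease (assumed penalty) and reset its interval."""
    state.ease = max(1.3, state.ease - 0.2)
    state.interval_days = 1
    state.due = now + timedelta(days=1)


states = build_states()
states[("nominative", "singular")].interval_days = 30  # nailed last week
on_miss(states[("instrumental", "singular")], datetime(2026, 5, 15))
```

After the miss, only the instrumental singular state changes; the nominative keeps its 30-day interval.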

Why per-form actually works in practice

Two reasons.

First, the failures cluster by case. A learner who's shaky on instrumental tends to be shaky across many nouns, not just one. Per-form tracking makes that visible. I can show a dashboard that says "your instrumental singular is at 60%" instead of "you missed 12 cards this week."

Second, the wins compound. Once a learner has stabilized the nominative across 200 nouns, those 200 form rows slide into long intervals and stop crowding the daily queue. The queue fills up with the cases they're actively learning. The old "one card per word" model kept pulling the nominative back in any time the genitive failed, which wasted reps.

I didn't expect the second effect. The first one was the design goal. The second showed up in the data after a few weeks.
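The per-case dashboard number can come from a simple aggregate over the review log. This is a sketch under assumptions: the log is a plain list of (case, number, outcome) tuples, and `accuracy_by_slot` is a hypothetical helper name, not something from Slova.

```python
from collections import defaultdict


def accuracy_by_slot(reviews):
    """Return the share of correct answers per (case, number) slot.

    reviews: iterable of (case, number, outcome) tuples, where outcome is
    "correct", "missed", or "wrong-case".
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for case, number, outcome in reviews:
        totals[(case, number)] += 1
        if outcome == "correct":
            correct[(case, number)] += 1
    return {slot: correct[slot] / totals[slot] for slot in totals}


# Made-up sample log: five instrumental reviews across different nouns.
log = [
    ("instrumental", "singular", "correct"),
    ("instrumental", "singular", "correct"),
    ("instrumental", "singular", "correct"),
    ("instrumental", "singular", "missed"),
    ("instrumental", "singular", "wrong-case"),
    ("nominative", "singular", "correct"),
]
report = accuracy_by_slot(log)
```

Because every review row already carries the form's case and number, the report needs no extra bookkeeping beyond grouping.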

Exercise selection: the messy part

Per-form scheduling solves "what's due." It doesn't solve "what exercise."

A form has many ways it can be tested. For домом (domom) you can ask:

1. Translate "with the house" into Russian.
2. Fill the blank: "Он гордится ___." (He is proud of ___.)
3. Pick the right case for the verb's argument from a list.
4. Type the form when shown the headword and the case label.

Each tests something different. Exercise 1 leans on translation and active recall. Exercise 2 puts the form in semantic context. Exercise 3 checks case choice without requiring the learner to produce the ending. Exercise 4 isolates the morphology. They aren't interchangeable.

I pick the exercise type per review based on the form's recent history:

def pick_exercise(form: Form, history: list[Review]) -> ExerciseType:
    streak = current_correct_streak(history)
    last_failure = most_recent_failure(history)

    if streak == 0 or last_failure == "wrong-case":
        return ExerciseType.FORM_FROM_LABEL   # isolate the morphology
    if streak < 3:
        return ExerciseType.FILL_IN_CONTEXT   # rebuild in a sentence
    return ExerciseType.TRANSLATE_PHRASE      # active production

The ramp goes from low cognitive load (produce the form in isolation) to high cognitive load (produce a whole phrase). When something breaks, the next review drops back down. The form has to climb the ladder again. This was probably the second-biggest design change after the per-form unit.
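Here is a self-contained version of the ladder that can be run as-is. It is a sketch with assumptions: history is reduced to a list of outcome strings (oldest first), and a wrong-case miss only forces the drop on the very next review, so a zero streak covers both failure types. The helper below is my own stand-in for `current_correct_streak`.

```python
from enum import Enum


class ExerciseType(Enum):
    FORM_FROM_LABEL = "form_from_label"    # isolate the morphology
    FILL_IN_CONTEXT = "fill_in_context"    # rebuild in a sentence
    TRANSLATE_PHRASE = "translate_phrase"  # active production


def current_correct_streak(outcomes: list[str]) -> int:
    """Count consecutive "correct" outcomes at the end of the history."""
    streak = 0
    for outcome in reversed(outcomes):
        if outcome != "correct":
            break
        streak += 1
    return streak


def pick_exercise(outcomes: list[str]) -> ExerciseType:
    """Choose the next exercise from a form's outcome history (oldest first)."""
    streak = current_correct_streak(outcomes)
    if streak == 0:
        # New form, or the latest answer was "missed" or "wrong-case".
        return ExerciseType.FORM_FROM_LABEL
    if streak < 3:
        return ExerciseType.FILL_IN_CONTEXT
    return ExerciseType.TRANSLATE_PHRASE


after_slip = pick_exercise(["correct", "correct", "correct", "wrong-case"])
```

A form that had reached phrase translation falls back to the label exercise after a single slip, then needs three correct answers in a row to return to the top of the ladder.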

What broke when I switched

A few things I didn't predict.

Queue size exploded. When the unit was a word, a daily review queue maxed out at maybe 30 items. With forms, it could balloon to 200+ for a learner who'd seen 50 nouns. I had to cap and prioritize: nominative first, then accusative (most common in early Russian input), then genitive. The other cases unlock progressively as the lower ones stabilize. Without that gating, the system felt like a punishment machine.
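The cap-and-gate idea can be sketched roughly like this. Only the first three priorities (nominative, accusative, genitive) come from the description above; the order of the remaining cases, the 0.8 stability threshold, the cap of 30, and both function names are assumptions made for illustration.

```python
CASE_PRIORITY = ["nominative", "accusative", "genitive",
                 "dative", "instrumental", "prepositional"]


def unlocked_cases(stability: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Unlock cases in priority order, stopping at the first unstable one.

    stability maps a case name to the learner's accuracy on it (0.0 to 1.0).
    The first unstable case is itself unlocked so it can be practiced.
    """
    unlocked = []
    for case in CASE_PRIORITY:
        unlocked.append(case)
        if stability.get(case, 0.0) < threshold:
            break
    return unlocked


def build_queue(due_forms: list[tuple[str, str]],
                stability: dict[str, float],
                cap: int = 30) -> list[tuple[str, str]]:
    """Keep only due forms in unlocked cases, ordered by priority, up to cap.

    due_forms holds (surface, case) pairs that are due today.
    """
    allowed = set(unlocked_cases(stability))
    rank = {case: i for i, case in enumerate(CASE_PRIORITY)}
    eligible = [form for form in due_forms if form[1] in allowed]
    eligible.sort(key=lambda form: rank[form[1]])
    return eligible[:cap]


due_today = [("домом", "instrumental"), ("дом", "accusative"),
             ("дом", "nominative"), ("дома", "genitive")]
queue = build_queue(due_today, {"nominative": 0.9, "accusative": 0.5})
```

With a stable nominative and a shaky accusative, the genitive and instrumental forms stay out of today's queue even though they are technically due.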

Spaced repetition math needed adjusting. SM-2 (the algorithm Anki's classic scheduler is derived from) schedules each card independently. Forms of the same lexeme aren't independent. If you can produce домом, you can almost always produce доме the next time you see it. I added a small "sibling correctness" boost: when one form of a lexeme is answered correctly, related forms get a tiny ease bump. Not enough to skip review, just enough to stretch their interval. This kept the queue tractable without throwing away review fidelity.
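One way the sibling boost might look, as a sketch. The bump size (0.02), the ease ceiling (3.0), and the function name are assumptions; the post only says the bump is tiny. Since SM-2-style schedulers multiply the interval by the ease factor, a slightly higher ease stretches the sibling's next interval without marking it as reviewed.

```python
SIBLING_EASE_BUMP = 0.02  # assumed value; "tiny" is all the design specifies
MAX_EASE = 3.0            # assumed ceiling so repeated bumps cannot run away


def apply_sibling_boost(eases: dict[int, float],
                        lexeme_of: dict[int, int],
                        answered_form_id: int) -> None:
    """Nudge the ease of every other tracked form of the same lexeme.

    eases maps form_id to the current ease factor for one learner.
    lexeme_of maps form_id to lexeme_id.
    The answered form itself is left to the normal scheduler update.
    """
    lexeme_id = lexeme_of[answered_form_id]
    for form_id, owner in lexeme_of.items():
        if owner == lexeme_id and form_id != answered_form_id and form_id in eases:
            eases[form_id] = min(MAX_EASE, eases[form_id] + SIBLING_EASE_BUMP)


# Forms 1 and 2 belong to lexeme 10; form 3 belongs to a different lexeme.
eases = {1: 2.5, 2: 2.5, 3: 2.5}
lexeme_of = {1: 10, 2: 10, 3: 11}
apply_sibling_boost(eases, lexeme_of, answered_form_id=1)
```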

Wrong-case errors needed their own bucket. A learner who types дома when the prompt wanted домов isn't blanking on the word. They're picking the wrong case. Treating that the same as "didn't know" was too punishing and gave noisy data. So last_outcome has three states, not two, and wrong-case errors route the form back into the morphology-isolating exercise type next time around. Different failure mode, different remediation.
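Telling the three outcomes apart can be as simple as checking the typed answer against the other forms of the same lexeme. This is a sketch: `classify_answer` is a hypothetical name, and the normalization (trimming and lowercasing) is an assumption. The expected form is checked first so that colliding surfaces such as дома still count as correct when they are the right answer.

```python
def classify_answer(typed: str, expected: str, sibling_surfaces: set[str]) -> str:
    """Return "correct", "wrong-case", or "missed" for a typed answer.

    sibling_surfaces holds every surface form of the same lexeme.
    """
    answer = typed.strip().lower()
    if answer == expected.lower():
        return "correct"
    if answer in {surface.lower() for surface in sibling_surfaces}:
        # A real form of the right word, just not the one the prompt wanted.
        return "wrong-case"
    return "missed"


DOM_SURFACES = {"дом", "дома", "дому", "домом", "доме",
                "домов", "домам", "домами", "домах"}

outcome = classify_answer("дома", "домов", DOM_SURFACES)
```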

What this costs

The honest answer: more data, more queries, more product surface.

A noun has 12 form rows. A verb has dozens (aspect pairs, tense, person, gender for past, imperatives, gerunds). For a 1,000-word vocab base you're looking at 15,000+ form rows before any reviews exist. Indexes on (user_id, due) and (user_id, form_id) matter. Postgres handles it fine at the scale I'm running; I checked the query plans.
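To make the indexing concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for Postgres. The table layout, index names, and query are assumptions based on the Review fields above; the real schema and Postgres plans will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (
        user_id       INTEGER NOT NULL,
        form_id       INTEGER NOT NULL,
        ease          REAL    NOT NULL DEFAULT 2.5,
        interval_days INTEGER NOT NULL DEFAULT 1,
        due           TEXT    NOT NULL,  -- ISO 8601 timestamp
        last_outcome  TEXT
    );
    -- Serves the daily queue: "what is due for this user?"
    CREATE INDEX idx_reviews_user_due ON reviews (user_id, due);
    -- Serves the lookup when recording an answer for one form.
    CREATE INDEX idx_reviews_user_form ON reviews (user_id, form_id);
""")

DUE_QUERY = ("SELECT form_id FROM reviews "
             "WHERE user_id = ? AND due <= ? ORDER BY due")


def query_plan(sql: str, params: tuple) -> str:
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


plan = query_plan(DUE_QUERY, (1, "2026-05-15T00:00:00"))
```

Printing `plan` is the SQLite counterpart of checking a Postgres plan with EXPLAIN: it shows whether the due-date query is answered from the (user_id, due) index rather than by scanning the table.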

The UI also has to expose this. A learner needs to understand why they're seeing домом again when they knew дом. The form view shows the case label, the singular/plural, and the recent history. Without that, the system feels arbitrary, and arbitrary kills retention faster than difficulty does.

When this is overkill

Don't build per-form SRS for English. Don't build it for languages where the morphology is small enough to fold into one card. Spanish verbs are right on the edge; I'd probably do per-tense, not per-form, if I were building for Spanish.

For Russian (and Polish, Czech, Finnish, any heavily inflected language) the per-form model pays for itself within the first few weeks. The signal you get about which cases are stuck is the actual product. Without it you're guessing about a learner whose curve looks fine on the surface but has a giant blind spot underneath.

Try it

If you've tried to learn Russian and bounced off Anki because the cards felt wrong, the live version of this is at slova.be. It's an A1→B1 trainer built around case-aware drilling and verb aspect pairing.

If you're building an SRS for another inflected language and want to compare scheduler shapes, my email's on the site.

DEV Community: Thibaut Hennau