Ruhid Ibadli

Posted on Jun 13

When did I last actually do this? Modeling skill decay as data

#career #productivity #learning #python

I kept a 140-day streak on a language-learning app. At the end of it, I could not hold a basic conversation.

The streak was not lying about effort. I really did open the app 140 days in a row. It was lying about the thing I actually cared about: whether the skill was getting better, holding steady, or quietly rotting. The number went up. The skill did not.

That gap — between engagement and retention — is what I want to talk about. And I want to be fair to streaks, because the lazy version of this argument is a strawman.

Streaks are good at exactly one thing

Streaks, XP, badges, and levels are not stupid. They are well-engineered solutions to a real problem: getting you to come back. Habit formation is hard, and a visible chain you don't want to break is a genuinely effective nudge. For the first few weeks of a new habit, gamification can be the scaffolding that gets you over the hump.

So let me be clear about what I'm not claiming. I'm not claiming streaks are useless or that everyone who likes them is being fooled. They solve the consistency problem, and consistency is real.

The problem is what happens after you've shown up. The instrument that got you in the door starts optimizing for the wrong thing.

What the streak is actually measuring

A streak measures one binary per day: did you do the minimum action? That's it. It is a proxy, and like all proxies, it drifts from the real target the moment you start optimizing for the proxy itself.

Once a streak has value to you, the rational move is to protect it as cheaply as possible. One flashcard at 11:58pm. The easiest lesson. The five-minute "review" that re-touches things you already know. The streak stays green. And none of it answers the one question that matters: am I actually getting better, or just keeping a record of attendance?

Worse, the green number feels like progress, so it suppresses the signal that should be alarming you. A 140-day streak hides skill decay because the dashboard is bright green — loud and reassuring in exactly the situation where you most need it to be quiet and honest.

This isn't unique to language apps. We have decent instruments for almost everything else in our working lives. Test coverage tells you how exposed your code is. Your IDE underlines the bug. git blame tells you who to ask. CI tells you the moment something breaks. For the state of our code, we have gauges everywhere.

For the state of our skills — the actual asset we rent out for a salary — we have nothing. No needle dropping into the red. You discover the decay the day you open a repo in a language you "used to know," and the feedback arrives months late, disguised as a personal shortcoming.

A calmer instrument: a mirror

So here's the alternative I keep coming back to. Instead of an instrument that nags and rewards and manufactures dopamine, what if you had one that just told you the truth and then shut up?

Not "great job, 7-day streak!" Not a red exclamation badge. Just: this skill was last practiced 19 days ago, its freshness is 68%, and you've read four articles about it without writing a line of code. No judgment. No confetti.

A streak counter punishes you for a sick day and bribes you with a number that has nothing to do with whether you can still write the code. I didn't want a slot machine. I wanted a bathroom scale — a flat, honest reading you can choose to act on or ignore. A mirror, not a coach.

To build a mirror like that, you need to model the thing streaks refuse to model: decay.

The forgetting curve is not a metaphor

In the 1880s, Hermann Ebbinghaus spent years memorizing nonsense syllables and measuring how fast he lost them. The result was the forgetting curve: retention drops sharply soon after learning, then levels off. You don't forget linearly. You lose a lot fast, and the remainder fades slowly.

The shape is roughly exponential — and that detail is the whole reason the model below works. Ebbinghaus also found the antidote, and it isn't "try harder." Two effects do the heavy lifting: the spacing effect (reviews at intervals beat cramming) and retrieval practice (recalling something from memory strengthens it far more than re-reading it). Recognition is cheap. Reconstruction is what sticks.

Modeling skill decay as data

Most tracking tools log one undifferentiated event: "studied today." That's the streak data model, and it's too poor to ever show decay. To do better, you need two ideas.

1. Input is not output

Split every event into one of two kinds:

Learning events (INPUT): reading, watching a video, a course, skimming docs, following a tutorial. Consuming.
Practice events (OUTPUT): doing an exercise, building a project, shipping work, teaching someone, writing about it. Producing.

This distinction is the whole game. I can spend six hours watching React talks and end the week less able to build a React app than someone who spent one hour building one, because I confused familiarity with ability. Any model that scores those two activities equally will lie to you the same way a streak does.

class EventType(str, Enum):
    LEARNING = "learning"   # input:  read, watch, course, article, docs
    PRACTICE = "practice"   # output: exercise, project, work, teach, build

Two columns, not one. Collapse them now and you can never reconstruct the distinction later — and the distinction is the point.

2. Anchor decay to your last practice, not your last touch

A skill fades when you stop using it, not when you stop reading about it. So the clock that matters is days since your last practice event:

def days_since_last_practice(events, now):
    practices = [e for e in events if e.type == EventType.PRACTICE]
    if not practices:
        return (now - skill_started_at(events)).days  # never practiced → decaying from day one
    last = max(p.occurred_at for p in practices)
    return (now - last).days

Note the edge case in the comment. A skill you learned but never practiced doesn't sit at 100 — it starts decaying from the day you began tracking it. The tutorial you watched in January and never applied is not a skill you have. It's a skill you met once.

3. Decay exponentially, then let learning slow the bleed

Following Ebbinghaus, decay is exponential — multiply remaining freshness by a constant just under 1 each idle day. Then add a small, capped boost for recent learning: enough to mean "I refreshed the theory," never enough to fake mastery you haven't earned.

def freshness(days_since_practice, learning_events_30d, decay_rate=0.98):
    # 1. Start from full mastery.
    score = 100.0

    # 2. Exponential decay anchored to the LAST PRACTICE.
    #    ~2%/day by default; tune decay_rate per skill.
    score *= decay_rate ** days_since_practice

    # 3. Small, capped learning boost. Reading slows the bleed;
    #    it does not restore what doing would. +2% each, max +15%.
    boost = min(learning_events_30d * 2, 15)
    score += boost

    # 4. Keep it honest: clamp to 0–100.
    return max(0.0, min(100.0, score))

At ~2% loss per idle day, the behavior feels right:

1 week unused → ~87%. A little dusty.
1 month → ~55%. You'd need to warm up.
3 months → ~16%. Functionally, you'd be relearning.

A few things worth noticing in that formula:

It can go down even when you're "active." If all your recent events are learning events, days_since_practice keeps climbing and the score keeps falling, capped boost notwithstanding. A streak would be green; this would be dropping. That divergence is the entire point.
The decay rate is per-skill. Your spoken French rots faster than your touch-typing. A muscle-memory skill might use 0.995; something fragile and conceptual might use 0.95. One global constant would be its own little lie.
The boost is capped on purpose. Without it, you could grind fifty articles in a weekend and "max out" a skill you've never once applied — the model would call you an expert at something you can't do.
The clamp isn't ceremony. The boost can push a freshly-practiced skill over 100, and a long-idle one needs a floor. Exponential decay also means it never quite hits zero — which is honest. You never fully forget how to ride the bike; you just get slow and wobbly.

Three truths the model can now surface

Once your data separates input from output and decays from last practice, a few honest signals fall out almost for free:

Learning decay — skills degrade without reinforcement. The freshness number is this.
Practice scarcity — learned-but-never-practiced skills, anything with no practice in 21+ days, or a "theory-heavy" flag when your learning:practice ratio climbs past 5:1.
Input/output imbalance — a simple balance = practice_count / learning_count. Above 1.0 means you produce more than you consume, which is where retention actually lives. Most of us live well below it, and a streak will never tell us.

One more, almost as a bonus: overlay total hours logged against current freshness. The skills with high hours and low freshness are the ones you earned and let slip — and the most worth reviving, because relearning is faster than learning the first time.

None of this needs AI or a recommendation engine. It's arithmetic over two columns and a date. The hard part was never the math — it was deciding to write the events down and being honest about which kind each one was.

I built the mirror version

I wanted this for myself, so I built it: SkillFade — a small side project whose tagline is "a mirror, not a coach." You log learning and practice events per skill; it computes the freshness score above, flags theory-heavy skills and practice droughts, and overlays hours-invested against freshness so you can see when you've poured time into something that's decaying anyway.

What it deliberately does not have: points, badges, streaks, levels, leaderboards, or AI telling you what to do next. The moment you add a streak, the goal quietly changes from "be honest about my skills" to "don't break the streak." It does not nag — alerts are plain-text email, at most about one a week, and you can turn them off. It's a FastAPI + React/TypeScript stack on Postgres, the core is MIT-licensed and self-hostable, and your data stays yours: no in-app tracking, full JSON export anytime, one-click deletion. Free for a few skills; the rest is optional.

I'm telling you I built it so you can weigh the bias accordingly. The model is maybe forty lines of Python and it stands on its own whether you use my thing or write twenty lines in a notebook. (I, for the record, still haven't fixed my bash skills. But at least the dashboard is honest about it.)

The honest takeaway

Your streak is a measure of attendance. That's worth something — showing up is the prerequisite for everything else. But don't confuse it with skill, and don't let a wall of green numbers reassure you out of noticing that something you used to be good at is slipping.

The reframe that actually changed how I work was small but total: stop measuring what you've learned, start measuring what you've produced. You close the tab and go build the small ugly thing instead. The screen is blank for a minute — and the blank screen is where retention is actually made.

Build it yourself or use SkillFade. Either way, the useful question was never "how many days in a row?" It was "when did I last actually do this, and how much of it is still in my hands?"

That question doesn't fit on a badge. That's exactly why it's worth tracking.

DEV Community