Joshua Hall

Posted on Jun 10 • Originally published at yaplabs.com

CFS: Scoring Features Before You Argue About Them

#ux #uxdesign #productivity #product

Should we build two-factor authentication? Users have asked for it. That isn't a yes. Should we add the export-to-Excel feature the enterprise account keeps requesting, or the second product line a competitor just shipped? Every team faces a steady stream of these, and most answer them in a meeting where whoever holds the strongest opinion wins. Six months later the roadmap has a dozen features nobody uses, and a dozen more that got turned down for reasons no one can reconstruct.

The deeper problem isn't the decision. It's that the decision evaporates. It becomes an action item, a five-minute hallway conversation, a Slack thread, and it is almost never written down with the reasoning attached. So in a year, someone asks the same question, proposes the same feature you already killed, and nobody can say why it died. Maybe the answer should change now; maybe it shouldn't. Without a record you can't tell, and you can't improve the way you decide, because the inputs are gone. Organizational memory turns out to be whatever a few people can hold in their heads, which is less than you'd hope and lasts nowhere near as long as you need. And that's before half the team turns over.

The fix isn't more meetings. It's a small piece of structure that gives the conversation an anchor: three axes, a one-to-five rating on each, and a score you write down with the decision. It's called CFS — Commonality, Frequency, Severity. It won't make the call for you. It makes the call comparable and durable.

The Three Axes

Commonality asks what share of your users would touch this feature at all. Are 80% of users going to export to Excel, or 20%? Is it one important enterprise client and nobody else? A one is a single niche audience. A five is universal: the kind of thing nearly everyone in the product needs. Fives are rare, and the rarity is the point. If everything scores a five, the axis has stopped telling you anything.

Frequency asks how often the people who do use it come back. Among that slice of users (whether it's 20% of the base or 95%) is this a couple-times-a-year thing or a twenty-times-a-day thing? Note that this is conditional on commonality: a feature can matter to only a sliver of users and still be something that sliver lives in daily.

Severity asks how much the absence hurts the people who'd benefit, and whether there's an easy way around it. The workaround question is doing most of the work here. Can they copy-paste manually? Lean on a shortcut key, or an OS capability common enough to assume everyone has it, like printing? That's low severity. At the other end: without this, the product is worthless to the audience that needs it. Not being able to export your books to Excel out of an accounting tool can be that. Not being able to print from a photo app can be that. Most things land somewhere between the convenience and the dealbreaker.

The three are scored independently. A feature can be common but rarely used, used constantly by a tiny group, or painful to lack but only in one corner of the product. Keeping them separate is what lets the score carry information.

Multiply, Don't Add

Here's the part that matters and is easy to get wrong: the axes multiply. Commonality times Frequency times Severity, so a one-to-five scale tops out at 125, not 15.

Multiplication is deliberate, because each tick should land with more weight than the last — a jump from severity three to four isn't one unit more pain, it's a different category of pain, and the math should say so. It also means a single one anywhere drags the whole thing down, which is usually correct: a feature almost nobody can use, no matter how often or how critically, probably isn't where your next two weeks should go.
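To make the arithmetic concrete, here is a minimal sketch of the multiplication next to the additive version it replaces. The function names `cfs_score` and `additive_score` are mine, chosen for illustration, and the range check is an assumption about how strictly you'd want to enforce the one-to-five scale.

```python
def cfs_score(commonality: int, frequency: int, severity: int) -> int:
    """Multiply the three 1-5 ratings into one CFS score (1 to 125)."""
    for name, value in (
        ("commonality", commonality),
        ("frequency", frequency),
        ("severity", severity),
    ):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return commonality * frequency * severity


def additive_score(commonality: int, frequency: int, severity: int) -> int:
    """The tempting alternative, included only for comparison."""
    return commonality + frequency + severity


# Top of the scale: 125 when multiplied, 15 when added.
print(cfs_score(5, 5, 5), additive_score(5, 5, 5))  # 125 15

# A single one caps the product at 25, while the sum barely moves.
print(cfs_score(1, 5, 5), additive_score(1, 5, 5))  # 25 11

# Moving severity from three to four adds commonality * frequency points
# (9 here), where the additive version would add exactly 1.
print(cfs_score(3, 3, 3), cfs_score(3, 3, 4))  # 27 36
```

The last pair is the useful one to stare at: a one-step change on any axis is scaled by the other two ratings, so the same bump counts for more on a feature that is already common and frequent.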

A shortcut key shows why the axes have to stay separate. I remap keys to split and merge browser windows constantly (high frequency, for me). But my web habits are geeky and unrepresentative; commonality is low. High frequency, low commonality, and the product is fine without it: the workaround is a button two pixels away. Multiply it out and the score stays honest about that.

Calibrate the top of the scale hard. A five means six-sigma certainty: everyone uses it, or the people who use it do so dozens of times a day, or the product is simply broken without it. I rarely reach for fours and almost never fives. Most real scoring lives in the one-to-three band with the occasional four, and that's the framework working, not failing.
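It helps to see what that calibration does to the numbers you'll actually write down. This is a small sketch that enumerates every score reachable when each axis is limited to a given ceiling; `reachable_scores` is a hypothetical helper, not part of any library.

```python
from itertools import product


def reachable_scores(max_rating: int) -> list[int]:
    """Distinct CFS products when every axis is rated 1..max_rating."""
    ratings = range(1, max_rating + 1)
    return sorted({c * f * s for c, f, s in product(ratings, repeat=3)})


# The everyday band: all three axes rated one to three.
print(reachable_scores(3))  # [1, 2, 3, 4, 6, 8, 9, 12, 18, 27]

# The full scale still tops out at 125.
print(max(reachable_scores(5)))  # 125
```

With ratings held to the one-to-three band there are only ten possible scores and the ceiling is 27, so most of what lands on the board will be single digits or low double digits. A score in the high teens is already near the top of the everyday range.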

What the Scores Look Like in Practice

Take private accounts on a consumer social network. Authenticated, individual accounts are how the product works at all (high commonality, high severity, you don't have a social network without them). But that one decision fans out into a cluster of features whose scores diverge sharply. Password reset, lost-password flow, passkeys, emailed one-time codes, OTP: all roads to "I can get into my account," each scoring differently. And frequency is genuinely contextual. If I let a mobile session refresh silently for months, signing back in is rare even though the account itself is universal and critical. High commonality, high severity, low frequency, and the math holds all three at once.

Now account merging. You join a network with your Gmail, forget, and join again with Yahoo. Two accounts, two addresses. There are workflows to reconcile them, but it's an uncommon situation, an infrequent one, and it takes a savvy user to even notice. The system usually can't detect it without a reliable shared anchor like a verified phone number. And the workaround is brutal but real: just delete one account. Low commonality, low frequency, low-to-moderate severity; multiply it out and you get a small number, which is the right answer. A lot of products correctly never build this.

Printing sits in the messy middle, which is exactly why it's useful. Plenty of apps don't need it. But plenty of users still print, or print-to-PDF because a manager wants it emailed. Middling commonality, low-to-middling frequency, low-to-middling severity depending on the domain: a clean printable view is a respectable, unglamorous, middle-of-the-table feature. Not every decision is a clear build or a clear cut, and the score is honest about the ones that aren't.
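Here is roughly how those three examples might look once numbers are attached. The specific ratings below are my own assumptions, picked to match the descriptions above (for instance, "middling" as a three); they are illustrative, not measured, and your team would argue its way to its own. The `Rating` type is likewise hypothetical.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    feature: str
    commonality: int
    frequency: int
    severity: int

    @property
    def score(self) -> int:
        return self.commonality * self.frequency * self.severity


# Hypothetical ratings chosen to fit the prose descriptions.
ratings = [
    Rating("Sign back in after a long session", 5, 1, 5),  # universal, rare, critical
    Rating("Account merging", 1, 1, 2),                    # uncommon, infrequent, workaround exists
    Rating("Clean printable view", 3, 2, 2),               # the messy middle
]

for r in sorted(ratings, key=lambda r: r.score, reverse=True):
    print(f"{r.score:>3}  C{r.commonality} F{r.frequency} S{r.severity}  {r.feature}")
```

Under these assumed ratings the sign-in flow comes out at 25, the printable view at 12, and account merging at 2. Keeping the three ratings next to the product matters: a 25 built from two fives and a one reads very differently from a 27 built from three threes.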

One Input Among Many

CFS is not a prioritization engine. It's the benefit half of a cost-benefit, and it deliberately leaves cost out.

That's the line between CFS and something like RICE, the Intercom model that folds effort in as a divisor: Reach times Impact times Confidence, over Effort. RICE bakes the cost into the number. CFS doesn't, on purpose, because cost belongs in a separate, cleaner conversation. Effort is the easy thing to compare: put it in person-weeks and you're done. I've prioritized half a dozen ones and twos ahead of an eighteen plenty of times, simply because the small ones shipped in days while the big one needed two more weeks of design before engineering could even start.
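The structural difference is easy to see in a sketch. The backlog items, their ratings, and the RICE inputs below are all made up for illustration; the only things taken from the text are the two formulas and the idea of measuring effort in person-weeks.

```python
def cfs(commonality: int, frequency: int, severity: int) -> int:
    """Benefit only: effort is deliberately not an argument."""
    return commonality * frequency * severity


def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    """Reach x Impact x Confidence, divided by Effort: cost lives inside the number."""
    return reach * impact * confidence / effort


# Hypothetical backlog: (name, C, F, S, effort in person-weeks).
backlog = [
    ("Big workflow change", 3, 3, 2, 6.0),
    ("Small fix A", 1, 2, 1, 0.4),
    ("Small fix B", 1, 1, 1, 0.2),
]

# CFS keeps benefit and cost in separate columns, so scheduling a cheap
# two ahead of an expensive eighteen is an explicit, visible choice.
by_effort = sorted(backlog, key=lambda item: item[4])
for name, c, f, s, weeks in by_effort:
    print(f"{name:<20} score={cfs(c, f, s):>2}  effort={weeks} person-weeks")

# RICE, with made-up inputs, returns one blended figure instead.
print(rice(reach=500, impact=2, confidence=0.8, effort=4))
```

Sorting by effort here puts both small fixes ahead of the eighteen without touching anyone's score, which is the point: the benefit number stays stable while the cost conversation changes the order.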

The harder inputs resist a tidy number. Team load-balancing: the feature needs Susan and Javier, but Javier is booked solid for a quarter on something the executives flagged, and Susan won't surface from her two current projects for six weeks. That has nothing to do with the feature's merit and everything to do with whether you can staff it. Then there's strategic value a usage score will never capture. A feature that scores low on all three axes but demos beautifully and helps sales close can absolutely be worth building.

So treat the CFS number as one calibrated input you set beside cost, capacity, and strategy, not the verdict. As a rough read on whether something looks important before the harder conversations start, I've found nothing better. An eighteen on the board earns real attention. North of twenty-four, I'm usually building it or making a deliberate case for why not.

The Real Payoff Is Rigor

Here's what the score actually buys you, and it took me embarrassingly long to name it: it adds a pseudo-quantitative layer to a fundamentally qualitative judgment, and that structure does two things a meeting can't.

First, it lets the best idea win regardless of who has it. When the conversation is "how common, how frequent, how severe," it stops mattering whether the proposal came from the CEO or the quietest person in the room. You're rating the need, not the advocate. The strong-opinion-wins dynamic that runs most feature debates loses its grip, because everyone is now arguing about the same three things in the same terms.

Second, it forces people to check their own biases. If I walked in certain that feature A beat feature B, and we score them and A comes out a nine while B comes out a twenty-four, I have to sit with that. Why did I think A mattered more? Is our read on B inflated, or was my conviction about A just ego, or familiarity, or whatever I carried into the room? The tension between gut and score is the most valuable thing CFS produces. That's not because the number is right and the gut is wrong, but because the gap is where the real conversation lives. It's also where you should introspect hardest: when half the room expected low and it came back high, the disagreement is pointing at a hidden assumption worth dragging into the light.
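One way to surface those gaps deliberately is to write down the gut ranking before scoring, then list every pair where the two orders disagree. This is a sketch under that assumption; `disagreements` is a hypothetical helper, and the scores reuse the nine and twenty-four from the example above.

```python
from itertools import combinations


def disagreements(gut_order: list[str], scores: dict[str, int]) -> list[tuple[str, str]]:
    """Pairs the gut ranked one way and the score ranked the other."""
    flagged = []
    # gut_order runs from most to least important, so in each pair
    # `higher` was expected to outscore `lower`.
    for higher, lower in combinations(gut_order, 2):
        if scores[higher] < scores[lower]:
            flagged.append((higher, lower))
    return flagged


scores = {"A": 9, "B": 24}
print(disagreements(["A", "B"], scores))  # [('A', 'B')]
print(disagreements(["B", "A"], scores))  # []
```

The output isn't a correction to anyone's gut. It's an agenda: each flagged pair is a place where either a rating or a conviction deserves a second look.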

The number was never the deliverable. The deliverable is a room full of people who have stopped arguing about whose opinion is louder and started arguing about how much the absence actually costs — written down, with the reasoning attached, so that when the question comes back in a year, the answer is right there waiting. Commonality two, frequency one, severity two. Has anything changed? Usually nothing has. And when it has, you'll know exactly what.
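For what it's worth, the written record doesn't need to be elaborate. Here is a minimal sketch of one, assuming a plain dataclass is enough; the `Decision` fields, the feature name, the date, and the reasoning text are all hypothetical, and only the two-one-two ratings come from the paragraph above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Decision:
    feature: str
    decided_on: date
    commonality: int
    frequency: int
    severity: int
    outcome: str
    reasoning: str

    @property
    def score(self) -> int:
        return self.commonality * self.frequency * self.severity


def changed_axes(old: Decision, new_ratings: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {axis: (old, new)} for each axis whose rating moved."""
    return {
        axis: (getattr(old, axis), rating)
        for axis, rating in new_ratings.items()
        if getattr(old, axis) != rating
    }


# Hypothetical record of a feature that was turned down.
original = Decision(
    feature="Account merging",
    decided_on=date(2024, 6, 10),
    commonality=2,
    frequency=1,
    severity=2,
    outcome="declined",
    reasoning="Rare situation; deleting one account is a workable fallback.",
)
print(original.score)  # 4

# A year later, re-rate and see exactly what moved.
print(changed_axes(original, {"commonality": 2, "frequency": 1, "severity": 2}))  # {}
print(changed_axes(original, {"commonality": 2, "frequency": 3, "severity": 2}))
```

An empty result is the common case, and it closes the question in a minute. A non-empty one tells you which axis to talk about.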
