DEV Community

Devanshu Biswas


Why Accuracy Lies: Precision, Recall and F1 From Scratch

If 99% of emails are ham, a model that labels EVERYTHING ham scores 99% accuracy — and catches zero spam. On any imbalanced problem, accuracy lies. You need precision, recall, and F1. Here's a live confusion matrix you can drive with a threshold.
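A minimal sketch of that arithmetic follows. It assumes a hypothetical inbox of 1,000 emails (990 ham, 10 spam, with label 1 meaning spam) and a "model" that always answers ham:

```javascript
// Hypothetical dataset: 990 ham (0) and 10 spam (1), i.e. 99% ham.
const labels = [...Array(990).fill(0), ...Array(10).fill(1)];

// The "always ham" model: predicts 0 for every email.
const predictions = labels.map(() => 0);

// Accuracy: fraction of predictions that match the true label.
const correct = predictions.filter((p, i) => p === labels[i]).length;
const accuracy = correct / labels.length;

// Spam caught: emails that are spam AND were predicted as spam.
const totalSpam = labels.filter((y) => y === 1).length;
const spamCaught = predictions.filter((p, i) => p === 1 && labels[i] === 1).length;

console.log(`accuracy = ${accuracy}, spam caught = ${spamCaught} of ${totalSpam}`);
// → accuracy = 0.99, spam caught = 0 of 10
```

The score looks excellent while the one job the model exists for (finding spam) is never done.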

📏 Slide the threshold: https://dev48v.infy.uk/ml/day9-metrics.html

The four outcomes (confusion matrix)

Every prediction falls into one of four buckets: True Positive (caught it), False Positive (false alarm), False Negative (missed it), or True Negative (correctly left alone). These four counts give you everything you need:

```javascript
// TP, FP, FN are counts taken from the confusion matrix
const precision = TP / (TP + FP);   // of what I flagged, how much was right? (few false alarms)
const recall    = TP / (TP + FN);   // of all the real ones, how many did I catch? (few misses)
```
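Counting the four outcomes takes a single pass over the data. This is a minimal sketch that assumes binary labels (1 = positive), one model score per example, and the convention that a score at or above the threshold counts as a positive prediction. The helper names `confusionMatrix` and `safeDivide` and the sample data are hypothetical. It also returns 0 instead of `NaN` when a denominator is zero (for example, precision when nothing was flagged), which is one common convention rather than the only one.

```javascript
// Count TP, FP, FN, TN for binary labels (1 = positive) at a given threshold.
function confusionMatrix(labels, scores, threshold) {
  let TP = 0, FP = 0, FN = 0, TN = 0;
  for (let i = 0; i < labels.length; i++) {
    const predicted = scores[i] >= threshold ? 1 : 0;
    if (predicted === 1 && labels[i] === 1) TP++;      // caught it
    else if (predicted === 1 && labels[i] === 0) FP++; // false alarm
    else if (predicted === 0 && labels[i] === 1) FN++; // missed it
    else TN++;                                         // correctly left alone
  }
  return { TP, FP, FN, TN };
}

// Return 0 rather than NaN when the denominator is 0 (assumed convention).
const safeDivide = (num, den) => (den === 0 ? 0 : num / den);

// Hypothetical labels and model scores.
const labels = [1, 1, 1, 0, 0, 0, 0, 0];
const scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05];

const { TP, FP, FN, TN } = confusionMatrix(labels, scores, 0.5);
const precision = safeDivide(TP, TP + FP);
const recall = safeDivide(TP, TP + FN);
console.log({ TP, FP, FN, TN, precision, recall });
```

At a threshold of 0.5 this data gives 2 TP, 1 FP, 1 FN and 4 TN, so precision and recall are both 2/3.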

They trade off

Raise the threshold → fewer flags, surer ones (precision usually ↑) but you miss more (recall↓). Lower it → the reverse. There's no free lunch: tune the threshold against whichever error is costlier for your problem (a missed tumour vs a false fraud accusation).
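The trade-off is easy to watch by sweeping the threshold over one fixed set of scores. This is a minimal sketch on hypothetical data, assuming binary labels (1 = positive) and that a score at or above the threshold means "flagged"; `precisionRecallAt` is an illustrative name, not a library function.

```javascript
// Hypothetical labels (1 = positive) and model scores.
const labels = [1, 1, 1, 0, 0, 0, 0, 0];
const scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05];

// Precision and recall at one threshold (score >= threshold means "flagged").
function precisionRecallAt(threshold) {
  let TP = 0, FP = 0, FN = 0;
  labels.forEach((y, i) => {
    const flagged = scores[i] >= threshold;
    if (flagged && y === 1) TP++;
    else if (flagged && y === 0) FP++;
    else if (!flagged && y === 1) FN++;
  });
  // Treat 0/0 as 0 (an assumed convention) so an empty selection doesn't yield NaN.
  const precision = TP + FP === 0 ? 0 : TP / (TP + FP);
  const recall = TP + FN === 0 ? 0 : TP / (TP + FN);
  return { threshold, precision, recall };
}

// Recall can only stay the same or fall as the threshold rises;
// precision usually rises but is not guaranteed to.
const sweep = [0.25, 0.5, 0.8].map(precisionRecallAt);
for (const { threshold, precision, recall } of sweep) {
  console.log(`t=${threshold}  precision=${precision.toFixed(2)}  recall=${recall.toFixed(2)}`);
}
```

On this data precision climbs from 0.60 to 1.00 while recall drops from 1.00 to 0.33 across the three thresholds.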

F1 balances both

```javascript
const f1 = 2 * precision * recall / (precision + recall);
```

It's the harmonic mean, so it punishes lopsidedness: 0.9 precision with 0.1 recall gives F1 = 2 × 0.9 × 0.1 / (0.9 + 0.1) = 0.18, not the 0.5 an arithmetic mean would give. F1 is only high when both are high.
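A direct translation of the formula, with one added assumption: when precision and recall are both 0 the formula becomes 0/0, and this sketch returns 0 there (a common convention). The names `f1Score` and `arithmeticMean` are hypothetical. Comparing the two means reproduces the lopsided example:

```javascript
// F1 = harmonic mean of precision and recall.
function f1Score(precision, recall) {
  // Both zero would give 0/0; return 0 by convention instead of NaN.
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

const arithmeticMean = (a, b) => (a + b) / 2;

const p = 0.9, r = 0.1;
console.log(f1Score(p, r).toFixed(2));        // "0.18"
console.log(arithmeticMean(p, r).toFixed(2)); // "0.50"
console.log(f1Score(0.8, 0.8).toFixed(2));    // "0.80": balanced inputs keep F1 high
```

The harmonic mean is dragged toward the smaller of the two values, which is exactly why a model cannot buy a good F1 by maximising only one side.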

The takeaway

Four counts → precision, recall, F1: the honest scorecard. Report all three, never accuracy alone. Then drive the threshold in the demo and watch all three move.
