If 99% of emails are ham, a model that labels EVERYTHING ham scores 99% accuracy and catches zero spam. On imbalanced problems, accuracy alone lies. You need precision, recall, and F1. Here's a live confusion matrix you can drive with a threshold.
📏 Slide the threshold: https://dev48v.infy.uk/ml/day9-metrics.html
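To see the accuracy trap in numbers, here's a minimal JavaScript sketch. The dataset is hypothetical (1,000 emails, 10 of them spam, labelled 1), and the "model" simply predicts ham (0) for everything:

```javascript
// Hypothetical dataset: 990 ham (0) and 10 spam (1), i.e. 99% ham.
const labels = [...Array(990).fill(0), ...Array(10).fill(1)];

// A "model" that labels everything ham.
const predictions = labels.map(() => 0);

// Accuracy: fraction of predictions that match the true label.
const correct = labels.filter((y, i) => y === predictions[i]).length;
const accuracy = correct / labels.length;

// Recall on spam: of the real spam, how much did we flag?
const spamTotal = labels.filter((y) => y === 1).length;
const spamCaught = labels.filter((y, i) => y === 1 && predictions[i] === 1).length;
const recall = spamCaught / spamTotal;

console.log(`accuracy = ${accuracy}`); // accuracy = 0.99
console.log(`recall   = ${recall}`);   // recall   = 0
```

Same model, two very different stories: the accuracy looks great, the spam recall is zero.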
The four outcomes (confusion matrix)
Every prediction is a True Positive (caught it), False Positive (false alarm), False Negative (missed it), or True Negative (correctly left alone). These four counts give you everything:
const precision = TP / (TP + FP); // of what I flagged, how much was right? (few false alarms)
const recall = TP / (TP + FN); // of all the real ones, how many did I catch? (few misses)
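A minimal sketch of where those four counts come from, assuming binary labels (1 = positive), a score per item, and the convention that an item is flagged when its score is at or above the threshold. The toy data and the helper names `confusionCounts` and `precisionRecall` are hypothetical, and returning 0 for an empty denominator is one common choice, not the only one:

```javascript
// Count the four outcomes for binary labels (1 = positive) and model scores,
// flagging an item as positive when score >= threshold (assumed convention).
function confusionCounts(labels, scores, threshold) {
  let TP = 0, FP = 0, FN = 0, TN = 0;
  for (let i = 0; i < labels.length; i++) {
    const flagged = scores[i] >= threshold;
    const actual = labels[i] === 1;
    if (flagged && actual) TP++;       // caught it
    else if (flagged && !actual) FP++; // false alarm
    else if (!flagged && actual) FN++; // missed it
    else TN++;                         // correctly left alone
  }
  return { TP, FP, FN, TN };
}

// Precision and recall, returning 0 instead of NaN when a denominator is 0
// (one common convention; other tools may warn or return NaN instead).
function precisionRecall({ TP, FP, FN }) {
  const precision = TP + FP > 0 ? TP / (TP + FP) : 0;
  const recall = TP + FN > 0 ? TP / (TP + FN) : 0;
  return { precision, recall };
}

// Hypothetical toy data: 4 positives, 4 negatives.
const labels = [1, 1, 1, 1, 0, 0, 0, 0];
const scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1];

const counts = confusionCounts(labels, scores, 0.5);
console.log(counts);                  // { TP: 3, FP: 1, FN: 1, TN: 3 }
console.log(precisionRecall(counts)); // { precision: 0.75, recall: 0.75 }
```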
They trade off
Raise the threshold → fewer flags, usually surer ones (precision tends to ↑), but you miss more (recall↓). Lower it → the reverse. There's no free lunch: you pick the threshold based on which error costs more (a missed tumour vs a false fraud accusation).
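Here's a small threshold sweep on hypothetical toy scores, again flagging an item when its score is at or above the threshold. `metricsAt` is an illustrative helper, not a library call:

```javascript
// Hypothetical toy data: labels (1 = positive) and model scores.
const labels = [1, 1, 1, 1, 0, 0, 0, 0];
const scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1];

// Precision and recall when flagging items with score >= threshold.
function metricsAt(threshold) {
  let TP = 0, FP = 0, FN = 0;
  labels.forEach((y, i) => {
    const flagged = scores[i] >= threshold;
    if (flagged && y === 1) TP++;
    else if (flagged && y === 0) FP++;
    else if (!flagged && y === 1) FN++;
  });
  const precision = TP + FP > 0 ? TP / (TP + FP) : 0;
  const recall = TP / (TP + FN); // TP + FN = number of real positives (4 here)
  return { threshold, precision, recall };
}

for (const t of [0.25, 0.5, 0.75]) {
  const { precision, recall } = metricsAt(t);
  console.log(`t=${t}  precision=${precision.toFixed(2)}  recall=${recall.toFixed(2)}`);
}
// t=0.25  precision=0.67  recall=1.00
// t=0.5  precision=0.75  recall=0.75
// t=0.75  precision=1.00  recall=0.50
```

On this toy data precision climbs from 0.67 to 1.00 while recall drops from 1.00 to 0.50. Recall can only fall or hold steady as the threshold rises; precision usually rises but isn't guaranteed to at every step.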
F1 balances both
const f1 = 2 * precision * recall / (precision + recall);
It's the harmonic mean, so it punishes lopsidedness: 0.9 precision with 0.1 recall gives F1 ≈ 0.18, not the 0.5 a plain average would give. It's only high when both are high.
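A quick check of the harmonic-mean effect, using a hypothetical `f1` helper that returns 0 when precision and recall are both 0 (an assumed convention to avoid 0/0), next to a plain arithmetic mean for comparison:

```javascript
// F1 = harmonic mean of precision and recall.
// Returns 0 when both are 0 (assumed convention to avoid 0/0).
function f1(precision, recall) {
  return precision + recall > 0
    ? (2 * precision * recall) / (precision + recall)
    : 0;
}

// Plain average, for comparison.
const arithmeticMean = (p, r) => (p + r) / 2;

for (const [p, r] of [[0.9, 0.1], [0.5, 0.5], [0.9, 0.9]]) {
  console.log(`P=${p} R=${r}  F1=${f1(p, r).toFixed(2)}  mean=${arithmeticMean(p, r).toFixed(2)}`);
}
// P=0.9 R=0.1  F1=0.18  mean=0.50
// P=0.5 R=0.5  F1=0.50  mean=0.50
// P=0.9 R=0.9  F1=0.90  mean=0.90
```

When precision and recall agree, F1 matches the average; when they diverge, F1 gets dragged toward the weaker one.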
The takeaway
Four counts → precision, recall, F1 — the honest scorecard. Report all three, never accuracy alone. Drive the threshold.