Akshay Shetty

Beyond Accuracy: How ROC-AUC Reveals the True Power of Your Model

If you've ever built a classification model, you probably started by measuring its accuracy. But what happens when your data is imbalanced?

Example: Spam Detector

Imagine you’re building a spam detector.

  • Out of 100 emails in the dataset, 99 are not spam and only 1 is spam.
  • A naive model could just predict every email as “not spam” and still score 99% accuracy, yet it misses the single spam email and never learns what spam looks like.

  • Even if you feed it emails with strong spam-like features, it will still call them “not spam”.

  • This isn’t really overfitting; it’s a class-imbalance issue (the model is biased toward the majority class), as the sketch below illustrates.
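
Here is a minimal sketch of that failure mode, using made-up labels (99 non-spam, 1 spam) and a baseline that always predicts “not spam”: accuracy looks great while recall on the spam class is zero.

from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 99 "not spam" (0) and 1 "spam" (1)
y_true = [0] * 99 + [1]

# A naive baseline that always predicts "not spam"
y_pred = [0] * 100

print("Accuracy:", accuracy_score(y_true, y_pred))    # 0.99 -- looks impressive
print("Spam recall:", recall_score(y_true, y_pred))   # 0.0  -- misses the one spam email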

Sample dataset for illustration:

Words_Caps  Num_Links  Email_Length  Spam
5           2          120           1
3           0          80            1
0           1          200           0
1           0          150           0
7           3          95            1
0           0          300           0
1           0          250           0
2           1          180           0
6           4          110           1
0           0          220           0

Why Thresholds Matter

Most classification models don’t directly say “Yes” or “No.”
After you train a classifier (logistic regression, random forest, XGBoost, etc.), calling predict_proba or an equivalent function returns a probability for each class.

Words_Caps  Num_Links  Email_Length  Spam  Probability_score
5           2          120           1     0.6
3           0          80            1     0.3
0           1          200           0     0.6
1           0          150           0     0.3
7           3          95            1     0.8
0           0          300           0     0.5
1           0          250           0     0.3
2           1          180           0     0.6
6           4          110           1     0.9
0           0          220           0     0.2
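
The Probability_score column above is just for illustration. As a rough sketch of how such scores are produced in practice (the exact numbers will differ), you can fit a model on the toy table and call predict_proba; the scaler and logistic regression here are one possible choice, not the only one:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The toy dataset from the table above
df = pd.DataFrame({
    "Words_Caps":   [5, 3, 0, 1, 7, 0, 1, 2, 6, 0],
    "Num_Links":    [2, 0, 1, 0, 3, 0, 0, 1, 4, 0],
    "Email_Length": [120, 80, 200, 150, 95, 300, 250, 180, 110, 220],
    "Spam":         [1, 1, 0, 0, 1, 0, 0, 0, 1, 0],
})

X = df[["Words_Caps", "Num_Links", "Email_Length"]]
y = df["Spam"]

# Scale features, then fit a logistic regression on the toy data
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# predict_proba returns one column per class; column 1 is P(spam)
prob_spam = model.predict_proba(X)[:, 1]
print(prob_spam.round(2))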

Since the model only outputs probabilities, we have to set a threshold (default 0.5) to turn them into Yes/No predictions:

probabilities = [0.6, 0.3, 0.6, 0.3, 0.8, 0.5, 0.3, 0.6, 0.9, 0.2]  # Probability_score column from the table

threshold = 0.5
predicted = [1 if p >= threshold else 0 for p in probabilities]


At threshold = 0.5, the confusion matrix is:

[Image: classification report at threshold = 0.5]

That means:

  • True Positives: 3
  • False Negatives: 1
  • False Positives: 3
  • True Negatives: 3
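
You can double-check these counts with scikit-learn’s confusion_matrix, using the labels and probability scores from the table above:

from sklearn.metrics import confusion_matrix

# Spam labels and Probability_score values from the table above
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
scores = [0.6, 0.3, 0.6, 0.3, 0.8, 0.5, 0.3, 0.6, 0.9, 0.2]

threshold = 0.5
predicted = [1 if p >= threshold else 0 for p in scores]

# confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, predicted).ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=3, FN=1, FP=3, TN=3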

To improve results, you’d have to keep changing the threshold from 0.0 to 1.0 and checking the confusion matrix each time.
But that’s messy and time-consuming.


Receiver Operating Characteristic (ROC)

Instead of testing thresholds manually, ROC does this for you.

For each threshold, we compute:

  • TPR (True Positive Rate / Recall) = TP / (TP + FN)
  • FPR (False Positive Rate) = FP / (FP + TN)

Then, we plot TPR vs FPR for all thresholds (from 0.0 → 1.0).

  • At threshold = 0.0, everything is predicted Yes, so we start at point (1,1).
  • At threshold = 1.0, everything is predicted No, so we end at point (0,0).


  • In between, we get a curve that shows the trade-off between catching positives and avoiding false alarms.

The ROC curve helps identify thresholds that give you high TPR and low FPR.
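
As a rough sketch of what happens under the hood, here is that sweep done by hand on the toy data; each threshold yields one (FPR, TPR) point on the curve:

# Spam labels and Probability_score values from the table above
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
scores = [0.6, 0.3, 0.6, 0.3, 0.8, 0.5, 0.3, 0.6, 0.9, 0.2]

for threshold in [0.0, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.0]:
    pred = [1 if p >= threshold else 0 for p in scores]
    tp = sum(t == 1 and q == 1 for t, q in zip(y_true, pred))
    fn = sum(t == 1 and q == 0 for t, q in zip(y_true, pred))
    fp = sum(t == 0 and q == 1 for t, q in zip(y_true, pred))
    tn = sum(t == 0 and q == 0 for t, q in zip(y_true, pred))
    tpr = tp / (tp + fn)   # True Positive Rate (recall)
    fpr = fp / (fp + tn)   # False Positive Rate
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")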


Area Under the Curve (AUC)

Here’s the key part:

  • ROC → Helps visualize trade-offs and choose a threshold
  • AUC → A single number that measures how well the model separates classes independent of threshold

Interpretation:

  • AUC = 1.0 → Perfect separation
  • AUC = 0.5 → Random guessing
  • AUC < 0.5 → Worse than random guessing (the model’s ranking is inverted)

Think of AUC as a summary score of your model’s ranking ability.

It tells you how often your model ranks a real positive higher than a real negative.
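
Here is a small sketch of that ranking view on the toy scores: compare every (actual positive, actual negative) pair and count how often the positive gets the higher score, counting ties as half a win.

# Spam labels and Probability_score values from the table above
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
scores = [0.6, 0.3, 0.6, 0.3, 0.8, 0.5, 0.3, 0.6, 0.9, 0.2]

pos = [s for s, y in zip(scores, y_true) if y == 1]
neg = [s for s, y in zip(scores, y_true) if y == 0]

# Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
print("Pairwise AUC:", round(auc, 2))  # ~0.79 here, matching roc_auc_score on the same data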


ROC-AUC Plot in Python Using scikit-learn


from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Spam labels and Probability_score values from the table above
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
scores = [0.6, 0.3, 0.6, 0.3, 0.8, 0.5, 0.3, 0.6, 0.9, 0.2]

# Compute ROC curve (FPR and TPR at every threshold)
fpr, tpr, thresholds = roc_curve(y_true, scores)

# Compute AUC
auc = roc_auc_score(y_true, scores)
print("AUC Score:", auc)

# Plot ROC curve
plt.figure(figsize=(6, 6))
plt.plot(fpr, tpr, marker='o', label=f'ROC curve (AUC={auc:.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray', label='Random Guessing')

plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('ROC Curve for Spam Detector')
plt.legend()
plt.grid(True)
plt.show()


[Plot: ROC curve for the spam detector]

  • ROC alone doesn’t give a single threshold—it gives all possible thresholds.
  • The “best” threshold depends on your problem.
  • The ideal point on an ROC curve is the top-left corner (FPR=0, TPR=1), which represents a perfect classifier.
  • The point at (FPR ≈ 0.33, TPR = 0.75) looks like a strong candidate that catches most positives.
  • Imagine you absolutely cannot tolerate important emails going to spam. You want an FPR as close to 0 as possible. In this case, you’d choose the point at (FPR = 0, TPR = 0.5).
  • A common method is Youden’s J statistic: J = TPR - FPR. Pick the threshold that maximizes J, giving the best overall trade-off, as the sketch below shows.
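
Here is a minimal sketch of that last idea, picking the threshold that maximizes Youden’s J from the outputs of roc_curve:

import numpy as np
from sklearn.metrics import roc_curve

# Spam labels and Probability_score values from the table above
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
scores = [0.6, 0.3, 0.6, 0.3, 0.8, 0.5, 0.3, 0.6, 0.9, 0.2]

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Youden's J = TPR - FPR; the best threshold is where J peaks
j = tpr - fpr
best = np.argmax(j)
print(f"Best threshold: {thresholds[best]}  (TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f})")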
