Siddhartha Reddy
The Most Dangerous Number in Machine Learning: Accuracy

Accuracy is often the first metric people learn in machine learning.

Train a model.
Evaluate it.
See a number like:
Accuracy: 95%

At first glance, that looks excellent. A model that is correct 95% of the time must be good.
But in many real-world problems, accuracy can be the most misleading number in the entire pipeline.
Sometimes, a model with 95% accuracy is completely useless.

What Accuracy Actually Measures

Accuracy is defined as:

Accuracy = Correct Predictions / Total Predictions

In code, it often looks like this:

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_true, y_pred)

It simply measures the fraction of predictions that match the true labels.
The problem is that this number does not tell you what kinds of mistakes the model makes.
And in many applications, those mistakes matter far more than the total percentage.

The Classic Example: Imbalanced Data

Imagine you are building a model to detect fraud in financial transactions.

Out of 10,000 transactions:
Fraudulent: 100
Legitimate: 9,900

Fraud represents only 1% of the data.
Now consider a model that predicts:
"Legitimate" for every transaction

This model never detects fraud.
But its accuracy would be:
9,900 / 10,000 = 99% accuracy
A model that misses every fraud case looks nearly perfect by accuracy alone.
In practice, it is useless.
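You can verify this in a few lines. The sketch below builds a synthetic label array matching the numbers above (these counts are illustrative, not real transaction data) and scores the "always legitimate" model:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Synthetic labels: 9,900 legitimate (0) and 100 fraudulent (1)
y_true = np.array([0] * 9900 + [1] * 100)

# A "model" that predicts legitimate for every single transaction
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.99
```

The model detects zero fraud cases, yet scores 99% accuracy.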

Accuracy Hides the Type of Errors

In many applications, different mistakes have very different costs.
Consider medical diagnosis.
Two types of errors exist:

  • False positives: predicting disease when none exists
  • False negatives: missing a real disease

A false negative might delay treatment for a serious condition. But accuracy treats all mistakes the same. It does not distinguish which mistakes are dangerous.

The Confusion Matrix Tells the Real Story

Instead of relying on accuracy alone, we need to look at the confusion matrix.
Example:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred)
print(cm)

This shows all four outcome counts. For binary labels, scikit-learn arranges them as:

[[TN  FP]
 [FN  TP]]

that is: true negatives, false positives, false negatives, and true positives.

These numbers reveal what accuracy hides.
You can see exactly how the model fails.
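Applied to the fraud example from earlier (again with synthetic labels), the confusion matrix exposes exactly what the 99% accuracy hid:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0] * 9900 + [1] * 100)
y_pred = np.zeros_like(y_true)  # always predict "legitimate"

# ravel() flattens the 2x2 matrix into (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 9900 0 100 0
```

Zero true positives: every fraud case was missed, which accuracy alone never showed.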

Better Metrics for Real Problems

Many tasks require metrics that capture different aspects of performance.
Common alternatives include:

Precision

Measures how many predicted positives are correct.

Precision = True Positives / (True Positives + False Positives)

Important when false alarms are costly.

Recall

Measures how many real positives are detected.

Recall = True Positives / (True Positives + False Negatives)

Important when missing cases is dangerous.

F1 Score

Balances precision and recall.

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Useful when classes are imbalanced.
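All three metrics are available in scikit-learn. Here is a small sketch with made-up labels (2 true positives, 1 false positive, 2 false negatives) so you can check the formulas by hand:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy labels: the model catches some fraud but also raises a false alarm
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# TP = 2, FP = 1, FN = 2
print(precision_score(y_true, y_pred))  # 2 / (2 + 1) ≈ 0.667
print(recall_score(y_true, y_pred))     # 2 / (2 + 2) = 0.5
print(f1_score(y_true, y_pred))         # harmonic mean ≈ 0.571
```

Note that for the "always legitimate" model above, recall would be 0, immediately flagging the problem that 99% accuracy concealed.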

ROC-AUC

Evaluates how well the model separates classes across thresholds.
Often more informative than accuracy in classification tasks.
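Unlike the metrics above, ROC-AUC is computed from the model's scores or probabilities rather than hard labels. A minimal sketch with made-up scores:

```python
from sklearn.metrics import roc_auc_score

# y_scores are predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# AUC = fraction of (positive, negative) pairs ranked correctly
print(roc_auc_score(y_true, y_scores))  # 0.75
```

An AUC of 0.5 means the scores rank positives no better than chance; 1.0 means perfect separation.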

Accuracy Still Has Its Place
Accuracy is not useless.

It works well when:

  • classes are balanced
  • the cost of errors is similar
  • the problem is symmetric

But those conditions are surprisingly rare in real-world ML.

Why This Matters

Accuracy is dangerous not because it is wrong, but because it looks authoritative.
It gives a single clean number.
But machine learning performance is rarely a single-number problem.
If we optimize the wrong metric, we may build models that look good in evaluation and fail in practice.

Final Thought

Accuracy answers one question:

How often is the model correct?

But in many real systems, the better question is:

What kinds of mistakes can we afford?

Until that question is answered, accuracy alone can be the most dangerous number in machine learning.
