Naive Bayes Explained: A 20-Patient Flu Diagnosis Example (with Math Derivation)
Naive Bayes has a reputation for being both surprisingly simple and surprisingly useful.
📌 This is a cross-post from my interactive tutorial site mathisimple.com, where every chart and diagram is fully interactive: drag sliders, adjust parameters, and see the math change in real time.
If you have a small symptom table, a handful of categories, and you need a transparent classifier instead of a black box, Naive Bayes is often the first model worth trying.
The name sounds intimidating, but the core idea is ordinary probability: start with how common each class is, then keep updating that belief with evidence.
A 20-Patient Flu Dataset
Suppose a clinic recorded 20 historical cases with four categorical features:
- Temperature: normal, fever, high fever
- Cough: none, mild, severe
- Headache: yes or no
- Fatigue: yes or no
The final diagnosis was:
- Flu: 13 patients
- Not Flu: 7 patients
| Feature | Value | Count in Flu | Count in Not Flu |
|---|---|---|---|
| Temperature | normal | 0 | 7 |
| Temperature | fever | 8 | 0 |
| Temperature | high fever | 5 | 0 |
| Cough | none | 0 | 4 |
| Cough | mild | 4 | 3 |
| Cough | severe | 9 | 0 |
| Headache | yes | 13 | 1 |
| Headache | no | 0 | 6 |
| Fatigue | yes | 11 | 1 |
| Fatigue | no | 2 | 6 |
The Bayes Rule Behind the Model
Naive Bayes estimates the posterior probability of each class using Bayes' rule:

P(y ∣ x) = P(x ∣ y) · P(y) / P(x)

where y is the class (Flu or Not Flu) and x is the observed symptom vector.
💡 The "Naive" Assumption
Instead of estimating one giant joint probability for all symptoms, the model assumes the features are conditionally independent given the class and multiplies individual conditional probabilities:

P(x₁, x₂, x₃, x₄ ∣ y) ≈ P(x₁ ∣ y) · P(x₂ ∣ y) · P(x₃ ∣ y) · P(x₄ ∣ y)
The key insight: we never compute the denominator P(x). Since it's the same for both classes, we just compare the numerator scores directly.
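In practice this comparison is usually done in log space, so products of small probabilities become sums and nothing underflows. A minimal sketch (the function name and the toy numbers are mine, not from the article):

```python
import math

def nb_score(prior, likelihoods):
    """Unnormalized log-posterior: log P(y) + sum of log P(x_i | y).

    Since P(x) is the same for every class, comparing these
    scores is equivalent to comparing the true posteriors.
    """
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

# Toy two-class comparison: pick the class with the higher score.
score_flu = nb_score(0.65, [0.5, 0.9])
score_not = nb_score(0.35, [0.1, 0.2])
prediction = "Flu" if score_flu > score_not else "Not Flu"
```

The class with the larger unnormalized score wins; no normalizing constant is ever needed.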
Step 1: Priors
Before seeing any symptoms, the prior class probabilities are:
- P(Flu) = 13 / 20 = 0.65
- P(Not Flu) = 7 / 20 = 0.35
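In code, the priors are just label frequencies. A minimal sketch using the counts above (the variable names are mine):

```python
from collections import Counter

# Labels for the 20 historical cases: 13 flu, 7 not flu.
labels = ["flu"] * 13 + ["not_flu"] * 7

counts = Counter(labels)
total = sum(counts.values())

# Prior for each class = class count / total count.
priors = {c: n / total for c, n in counts.items()}
```

This reproduces P(Flu) = 0.65 and P(Not Flu) = 0.35 from the table.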
Step 2: A New Patient
Now a new patient arrives with:
- Temperature: fever
- Cough: severe
- Headache: yes
- Fatigue: yes
Step 3: Laplace-Smoothed Likelihoods
🧪 Why Laplace Smoothing?
If we use raw counts, the "not flu" class gets zero probability because no non-flu patient in the training data had fever or severe cough. That would make the entire product zero, eliminating the class completely. Laplace smoothing adds 1 to each count to fix this.
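To see the problem concretely, here is a small sketch (variable names are mine) comparing the raw-count likelihood with the Laplace-smoothed one for Temperature = fever in the Not Flu class:

```python
# Counts from the table: no "not flu" patient had a fever.
fever_not_flu = 0   # fever count among not-flu patients
n_not_flu = 7       # total not-flu patients
n_temp_values = 3   # temperature has 3 categories

# Raw estimate: zero, which zeroes out the whole product.
raw = fever_not_flu / n_not_flu

# Laplace smoothing: add 1 to the count, add the number of
# categories to the denominator so probabilities still sum to 1.
smoothed = (fever_not_flu + 1) / (n_not_flu + n_temp_values)
```

The raw estimate is 0, while the smoothed one is 1/10, matching the table below.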
Temperature and cough each have 3 categories. Headache and fatigue each have 2 categories.
| Likelihood | P(· ∣ Flu) | P(· ∣ Not Flu) |
|---|---|---|
| Fever | (8+1)/(13+3) = 9/16 | (0+1)/(7+3) = 1/10 |
| Severe Cough | (9+1)/(13+3) = 10/16 | (0+1)/(7+3) = 1/10 |
| Headache = yes | (13+1)/(13+2) = 14/15 | (1+1)/(7+2) = 2/9 |
| Fatigue = yes | (11+1)/(13+2) = 12/15 | (1+1)/(7+2) = 2/9 |
Step 4: Unnormalized Posterior Scores
Multiplying each prior by its four smoothed likelihoods gives:

- Score(Flu) = 0.65 × 9/16 × 10/16 × 14/15 × 12/15 ≈ 0.1706
- Score(Not Flu) = 0.35 × 1/10 × 1/10 × 2/9 × 2/9 ≈ 0.000173

The flu score is roughly a thousand times larger, so the predicted class is Flu. Normalizing the two scores gives P(Flu ∣ symptoms) ≈ 0.999.
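The whole derivation fits in a few lines of code. A self-contained sketch using the counts above (the helper name `smoothed` is mine):

```python
def smoothed(count, class_total, n_values):
    """Laplace-smoothed conditional probability estimate."""
    return (count + 1) / (class_total + n_values)

# Priors from the 20-patient dataset.
p_flu, p_not = 13 / 20, 7 / 20

# Evidence: fever, severe cough, headache = yes, fatigue = yes.
flu_score = (p_flu
             * smoothed(8, 13, 3)    # fever           -> 9/16
             * smoothed(9, 13, 3)    # severe cough    -> 10/16
             * smoothed(13, 13, 2)   # headache = yes  -> 14/15
             * smoothed(11, 13, 2))  # fatigue = yes   -> 12/15

not_score = (p_not
             * smoothed(0, 7, 3)     # fever           -> 1/10
             * smoothed(0, 7, 3)     # severe cough    -> 1/10
             * smoothed(1, 7, 2)     # headache = yes  -> 2/9
             * smoothed(1, 7, 2))    # fatigue = yes   -> 2/9

# Normalizing the two unnormalized scores gives the posterior.
posterior_flu = flu_score / (flu_score + not_score)
```

Running this reproduces the score gap: `flu_score` ≈ 0.1706 versus `not_score` ≈ 0.000173.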
What This Example Teaches
- Priors matter: common classes start with an advantage.
- Likelihoods matter more: when certain symptoms are highly concentrated in one class, the evidence can overwhelm the prior.
- Laplace smoothing matters: whenever rare combinations would otherwise create zero probabilities.
- Interpretability is the big win: every prediction can be traced back to simple counts.
That's why Naive Bayes still shows up in text classification, email filtering, triage systems, and small tabular problems. It's fast, explainable, and often much more competitive than its "naive" label suggests.
Try the Interactive Version
The diagrams in this article are static; on mathisimple.com they are fully interactive.
👉 Open the interactive tutorial → mathisimple.com
You can:
- Explore the symptom tables and manipulate underlying feature counts
- Instantly see how varying the prior probabilities alters the final prediction
- Toggle Laplace smoothing to see its direct effect on the "zero-probability" problem