Pavan Pothuganti

Posted on Jul 4

If Decision Trees Have High Variance, Why Does Bagging Actually Work?

#machinelearning #decisiontree #bagging #highvariance

When I first learned about Decision Trees, everyone said the same thing:

"Decision Trees have high variance."

Then they immediately introduced Bagging and Random Forest.

At first, I accepted it.

Later, one question kept bothering me:

How does training 100 Decision Trees suddenly solve the problem?

After all, if one tree makes mistakes, why would building 99 more trees magically improve anything?

That question completely changed how I understood ensemble learning.

The Problem Isn't That Decision Trees Are "Bad"

Imagine training a Decision Tree on a dataset of 10,000 records.

Now remove just a small percentage of those records and train another tree.

Surprisingly, the new tree may have a completely different structure.

Different root node.

Different branches.

Different predictions.

The algorithm didn't change.

The problem didn't change.

Only a small part of the training data changed.

That is what people mean when they say a Decision Tree has high variance.

It reacts strongly to small changes in the training data.
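
To see that sensitivity in miniature, here is a rough sketch using only the Python standard library. It is not a full Decision Tree: it only searches for the root split (the single feature and threshold with the lowest weighted Gini impurity). The synthetic data, the noise level, and the helper names `gini` and `best_root_split` are all assumptions made for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_root_split(rows):
    """Return (feature_index, threshold) with the lowest weighted Gini impurity."""
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds are midpoints between neighbouring feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]

rng = random.Random(0)

# Synthetic data (an assumption): two equally informative features plus label noise.
rows = []
for _ in range(200):
    x = (rng.random(), rng.random())
    y = int(x[0] + x[1] + rng.gauss(0, 0.3) > 1.0)
    rows.append((x, y))

print("full data: ", best_root_split(rows))

roots = []
for _ in range(5):
    kept = rng.sample(rows, k=190)  # drop 5% of the records at random
    roots.append(best_root_split(kept))
    print("5% removed:", roots[-1])
```

On noisy data like this, the chosen threshold (and sometimes the chosen feature) tends to move from run to run, even though 95% of the records are identical each time.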

My First Wrong Assumption

Initially I thought:

"If one tree is unstable, then training many unstable trees should make the situation even worse."

That sounds logical.

But that's not what actually happens.

The secret lies in how those trees are trained.

Every Tree Sees a Different World

Bagging doesn't clone the same Decision Tree 100 times.

Instead, it creates multiple bootstrap datasets.

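Each dataset is drawn from the original records by sampling with replacement, so it contains mostly the same records, but some appear more than once and others are left out entirely.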

Every tree learns from a slightly different version of reality.

As a result:

One tree may overfit one noisy pattern.
Another tree may never even see that noisy pattern.
A third tree may split the data in a completely different way.

Each tree develops its own strengths and weaknesses.
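
The bootstrap step described above can be sketched in a few lines of standard-library Python. The function name `bootstrap_sample` and the use of integers as stand-in records are assumptions for illustration; a real Bagging implementation would sample rows of the training set the same way.

```python
import random

def bootstrap_sample(records, rng):
    """Draw len(records) items with replacement: one bootstrap dataset."""
    return rng.choices(records, k=len(records))

rng = random.Random(0)
records = list(range(10_000))  # stand-in for 10,000 training records

sample = bootstrap_sample(records, rng)
unique_fraction = len(set(sample)) / len(records)

print(f"sample size: {len(sample)}")
print(f"distinct original records in the sample: {unique_fraction:.1%}")
```

For a large dataset, roughly 63% of the original records (about 1 - 1/e) end up in any one bootstrap sample. The rest are left out, and some of the included records are repeated. That is what makes each tree's "version of reality" slightly different.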

The Real Power Isn't the Trees

The real power is the disagreement between them.

Suppose you're trying to classify an image.

Tree 1 predicts Cat.

Tree 2 predicts Cat.

Tree 3 predicts Dog.

Tree 4 predicts Cat.

Tree 5 predicts Cat.

One tree made a mistake.

Four didn't.

Instead of trusting one unstable model, Bagging trusts the collective decision.

The random mistakes made by individual trees are often cancelled out by the majority.

That is why variance decreases.
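
The voting step itself is tiny. The sketch below reproduces the five-tree example and then adds a back-of-the-envelope calculation. The 20% per-tree error rate and the assumption that trees err independently are idealizations chosen for illustration. Real bagged trees are correlated, so the actual improvement is smaller.

```python
import math
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most trees."""
    label, _ = Counter(predictions).most_common(1)[0]
    return label

votes = ["Cat", "Cat", "Dog", "Cat", "Cat"]
print(majority_vote(votes))  # Cat

# Idealized assumption: each of 5 trees is wrong independently with probability 0.2.
# The ensemble is wrong only if at least 3 of the 5 trees are wrong.
p_tree_wrong = 0.2
n_trees = 5
p_ensemble_wrong = sum(
    math.comb(n_trees, k) * p_tree_wrong**k * (1 - p_tree_wrong) ** (n_trees - k)
    for k in range(3, n_trees + 1)
)
print(f"single tree wrong: {p_tree_wrong:.1%}, majority wrong: {p_ensemble_wrong:.1%}")
```

Under those assumptions the majority is wrong about 5.8% of the time, compared with 20% for a single tree.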

The Question That Came Next

After understanding this, another question immediately came to mind.

"What if all 100 trees are wrong?"

And the answer surprised me.

Yes, it can happen.

Imagine the training data itself is missing an important feature.

Or the labels contain systematic errors.

Every bootstrap sample is created from that same dataset.

Every tree learns the same incorrect pattern.

Now every tree confidently makes the same wrong prediction.

Voting cannot fix missing knowledge.

Bagging only reduces random instability.

It cannot magically invent information that doesn't exist.
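
A toy numerical model makes the limit visible. This is not a real Decision Tree ensemble: each "tree" is simulated as the true value, plus a shared systematic error that every tree inherits from the data, plus its own random noise. The specific numbers (`truth`, `shared_bias`, the noise scale) are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(0)

truth = 10.0        # the value we would like to predict
shared_bias = 2.0   # assumed systematic error baked into the training data
noise_scale = 3.0   # assumed random instability of an individual tree
n_trees = 100

# Every simulated tree carries the same bias but its own random noise.
predictions = [truth + shared_bias + rng.gauss(0, noise_scale) for _ in range(n_trees)]
ensemble = statistics.fmean(predictions)

print(f"spread of single trees: {statistics.stdev(predictions):.2f}")
print(f"ensemble average:       {ensemble:.2f}")
print(f"ensemble error:         {ensemble - truth:+.2f}")
```

Averaging shrinks the random part of the error, but the ensemble still lands near `truth + shared_bias`, not near `truth`.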

That's When Everything Clicked

I realized Bagging doesn't promise perfection.

It promises stability.

A single Decision Tree may change dramatically when the training data changes.

Bagging makes the final prediction much more consistent by combining many different trees.
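
For regression-style averaging, this gain in stability has a standard formula. Assume (as an idealization) that the B trees have identically distributed predictions T_b(x), each with variance σ², and that any two of them have correlation ρ. The symbols B, σ², and ρ are notation introduced here, not something from the original discussion.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

The second term shrinks as more trees are added. The first term does not, which is why the trees need to disagree with each other for Bagging to help.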

Some mistakes disappear because they were caused by randomness.

Other mistakes remain because they are systematic: every tree inherits them from the same data, so the vote repeats them instead of cancelling them.

Those remaining mistakes are the target of another family of algorithms called Boosting, which takes a completely different approach.

But that's a story for the next article.

Key Takeaway

I stopped thinking of Bagging as "100 Decision Trees."

Now I think of it as:

A method that replaces the opinion of one unstable model with the collective wisdom of many models, each trained on its own bootstrap sample.

That single idea made ensemble learning much easier to understand.

DEV Community