DEV Community: Pavan Pothuganti

How Does Boosting Actually Learn from Mistakes?

Pavan Pothuganti — Sat, 04 Jul 2026 14:27:32 +0000

In my previous article, I realized something important.

Random Forest reduces variance.

But reducing variance doesn't eliminate every mistake.

That naturally led to another question.

If Boosting learns from mistakes, how does it actually do that?

Does it remember the wrong predictions?

Does it retrain the entire model?

Does it delete bad trees?

I had all of these questions.

The answer turned out to be much simpler than I expected.

Imagine You're Learning for an Exam

Suppose you write a mock test.

You answer 100 questions.

After checking the results, your teacher circles the questions you got wrong.

Now imagine your teacher says:

"Don't study everything again."

"Spend more time on the questions you missed."

That's exactly what Boosting tries to do.

It doesn't restart from scratch.

It focuses its attention on the difficult examples.

Step 1: Build the First Model

Boosting starts with a simple model.

That model makes predictions.

Some predictions are correct.

Some are wrong.

Nothing unusual so far.

Step 2: Identify the Difficult Examples

Instead of celebrating the correct predictions, Boosting asks:

"Where did I fail?"

Those wrongly predicted records become much more important.

They're no longer treated like ordinary training examples.

They receive extra attention.

You can think of them as being highlighted with a marker.
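One simple way to picture that "marker" is as a weight attached to every training example. The minimal Python sketch below assumes labels and predictions are plain lists and uses a fixed, hypothetical boost factor of 2. Real algorithms such as AdaBoost compute the factor from the model's error rate instead.

```python
def highlight_mistakes(y_true, y_pred, weights, boost=2.0):
    """Give every wrongly predicted example more weight.

    `boost` is a hypothetical fixed factor chosen for illustration only.
    """
    # Multiply the weight of each mistake by `boost`; keep the others unchanged
    updated = [w * boost if truth != pred else w
               for truth, pred, w in zip(y_true, y_pred, weights)]
    # Renormalize so the weights still sum to 1
    total = sum(updated)
    return [w / total for w in updated]


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]   # the third example was predicted wrongly
weights = [0.25] * 4    # every example starts out equally important

new_weights = highlight_mistakes(y_true, y_pred, weights)
print(new_weights)  # → [0.2, 0.2, 0.4, 0.2]
```

The missed example now carries twice the weight of each of the others, so the next model pays more attention to it.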

Step 3: Build Another Model

Now comes the interesting part.

The next model isn't trained to repeat the same work.

It's trained with greater emphasis on the examples the previous model struggled with.

Its goal is simple.

Not to replace the first model.

To improve it.

Step 4: Repeat Again

The second model still makes some mistakes.

Now a third model focuses on those remaining errors.

Then a fourth.

Then a fifth.

Every new model tries to improve what came before it.

Instead of creating independent experts, Boosting creates a team where every member learns from the previous member's experience.
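Steps 1 to 4 can be sketched in a few dozen lines. This is a minimal AdaBoost-style sketch in Python, which is one boosting variant among several. It assumes 1-D inputs, labels of +1/-1, and one-split "decision stumps" as the simple models. The value alpha (how much each model's vote counts) and the exponential weight update follow the standard AdaBoost formulas. The ten data points are a small toy example, not real data.

```python
import math


def fit_stump(xs, ys, weights):
    """Find the one-split rule (threshold t, direction sign) with the lowest
    weighted error. The rule predicts `sign` when x >= t and `-sign` otherwise."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x >= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best


def stump_predict(t, sign, x):
    return sign if x >= t else -sign


def boost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1 / n] * n                        # Step 1: all examples equal
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)  # how much this model's vote counts
        ensemble.append((alpha, t, sign))
        # Step 2: mistakes gain weight, correct answers lose weight
        weights = [w * math.exp(-alpha * y * stump_predict(t, sign, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]   # Steps 3-4: next round uses these
    return ensemble


def predict(ensemble, x):
    # Weighted vote of every model built so far
    score = sum(alpha * stump_predict(t, sign, x) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_model = boost(xs, ys, rounds=1)
three_models = boost(xs, ys, rounds=3)
print(sum(predict(one_model, x) != y for x, y in zip(xs, ys)))     # → 3
print(sum(predict(three_models, x) != y for x, y in zip(xs, ys)))  # → 0
```

A single stump misclassifies 3 of the 10 points. Each later stump is fitted to the reweighted data, and after three rounds the weighted vote classifies all 10 correctly, even though no individual stump does.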

Why This Is Different from Random Forest

Random Forest trains many trees independently.

None of them knows what the others predicted.

It's like asking 100 students to solve an exam without allowing them to discuss the answers.

Boosting is different.

Every new model studies the mistakes made by the previous one before it starts learning.

It's more like a teacher reviewing each student's paper before giving the next assignment.

The learning process becomes sequential rather than independent.

Does Boosting Memorize Mistakes?

This was another question I had.

Not exactly.

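Boosting doesn't simply store a list of wrong predictions.

Instead, it changes what the next model is trained on, so that difficult examples influence future models more strongly. Depending on the variant, this happens by giving those examples larger weights or by training the next model directly on the errors that still remain.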

The objective isn't to memorize.

The objective is to gradually improve.
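Gradient boosting is another way to let difficult examples count more: it trains each new model on the errors (residuals) that the current ensemble still makes. The sketch below assumes a regression problem with squared error, 1-D inputs, one-split regression stumps, a hypothetical learning rate of 0.5, and toy data. It is an illustration, not a library implementation.

```python
def fit_regression_stump(xs, targets):
    """One-split regression stump: choose the threshold whose two leaf means
    fit `targets` with the smallest squared error."""
    best = None
    for t in sorted(set(xs))[1:]:             # keep both leaves non-empty
        left = [r for x, r in zip(xs, targets) if x < t]
        right = [r for x, r in zip(xs, targets) if x >= t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x < t else right_mean


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def gradient_boost(xs, ys, rounds=5, learning_rate=0.5):
    preds = [sum(ys) / len(ys)] * len(ys)     # start from the average
    history = [mse(ys, preds)]
    for _ in range(rounds):
        # What the current ensemble still gets wrong
        residuals = [y - p for y, p in zip(ys, preds)]
        # The next model is trained on those errors, not on the original labels
        stump = fit_regression_stump(xs, residuals)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(mse(ys, preds))
    return history


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 2.0, 2.0, 3.0, 7.0, 8.0, 8.0, 9.0]
for step, error in enumerate(gradient_boost(xs, ys)):
    print(f"after {step} models: training MSE = {error:.3f}")
```

Nothing is memorized here. Examples with large remaining errors simply produce large residuals, so they pull harder on the next stump, and the training error shrinks a little with every round.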

Why Isn't One Powerful Model Enough?

Because one model rarely captures every pattern perfectly.

Each model discovers part of the solution.

The next model fills some of the remaining gaps.

Over many iterations, the combined model often becomes much stronger than any individual learner.

That's why Boosting is often described as turning many weak learners into one strong learner.

What Happens Next?

At this point, we understand the idea.

But another question naturally appears.

How does the algorithm decide which mistakes deserve more attention?

That's where AdaBoost enters the picture.

AdaBoost relies on a clever mechanism called sample weights: after every iteration, the examples the latest model got wrong receive more weight, so difficult training examples gain importance as training goes on.

We'll explore that in the next article.

Key Takeaway

Boosting doesn't build many models and hope that voting fixes everything.

It builds models one after another.

Each new model is influenced by the mistakes made by the previous models.

Instead of asking many independent experts for their opinions, Boosting creates a learning process where every new expert studies the errors of the last one before offering a better solution.

Why Do Decision Trees Have High Variance?

Pavan Pothuganti — Sat, 04 Jul 2026 14:24:44 +0000

Every Machine Learning course eventually says this:

"Decision Trees have high variance."

When I first heard that, I accepted it and moved on.

But later, I stopped and asked myself a simple question:

What does that actually mean?

Not the textbook definition.

What is the model really doing that makes everyone call it a "high variance" algorithm?

That question completely changed how I understood Decision Trees.

Imagine Building Two Decision Trees

Suppose you have a dataset with 10,000 customer records.

You train a Decision Tree.

Now imagine removing just a few hundred records and training the model again.

You might expect the new tree to look almost identical.

After all:

The algorithm is the same.
Most of the data is the same.
The problem hasn't changed.

Surprisingly, that's often not what happens.

The new tree may choose a different root feature.

Different splits.

Different branches.

Different predictions.

A tiny change in the training data can completely reshape the tree.

That isn't a bug.

It's the nature of Decision Trees.

But Why Does This Happen?

A Decision Tree builds itself one split at a time.

At every step, it asks:

"Which feature gives me the best split right now?"

Sometimes two features are almost equally good.

A small change in the training data can make Feature A slightly better than Feature B.

Once the root node changes, everything below it changes as well.

It's like taking a different road at the first intersection.

Even though the destination is the same, the entire journey becomes different.

One small decision near the top creates a completely different tree.
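The greedy "best split right now" step, and how fragile it is, can be reproduced on toy data. This minimal sketch assumes Gini impurity as the split criterion (a common choice, not the only one), two binary features named A and B, and hypothetical records. It only compares candidate root features and does not grow a full tree.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = pure, 0.5 = 50/50 mix)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_impurity(records, feature):
    """Weighted Gini impurity after splitting on a binary feature.
    Each record is (feature_A, feature_B, label)."""
    left = [r[2] for r in records if r[feature] == 0]
    right = [r[2] for r in records if r[feature] == 1]
    n = len(records)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_root(records):
    """Greedy choice: the feature with the lowest impurity right now."""
    scores = {"A": split_impurity(records, 0), "B": split_impurity(records, 1)}
    return min(scores, key=scores.get), scores


# Hypothetical toy data: A and B are almost equally good predictors.
full = ([(1, 1, 1)] * 50 + [(0, 0, 0)] * 50
        + [(0, 1, 1)] * 20    # here B matches the label, A does not
        + [(1, 0, 1)] * 21)   # here A matches the label, B does not
smaller = full[:-2]           # drop 2 of the 141 records

print(best_root(full)[0])     # → A
print(best_root(smaller)[0])  # → B
```

In both cases the two scores differ by less than 0.01, so removing just 2 of 141 records is enough to change the root, and with it every branch below.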

The Domino Effect

Think about a family tree.

If the first branch changes, every branch below it changes too.

Decision Trees behave in a similar way.

A different root node leads to different child nodes.

Different child nodes lead to different grandchildren.

One early decision affects the entire structure.

That's why even a small change in the data can produce a very different model.

Why Is That a Problem?

Imagine predicting whether a customer will buy a product.

You train one Decision Tree today.

Tomorrow, you collect a little more data and train it again.

Now the predictions change noticeably.

The model isn't stable.

It reacts strongly to changes in the training data.

That instability is exactly what machine learning calls high variance.

The issue isn't that Decision Trees are inaccurate.

The issue is that they're sensitive.

Does High Variance Mean Decision Trees Are Bad?

Not at all.

Decision Trees are powerful because they can learn complex patterns without requiring feature scaling or assuming linear relationships.

The trade-off is that this flexibility makes them more likely to overfit the training data.

They're excellent learners.

Sometimes they're just a little too eager to memorize.

A Question That Naturally Follows

Once I understood why Decision Trees have high variance, another question came to mind.

If the problem is instability, why not train many Decision Trees instead of trusting just one?

That simple question led me to Bagging and, eventually, Random Forest.

And that's exactly where the next article begins.

Key Takeaway

A Decision Tree has high variance not because it is a poor algorithm, but because it is highly sensitive to the data it learns from.

Even a small change in the training data can produce a completely different tree.

Understanding that single idea makes it much easier to understand why Bagging and Random Forest were created.

If Random Forest Already Reduces Variance, Why Do We Still Need Boosting?

Pavan Pothuganti — Sat, 04 Jul 2026 14:22:26 +0000

After learning Decision Trees, I understood why they overfit.

After learning Bagging, I understood how training multiple trees makes predictions more stable.

After learning Random Forest, I thought I had reached the final destination.

Then I discovered another family of algorithms:

Boosting.

My immediate question was simple.

If Random Forest already solved the problem, why do we still need Boosting?

The answer completely changed how I think about machine learning models.

The Mistake I Was Making

I assumed reducing variance meant reducing errors.

Those sound similar.

They're not.

Reducing variance simply means making the model more stable.

It does not mean the model suddenly becomes perfect.

That distinction is easy to miss.

Imagine a Classroom

Suppose 100 students solve the same exam paper.

Instead of trusting one student, you decide to trust the majority.

If one student makes a silly mistake, the others correct it.

That's exactly what Random Forest does.

It replaces the opinion of one Decision Tree with the collective opinion of many trees.

Random mistakes become much less important.

But here's the interesting part.

What If Every Student Doesn't Know One Chapter?

Imagine every student skipped the same chapter before the exam.

Now everyone answers one question incorrectly.

Does asking 100 students help?

No.

The majority is still wrong.

This is exactly what can happen in Random Forest.

If every tree struggles with a particular pattern, majority voting cannot invent the correct answer.

The model has become more stable.

It hasn't become all-knowing.

Stability Isn't the Same as Learning

This was the biggest realization for me.

Random Forest mainly answers this question:

"How can we make predictions more consistent?"

Boosting answers a completely different question:

"How can we reduce the mistakes that still remain?"

Those are not the same objective.

A Different Philosophy

Random Forest builds many trees independently.

Each tree finishes its work without knowing what the others predicted.

Boosting works differently.

It builds one model.

Then it studies where that model failed.

The next model is trained to pay more attention to those difficult cases.

When that model finishes, another model focuses on the remaining errors.

Instead of asking many models for independent opinions, Boosting creates a sequence of models where each one learns from the previous one.

It's more like coaching than voting.

Why Both Algorithms Exist

Random Forest is excellent when the main issue is instability.

Boosting is powerful when you want to squeeze out the remaining errors by continuously improving the model.

Neither algorithm replaces the other.

They solve different problems.

One focuses on stability.

The other focuses on improvement.

The Question That Changed My Understanding

I stopped asking:

"Which algorithm is better?"

Instead, I started asking:

"What problem is this algorithm trying to solve?"

That single question made ensemble learning much easier to understand.

Instead of memorizing algorithms, I began understanding the reason they exist.

And once I understood the reason, remembering the algorithms became effortless.

Key Takeaway

Random Forest reduces the randomness of Decision Trees.

Boosting reduces the systematic mistakes that remain even when predictions are stable.

One algorithm stabilizes learning.

The other continuously improves learning.

That difference is why both continue to be among the most important ensemble techniques in machine learning.

If Bagging Already Uses 100 Trees, Why Was Random Forest Invented?

Pavan Pothuganti — Sat, 04 Jul 2026 14:12:44 +0000

After finally understanding Bagging, I thought I was done with ensemble learning.

The idea made sense.

Take multiple bootstrap samples.

Train multiple Decision Trees.

Combine their predictions using majority voting (or averaging, for regression).

Variance decreases.

Simple.

Then I came across another algorithm:

Random Forest.

My first reaction was honest:

"Wait... isn't this just Bagging with a fancy name?"

It turns out the answer is no.

And the reason is surprisingly interesting.

Bagging Solves One Problem

Bagging makes Decision Trees more stable.

Each tree is trained on a different bootstrap sample, so they don't all learn exactly the same data.

That reduces variance.

At this point, I assumed every tree would become completely different.

But that's not always true.

Different Data Doesn't Mean Different Thinking

Imagine you're building a model to predict house prices.

Your features are:

Location
Area
Number of bedrooms
Age of the house
Parking

Suppose Area is by far the strongest predictor.

Even though every tree receives a different bootstrap dataset, most of them will still discover the same thing:

"Area is the best feature to split on."

So what happens?

Tree after tree starts with the same root node.

Many of them grow in very similar ways.

They are trained on different data, but they still think alike.

Why Is That a Problem?

Imagine asking 100 people the same question.

If every person has exactly the same information and thinks in exactly the same way, you'll probably hear the same answer 100 times.

Even if that answer is wrong.

Now compare that with asking 100 experts from different backgrounds.

One notices something others missed.

Another approaches the problem differently.

The diversity of opinions often leads to a better final decision.

Random Forest tries to create that diversity.

The Extra Randomness

Bagging changes the rows.

Random Forest changes both the rows and the features.

Instead of allowing every tree to examine every feature, Random Forest randomly selects a subset of features whenever a split is made.

Now imagine the strongest feature isn't available for a particular split.

The tree is forced to explore another path.

One tree may begin with Area.

Another may begin with Location.

Another may start with Age.

The trees become less similar.

And that's exactly what we want.
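The effect of hiding features can be simulated. This sketch simplifies heavily. It assumes each feature has a fixed, hypothetical split quality (in a real tree this is computed from the data at every node), uses Python's random module, and picks max_features=2 as an arbitrary illustrative subset size.

```python
import random
from collections import Counter

FEATURES = ["Location", "Area", "Bedrooms", "Age", "Parking"]
# Hypothetical split quality for each feature (higher = better split).
SPLIT_QUALITY = {"Location": 0.6, "Area": 0.9, "Bedrooms": 0.5,
                 "Age": 0.3, "Parking": 0.2}


def choose_root(rng, max_features=None):
    """Pick the best feature among the ones this split is allowed to see."""
    if max_features is None:
        candidates = FEATURES                             # Bagging: every feature
    else:
        candidates = rng.sample(FEATURES, max_features)   # Random Forest: random subset
    return max(candidates, key=SPLIT_QUALITY.get)


rng = random.Random(0)
bagging_roots = Counter(choose_root(rng) for _ in range(100))
forest_roots = Counter(choose_root(rng, max_features=2) for _ in range(100))
print(bagging_roots)   # → Counter({'Area': 100})
print(forest_roots)    # several different root features
```

With all five features visible, every tree starts with Area. With two random features per split, Area is missing from 6 of the 10 possible pairs, so in those trees other features get to lead.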

Why Diversity Matters

If every tree makes the same mistake, majority voting cannot help.

If different trees make different mistakes, majority voting becomes much more powerful.

Random Forest doesn't build more trees than Bagging.

It builds less correlated, more independent trees.

That small difference is what makes the algorithm so effective.
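A small calculation shows why different mistakes matter. It assumes an idealized case: each tree is right 70% of the time (a hypothetical number), and the trees' mistakes are completely independent, which real trees never fully achieve.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_correct):
    """Probability that a majority vote of `n_trees` trees is correct, assuming
    each tree is independently correct with probability `p_correct`.
    Use an odd `n_trees` so there are no ties."""
    needed = n_trees // 2 + 1                      # votes required for a majority
    return sum(comb(n_trees, k) * p_correct ** k * (1 - p_correct) ** (n_trees - k)
               for k in range(needed, n_trees + 1))


print(round(majority_vote_accuracy(5, 0.7), 5))   # → 0.83692
```

Five independent trees at 70% accuracy produce a vote that is right about 83.7% of the time. If all five made exactly the same mistakes, the vote would be no better than a single tree: 70%.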

The Lesson That Changed My Perspective

For a long time, I thought Random Forest was simply:

"Bagging + a random trick."

Now I see it differently.

Bagging asks:

"How do we make Decision Trees more stable?"

Random Forest asks:

"How do we make those trees think differently?"

Those are two completely different questions.

And that's why Random Forest usually performs better than plain Bagging with Decision Trees.
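A standard result makes this precise for averaged predictions, as in regression (majority voting behaves similarly, though not identically). If B trees each produce a prediction with variance σ² and every pair of trees has correlation ρ, the variance of their average is:

```latex
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding more trees only shrinks the second term. The first term, ρσ², stays no matter how many trees you add, and lowering ρ, so the trees think differently, is exactly what Random Forest's feature sampling targets.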

Key Takeaway

Bagging creates multiple Decision Trees from different bootstrap samples of the same dataset.

Random Forest goes one step further by ensuring those trees don't all rely on the same features.

It's not about creating more trees.

It's about creating more diverse trees.

Sometimes, diversity is more valuable than quantity.

If Decision Trees Have High Variance, Why Does Bagging Actually Work?

Pavan Pothuganti — Sat, 04 Jul 2026 14:07:58 +0000

When I first learned about Decision Trees, everyone said the same thing:

"Decision Trees have high variance."

Then they immediately introduced Bagging and Random Forest.

At first, I accepted it.

Later, one question kept bothering me:

How does training 100 Decision Trees suddenly solve the problem?

After all, if one tree makes mistakes, why would building 99 more trees magically improve anything?

That question completely changed how I understood ensemble learning.

The Problem Isn't That Decision Trees Are "Bad"

Imagine training a Decision Tree on a dataset of 10,000 records.

Now remove just a small percentage of those records and train another tree.

Surprisingly, the new tree may have a completely different structure.

Different root node.

Different branches.

Different predictions.

The algorithm didn't change.

The problem didn't change.

Only a small part of the training data changed.

That is what people mean when they say a Decision Tree has high variance.

It reacts strongly to small changes in the training data.

My First Wrong Assumption

Initially, I thought:

"If one tree is unstable, then training many unstable trees should make the situation even worse."

That sounds logical.

But that's not what actually happens.

The secret lies in how those trees are trained.

Every Tree Sees a Different World

Bagging doesn't clone the same Decision Tree 100 times.

Instead, it creates multiple bootstrap datasets.

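Each dataset is drawn from the original data with replacement, so it has the same size but not the same contents: some records appear several times, and on average roughly a third of the records are left out.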

Every tree learns from a slightly different version of reality.

As a result:

One tree may overfit one noisy pattern.
Another tree may never even see that noisy pattern.
A third tree may split the data in a completely different way.

Each tree develops its own strengths and weaknesses.
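Bootstrap sampling itself takes a single standard-library call. This sketch assumes the records are just a list (integer IDs standing in for the 10,000 customer records) and uses random.Random.choices, which samples with replacement.

```python
import random


def bootstrap_sample(records, rng):
    """Draw len(records) records with replacement: some records appear
    several times, others not at all."""
    return rng.choices(records, k=len(records))


rng = random.Random(42)
records = list(range(10_000))        # IDs standing in for 10,000 records
samples = [bootstrap_sample(records, rng) for _ in range(3)]   # one per tree
for i, sample in enumerate(samples, start=1):
    share = len(set(sample)) / len(records)
    print(f"tree {i}: sees {share:.1%} of the original records")
```

Each share should come out close to 63.2%. The chance that a given record is never drawn is (1 - 1/n)^n, which approaches 1/e (about 36.8%) as n grows.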

The Real Power Isn't the Trees

The real power is the disagreement between them.

Suppose you're trying to classify an image.

Tree 1 predicts Cat.

Tree 2 predicts Cat.

Tree 3 predicts Dog.

Tree 4 predicts Cat.

Tree 5 predicts Cat.

One tree made a mistake.

Four didn't.

Instead of trusting one unstable model, Bagging trusts the collective decision.

The random mistakes made by individual trees are often cancelled out by the majority.

That is why variance decreases.
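The vote itself is simple to express. A minimal sketch, assuming each tree's prediction is already available as a label string:

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label. On a tie, Counter.most_common keeps
    the label that appeared first."""
    return Counter(predictions).most_common(1)[0][0]


votes = ["Cat", "Cat", "Dog", "Cat", "Cat"]   # Trees 1-5 from the example
print(majority_vote(votes))                   # → Cat
```

Voting only helps when the trees disagree. If every tree had learned the same wrong pattern, majority_vote(["Dog"] * 5) would simply return "Dog".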

The Question That Came Next

After understanding this, another question immediately came to mind.

"What if all 100 trees are wrong?"

And the answer surprised me.

Yes, it can happen.

Imagine the training data itself is missing an important feature.

Or the labels contain systematic errors.

Every bootstrap sample is created from that same dataset.

Every tree learns the same incorrect pattern.

Now every tree confidently makes the same wrong prediction.

Voting cannot fix missing knowledge.

Bagging only reduces random instability.

It cannot magically invent information that doesn't exist.

That's When Everything Clicked

I realized Bagging doesn't promise perfection.

It promises stability.

A single Decision Tree may change dramatically when the training data changes.

Bagging makes the final prediction much more consistent by combining many different trees.

Some mistakes disappear because they were caused by randomness.

Other mistakes remain because they come from genuinely difficult patterns in the data.

Those remaining mistakes are exactly what another family of algorithms, Boosting, attacks with a completely different approach.

But that's a story for the next article.

Key Takeaway

I stopped thinking of Bagging as "100 Decision Trees."

Now I think of it as:

A method that replaces the opinion of one unstable model with the collective wisdom of many independently trained models.

That single idea made ensemble learning much easier to understand.