Abhijeet Pratap Singh

Posted on Jul 1

Decision Trees (Supervised Learning)

#algorithms #datascience #beginners #machinelearning

1. The Problem It Solves

Many real-world problems don't follow a straight-line relationship.

People rarely make decisions along a smooth, gradual scale. Instead, they often make decisions based on conditions.

For example:

Will this customer upgrade?
Is this transaction fraudulent?
Should this loan be approved?
Will this machine fail?
Is this email spam?

The answer usually depends on a series of if-else rules, not a mathematical equation.

For example:

If monthly spending is greater than $500, and
login frequency is less than twice a week, and
support tickets are increasing,

then the customer is likely to churn.

Decision Trees are designed to discover these kinds of rules automatically.

Instead of fitting a line like Linear or Logistic Regression, they keep asking questions that split the data into smaller and more similar groups.
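To make the idea concrete, here is the churn rule from above written by hand as plain Python. The function and argument names are made up for illustration. A Decision Tree would learn thresholds like 500 and 2 from the data instead of having them hard-coded.

```python
def likely_to_churn(monthly_spend, logins_per_week, tickets_increasing):
    """Hand-written version of the example churn rule (illustrative only)."""
    if monthly_spend > 500:
        if logins_per_week < 2:
            if tickets_increasing:
                return True
    return False


print(likely_to_churn(650, 1, True))   # all three conditions hold
print(likely_to_churn(650, 5, True))   # logs in too often for the rule to fire
```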

2. Core Intuition

Imagine you're playing 20 Questions.

You're trying to guess whether a customer will upgrade their subscription.

Instead of making one big guess, you ask simple Yes/No questions.

For example:

Does the customer have more than 20 seats?

If yes...

Ask another question.

Are API calls greater than 500 per day?

If yes...

Ask another question.

Has the account been active in the last week?

Eventually, you reach a point where almost every customer in that group behaves the same way.

That final group becomes a Leaf Node.

Whenever a new customer arrives, you simply walk them through the same set of questions until they reach a leaf.

The prediction is based on the majority of training examples that ended up there.
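The walk from the first question down to a leaf can be sketched in a few lines. This is a toy structure, not how any particular library stores its trees. The class names, feature names, and leaf labels are all assumptions chosen to mirror the three questions above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: str  # majority class of the training examples that landed here


@dataclass
class Question:
    feature: str
    threshold: float
    yes: Union["Question", Leaf]  # branch taken when feature > threshold
    no: Union["Question", Leaf]   # branch taken otherwise


def predict(node, customer):
    """Walk one customer (a dict of feature values) down to a leaf."""
    while isinstance(node, Question):
        node = node.yes if customer[node.feature] > node.threshold else node.no
    return node.prediction


# Hypothetical tree mirroring the three questions in the text
tree = Question(
    "seats", 20,
    yes=Question(
        "api_calls_per_day", 500,
        yes=Question(
            "days_since_last_activity", 7,
            yes=Leaf("no upgrade"),  # inactive for more than a week
            no=Leaf("upgrade"),      # active in the last week
        ),
        no=Leaf("no upgrade"),
    ),
    no=Leaf("no upgrade"),
)

customer = {"seats": 50, "api_calls_per_day": 800, "days_since_last_activity": 2}
print(predict(tree, customer))
```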

3. How the Algorithm Works

Decision Trees are built one split at a time.

At every node, the algorithm asks:

"Which question separates the data the best?"

It tries every feature.

Then every possible split point.

The split that creates the cleanest separation is chosen.

This process repeats until the stopping criteria are met.
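What does "every possible split point" mean for a numeric feature? One common convention, assumed here, is to sort the distinct values and try the midpoint between each neighbouring pair. The helper name below is made up for this sketch.

```python
import numpy as np


def candidate_thresholds(values):
    """Midpoints between consecutive distinct values of one numeric feature."""
    distinct = np.unique(values)  # np.unique returns sorted, de-duplicated values
    return (distinct[:-1] + distinct[1:]) / 2


seats = np.array([5, 12, 12, 30, 45])
print(candidate_thresholds(seats))  # midpoints 8.5, 21.0 and 37.5
```

Each threshold turns into one Yes/No question such as "Is seat count <= 21?", and the algorithm scores every one of them for every feature.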

4. Measuring Node Purity

To decide whether a split is good, the algorithm measures how "mixed" the classes are inside each node.

One of the most common metrics is Gini Impurity:

$$\text{Gini} = 1 - \sum_{i=1}^{C} p_i^2$$

Where:

pᵢ = proportion of samples in the node that belong to class i
C = total number of classes

Interpretation:

Gini = 0 → Every sample belongs to one class (perfectly pure)
Higher values → Classes are mixed together (the maximum is 1 − 1/C, reached when all classes are equally common, e.g. 0.5 for two classes)

The goal is to make every leaf node as pure as possible.
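Gini Impurity is one minus the sum of the squared class proportions, so it takes only a few lines to compute by hand. This is a minimal sketch with a made-up function name, not the optimized routine a library would use.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini(["yes", "yes", "yes", "yes"]))  # pure node
print(gini(["yes", "no", "yes", "no"]))    # evenly mixed, two classes
print(gini(["a", "b", "c"]))               # evenly mixed, three classes
```

The pure node scores 0, the 50/50 node scores 0.5, and the three-way even mix scores about 0.667.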

5. Information Gain

Every possible split is evaluated.

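The algorithm calculates how much impurity decreases after making that split: the parent node's impurity minus the weighted average impurity of its child nodes, where each child is weighted by its share of the samples.

This decrease is called Information Gain. (Strictly speaking, that name refers to the decrease in entropy. With Gini Impurity it is often called Gini gain or impurity decrease, but the idea is the same.)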

The split with the highest Information Gain becomes the next branch in the tree.

Then the entire process repeats recursively for each child node.
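Putting the pieces together, a brute-force split search might look like the sketch below. It assumes Gini Impurity as the purity measure and midpoints between distinct values as candidate thresholds. The function names and the tiny dataset are invented for illustration, and real libraries implement this far more efficiently.

```python
import numpy as np


def gini(y):
    """Gini impurity of an array of class labels."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def impurity_decrease(y, left_mask):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(y)
    left, right = y[left_mask], y[~left_mask]
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(y) - weighted


def best_split(X, y):
    """Try every feature and every midpoint threshold; keep the largest gain."""
    best = (None, None, 0.0)  # (feature index, threshold, gain)
    for j in range(X.shape[1]):
        distinct = np.unique(X[:, j])
        for t in (distinct[:-1] + distinct[1:]) / 2:
            gain = impurity_decrease(y, X[:, j] <= t)
            if gain > best[2]:
                best = (j, t, gain)
    return best


# Columns: seat count, API calls per day (made-up numbers)
X = np.array([[5, 900], [10, 100], [30, 200], [40, 800], [60, 950]], dtype=float)
y = np.array([0, 0, 0, 1, 1])

feature, threshold, gain = best_split(X, y)
print(feature, threshold, round(gain, 2))
```

On this toy data the winner is feature 0 (seat count) at a threshold of 35, which separates the two classes perfectly for a gain of 0.48. Building a full tree means calling `best_split` again on the rows that fall on each side.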

6. When Does the Tree Stop Growing?

If left alone, a Decision Tree keeps splitting until every leaf is pure, which in the extreme case means every training example has its own leaf.

That almost always leads to overfitting.

To prevent this, we usually limit tree growth using parameters like:

max_depth
min_samples_split
min_samples_leaf
max_leaf_nodes

These regularization settings help the tree generalize to unseen data instead of memorizing the training set.
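To see these settings in action, the sketch below trains one unrestricted tree and one limited tree on synthetic data in which roughly 15% of the labels are flipped on purpose. The data, the noise level, and the specific parameter values are assumptions for illustration, so exact scores will vary with the setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 2))
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(int)

# Flip roughly 15% of the labels to simulate noisy real-world data
flip = rng.random(400) < 0.15
y = np.where(flip, 1 - y, y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

unrestricted = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
limited = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)

for name, model in [("unrestricted", unrestricted), ("limited", limited)]:
    print(
        f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
        f"train={model.score(X_train, y_train):.2f}, "
        f"test={model.score(X_test, y_test):.2f}"
    )
```

The unrestricted tree fits the noisy training rows perfectly and grows many more leaves. The useful number to compare is the gap between each model's train and test scores.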

7. When Should You Use Decision Trees?

Decision Trees work well when:

Relationships are non-linear.
Data contains many conditional rules.
Features are a mix of numerical and categorical values.
Interpretability is important.
You don't want extensive preprocessing.

Typical applications include:

Customer churn prediction
Credit approval
Fraud detection
Medical diagnosis
Product recommendation
Customer segmentation
Risk assessment

8. Advantages

Decision Trees have several practical benefits.

No feature scaling required.
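Handles numerical and categorical data (though some implementations, such as scikit-learn's, require categorical features to be encoded as numbers first).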
Learns non-linear relationships automatically.
Easy to visualize and explain.
Captures feature interactions naturally.
Works well even with missing values (depending on implementation).

9. When It Starts Breaking Down

Decision Trees are powerful, but they have some important weaknesses.

Overfitting

The biggest problem.

If the tree grows without limits, it starts memorizing the training data instead of learning real patterns.

This usually results in poor performance on new data.

High Variance

Decision Trees are unstable.

A small change in the training data can completely change the structure of the tree.

Two trees trained on almost identical datasets may look very different.
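One way to observe this instability is to refit the same tree on several bootstrap resamples of one dataset and compare the root split each time. The synthetic data and the number of resamples below are arbitrary choices for this sketch.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 3))
# Label depends on the first two features plus noise; the third is irrelevant
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.3, 200) > 1).astype(int)

root_splits = []
for seed in range(5):
    # Bootstrap resample: same size, drawn with replacement
    idx = np.random.default_rng(seed).choice(200, size=200, replace=True)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[idx], y[idx])
    # tree_.feature[0] and tree_.threshold[0] describe the root node
    root_splits.append(
        (int(tree.tree_.feature[0]), round(float(tree.tree_.threshold[0]), 3))
    )

print(root_splits)  # (feature index, threshold) of the root for each resample
```

If the root feature or threshold shifts between resamples, everything below it shifts too, which is why whole trees can look so different.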

Greedy Decisions

The algorithm always chooses the best split right now.

It never looks ahead.

That means an early decision can prevent the tree from finding a better overall structure later.

Bias Toward Features with Many Split Points

Continuous numerical features often have many possible split locations.

Without proper controls, the algorithm may favor these features even when they aren't the most meaningful.

10. Python Implementation

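import numpy as np
import pandas as pd

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Generate sample data (fixed seed so the results are reproducible)
np.random.seed(42)

seat_count = np.random.uniform(1, 100, 100)
api_calls = np.random.uniform(10, 1000, 100)

# Business rule: a customer upgrades only when both thresholds are exceeded
upgraded = (
    (seat_count > 20) &
    (api_calls > 500)
).astype(int)

df = pd.DataFrame({
    "Seat_Count": seat_count,
    "API_Calls": api_calls,
    "Upgraded": upgraded
})

X = df[["Seat_Count", "API_Calls"]]
y = df["Upgraded"]

# Hold out 30% of the rows so accuracy is measured on data the tree never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train Decision Tree
model = DecisionTreeClassifier(
    max_depth=3,
    random_state=42
)

model.fit(X_train, y_train)

# Predictions on the held-out rows
predictions = model.predict(X_test)

print(
    "Test accuracy:",
    accuracy_score(y_test, predictions)
)

print("\nDecision Rules\n")

# Print the learned if-else rules as indented text
print(
    export_text(
        model,
        feature_names=[
            "Seat_Count",
            "API_Calls"
        ]
    )
)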

11. How to Evaluate the Model

Accuracy

Measures the percentage of correct predictions.

Useful when classes are balanced.

Precision

How many predicted positives were actually positive.

Recall

How many actual positive cases were correctly identified.

F1 Score

Balances Precision and Recall.

Useful for imbalanced datasets.
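A small worked example shows why these four metrics can disagree. The labels below are made up: 3 positives out of 10, with the model catching only one of them and raising one false alarm.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)    # 7 of 10 correct
precision = precision_score(y_true, y_pred)  # 1 of 2 predicted positives is right
recall = recall_score(y_true, y_pred)        # 1 of 3 actual positives is found
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Accuracy comes out at 0.70, which sounds acceptable, while recall is only 0.33 and F1 is 0.40. On imbalanced data, that gap is the reason to look beyond accuracy.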

Tree Depth

A deeper tree isn't always better.

Very deep trees are often a sign of overfitting, especially when training accuracy is much higher than validation accuracy.

Feature Importance

Decision Trees automatically estimate how useful each feature was during training, typically by adding up the impurity reduction from every split that uses that feature.

This helps explain which variables influenced predictions the most.
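In scikit-learn these scores are exposed through the fitted model's `feature_importances_` attribute. The sketch below reuses the seat-count and API-call rule from this article and adds a hypothetical `Noise` column that has nothing to do with the label, to check that the tree mostly ignores it.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 300
seat_count = rng.uniform(1, 100, n)
api_calls = rng.uniform(10, 1000, n)
noise = rng.uniform(0, 1, n)  # hypothetical feature unrelated to the label

y = ((seat_count > 20) & (api_calls > 500)).astype(int)
X = np.column_stack([seat_count, api_calls, noise])

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

names = ["Seat_Count", "API_Calls", "Noise"]
importances = model.feature_importances_  # normalized to sum to 1
for name, importance in zip(names, importances):
    print(f"{name}: {importance:.3f}")
```

These importances describe what the tree used on the training data, so treat them as a hint about the model rather than proof of cause and effect.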

12. Real-World Engineering Notes

Here are a few things you'll notice in production:

Decision Trees are one of the easiest ML models to explain to non-technical teams.
They require very little preprocessing.
Always limit tree growth using max_depth or min_samples_leaf.
A single Decision Tree rarely gives the best performance.
Most production systems use ensembles like Random Forest or Gradient Boosting, which combine many trees and typically generalize better than any single tree.
Think of a Decision Tree as the building block for many of today's strongest machine learning algorithms.
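A quick way to check the ensemble claim on your own data is to cross-validate a single tree against a Random Forest. The synthetic dataset and settings below are assumptions for illustration, and the size of the gap depends heavily on the problem.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 4))
# Diagonal decision boundary with about 10% of the labels flipped
y = ((X[:, 0] + X[:, 1] > 1) ^ (rng.random(500) < 0.1)).astype(int)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```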

13. Key Takeaways

Decision Trees solve classification and regression problems using a series of if-else rules.
They automatically discover non-linear relationships in data.
The algorithm chooses splits that maximize Information Gain, that is, the reduction in impurity.
Easy to understand, visualize, and explain.
Requires little preprocessing and no feature scaling.
Can overfit easily if not regularized.
Forms the foundation of Random Forests, Extra Trees, XGBoost, LightGBM, and many other ensemble methods.
