Chain Rule in Machine Learning: A Simple Walkthrough

When exploring machine learning, there are many rules to learn. One important rule we should always keep in mind is the chain rule.

As the word “chain” suggests, you can think of it as a chain reaction: changing one thing causes a change in a second thing, which then causes a change in a third.

The Core Concept: Link

Let’s take a shoe-size example.

Think of three measurements:

Weight
Height
Shoe size

For these measurements, assume the following:

Weight predicts height
Height predicts shoe size

If you want to know how much weight affects shoe size, the chain rule says you multiply the derivatives (the rates of change) of these two links together.

So, we can express this as:

Change in Shoe Size per Weight = (Change in Shoe Size per Height) × (Change in Height per Weight)
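
In standard calculus notation, the same relationship can be sketched as follows. The symbols W (weight), H (height), and S (shoe size) are introduced here only for illustration and are not part of the original example.

```latex
\frac{dS}{dW} = \frac{dS}{dH} \cdot \frac{dH}{dW}
```

Read right to left, a change in W first becomes a change in H, and that change in H then becomes a change in S.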

Plugging in Values

Let’s plug in some example values to understand this better.

Assume that for every 1 unit increase in weight, height increases by 2 units.

Change in height / Change in weight = 2 / 1 = 2

Now assume that for every 1 unit increase in height, shoe size increases by 1/4 unit.

Change in shoe size / Change in height = (1/4) / 1 = 1/4

According to the chain rule:

Change in shoe size per weight = 1/4 × 2 = 1/2

This means that for every 1 unit increase in weight, shoe size increases by 1/2 unit.
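
To check this arithmetic, here is a minimal Python sketch. It assumes both links are simple straight-line relationships with the slopes used above. The function names and the starting weight of 10.0 are hypothetical, chosen only for illustration.

```python
def height_from_weight(weight):
    # Assumed linear link: height rises 2 units per 1 unit of weight.
    return 2.0 * weight


def shoe_size_from_height(height):
    # Assumed linear link: shoe size rises 1/4 unit per 1 unit of height.
    return 0.25 * height


def shoe_size_from_weight(weight):
    # Composition of the two links: weight -> height -> shoe size.
    return shoe_size_from_height(height_from_weight(weight))


def slope(f, x, step=1.0):
    # Change in output per unit change in input, measured at x.
    return (f(x + step) - f(x)) / step


weight = 10.0  # arbitrary starting point (hypothetical value)

d_height_d_weight = slope(height_from_weight, weight)
d_shoe_d_height = slope(shoe_size_from_height, height_from_weight(weight))

# Chain rule: multiply the slopes of the two links.
chain_rule_slope = d_shoe_d_height * d_height_d_weight

# Direct measurement on the composed function, for comparison.
direct_slope = slope(shoe_size_from_weight, weight)

print(chain_rule_slope, direct_slope)  # prints: 0.5 0.5
```

Both numbers agree, which is what the chain rule promises: multiplying the slopes of the individual links gives the slope of the whole chain.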

So, that is basically the chain rule.

Wrapping Up

The chain rule is a fundamental building block that prepares us for concepts like backpropagation.

In the next part of this article, we’ll move on to gradient descent.
