DEV Community

Rijul Rajesh

Understanding Reinforcement Learning with Neural Networks Part 2: Why Backpropagation Is Not Enough

In the previous article, we explored an example where reinforcement learning is required and standard methods do not work.

In this article, we will understand why policy gradients are needed, and why the standard backpropagation method does not work in certain situations.

How Backpropagation Normally Works

Assume we have the following training data, where the desired outputs are already known:

| Input (Hunger) | Output p(B) |
| -------------- | ----------- |
| 0.0            | 0           |
| 1.0            | 1           |
| 0.1            | 0           |
| 0.9            | 1           |

With this data, we can feed the input values into the neural network one at a time.

The neural network produces an output, and we compare it with the ideal output value from the training data.

Using this difference, we can measure how wrong the network is.
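As a minimal sketch of this step, assume the "network" is a single sigmoid neuron, `p(B) = sigmoid(w * hunger + b)`, with arbitrary starting parameters (the weight, bias, and squared-error measure here are illustrative choices, not from the original article):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical single-neuron network: p(B) = sigmoid(w * hunger + b)
w, b = 5.0, 0.0  # arbitrary starting parameters

# Training data from the table above: (hunger, ideal p(B))
training_data = [(0.0, 0), (1.0, 1), (0.1, 0), (0.9, 1)]

total_error = 0.0
for hunger, target in training_data:
    prediction = sigmoid(w * hunger + b)
    total_error += (prediction - target) ** 2  # squared difference

print(total_error)  # a single number measuring "how wrong" the network is
```

The sum of squared differences is one common way to collapse all the per-example errors into a single number we can try to reduce.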


Using Derivatives to Update the Bias

We can calculate these differences for different values of the bias and visualize how the error changes as the bias changes.

From this graph, we can calculate the derivative of the error with respect to the bias.

  • If the derivative is negative, we shift the bias to the right
  • If the derivative is positive, we shift the bias to the left

The derivative correctly tells us which direction to move because the training data already contains the ideal output values.

This is the basic idea behind training with backpropagation: compute the derivative of the error and shift the parameters in the direction that reduces it.
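The bias update can be sketched numerically. This toy example (the weight, learning rate, and finite-difference derivative are illustrative assumptions, not the article's exact setup) estimates the derivative of the error with respect to the bias and repeatedly shifts the bias against it:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training data from the table above: (hunger, ideal p(B))
DATA = [(0.0, 0), (1.0, 1), (0.1, 0), (0.9, 1)]

def total_error(b, w=5.0):
    """Sum of squared differences for a given bias (weight held fixed)."""
    return sum((sigmoid(w * h + b) - t) ** 2 for h, t in DATA)

b = 2.0     # deliberately bad starting bias
eps = 1e-4  # small step for the numerical derivative
lr = 0.5    # how far we shift the bias each update

for step in range(100):
    # Finite-difference estimate of d(error)/d(bias)
    derivative = (total_error(b + eps) - total_error(b - eps)) / (2 * eps)
    # Positive derivative -> shift bias left; negative -> shift right
    b -= lr * derivative
```

After these updates the bias has moved toward the value that minimizes the error, which is exactly what the derivative's sign made possible: because the training data supplies ideal outputs, the error curve is known and its slope points the way.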


The Problem in Reinforcement Learning

However, in reinforcement learning, we do not know the ideal output values in advance.

For example, we do not already know whether choosing Place A or Place B is the correct action.

Because of this:

  • we cannot calculate the difference between the neural network's output and the ideal output
  • without these differences, we cannot calculate derivatives in the normal way

A Different Approach

Instead, we can guess what the ideal outputs should be and use those guesses to estimate the derivatives.

This idea forms the foundation of policy gradients in reinforcement learning.
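A toy sketch of this guessing idea follows. Everything environment-specific here is a made-up assumption for illustration: a hypothetical reward function where Place B is the right choice when hunger exceeds 0.5, a single sigmoid neuron as the policy, and a reward-weighted squared error against the guessed output. (Real policy gradients, covered in the next article, are formulated differently, typically via log-probabilities.)

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
w, b = 0.0, 0.0  # start with a 50/50 policy: p(B) = sigmoid(w * hunger + b)

def reward(hunger, chose_b):
    # Hypothetical environment: Place B is the right choice when hungry
    return 1.0 if chose_b == (hunger > 0.5) else -1.0

eps, lr = 1e-4, 0.5
for episode in range(2000):
    hunger = random.random()
    p_b = sigmoid(w * hunger + b)
    chose_b = random.random() < p_b   # sample an action from the policy
    r = reward(hunger, chose_b)
    guess = 1.0 if chose_b else 0.0   # guess: the action we took was "ideal"

    # Error against the guessed output, weighted by the reward:
    # positive reward pulls the output toward the guess,
    # negative reward pushes it away.
    def err(w_, b_):
        return r * (sigmoid(w_ * hunger + b_) - guess) ** 2

    # Estimate derivatives numerically and update, as before
    dw = (err(w + eps, b) - err(w - eps, b)) / (2 * eps)
    db = (err(w, b + eps) - err(w, b - eps)) / (2 * eps)
    w -= lr * dw
    b -= lr * db
```

Even though no ideal outputs were ever given, the guessed targets plus the reward signal are enough to push p(B) up for high hunger and down for low hunger.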

In the next article, we will explore how reinforcement learning and policy gradients help us solve this problem.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you're done! 🚀


๐Ÿ”— Explore Installerpedia here
