Rijul Rajesh
Understanding Reinforcement Learning with Neural Networks Part 5: Connecting Reward, Derivative, and Step Size

In the previous article, we explored the reward system in reinforcement learning.

In this article, we will begin calculating the step size.

First Update

In this example, the learning rate is 1.0, and the derivative we computed earlier was 0.5.

The step size is the learning rate multiplied by the derivative:

Step Size = 1.0 x 0.5 = 0.5

Next, we update the bias by subtracting the step size from the old bias value 0.0:

New Bias = 0.0 - 0.5 = -0.5
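As a quick sanity check, this first update can be sketched in a few lines of Python, using the values from the walkthrough above:

```python
learning_rate = 1.0
derivative = 0.5   # gradient from the previous article
old_bias = 0.0

# step size = learning rate x derivative
step_size = learning_rate * derivative   # 1.0 x 0.5 = 0.5

# move the bias against the gradient
new_bias = old_bias - step_size          # 0.0 - 0.5 = -0.5
```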
After the Update

Now that the bias has been updated, we run the model again.

The new probability of going to Place B becomes 0.4.

This means the probability of going to Place A is:

P(Place A) = 1.0 - 0.4 = 0.6
Choosing Again

We now pick a random number between 0 and 1, and get 0.9.

Since 0.9 falls in the region representing Place B (the interval from 0.6 to 1.0), we choose Place B.
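The selection step can be sketched as follows. The cutoff 0.6 is just P(Place A) from above; in a real run the number would come from a random generator rather than being fixed at 0.9:

```python
p_place_a = 0.6   # probability of Place A after the update
r = 0.9           # in practice: r = random.random()

# numbers in [0, 0.6) map to Place A, [0.6, 1.0) to Place B
choice = "Place A" if r < p_place_a else "Place B"
```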

Computing the Gradient Again

To update the bias, we again compute the derivative.

First, we assume that choosing Place B was the correct action.

So ideally:

P(Place B) = 1.0

Now we compute the difference between the ideal value 1.0 and the actual value 0.4.

Using this, we calculate the derivative with respect to the bias, which is the negative of this difference:

Derivative = -0.6
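In code, this gradient step is a one-liner; the sketch below uses "actual minus ideal" as the derivative with respect to the bias, matching the sign used in this series:

```python
ideal = 1.0    # we assume choosing Place B was the correct action
actual = 0.4   # the model's current probability of Place B

# derivative of the error with respect to the bias: actual minus ideal
derivative = actual - ideal   # 0.4 - 1.0 = -0.6
```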
Checking the Reward

Now we check whether this was actually a good decision.

Place B gives a large portion of fries, but our hunger input is 0.0, meaning we are not very hungry.

So this was not a good choice.

Therefore, the reward is:

Reward = -1


Updating with Reward

We multiply the derivative by the reward:

-0.6 x -1 = 0.6

So the updated derivative becomes 0.6.

Second Step Update

Now we calculate the step size again:

Step Size = 1.0 x 0.6 = 0.6

Subtracting this from the current bias -0.5 gives the new bias:

New Bias = -0.5 - 0.6 = -1.1
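Putting the reward multiplication and the second update together, again as a sketch with the values above:

```python
learning_rate = 1.0
bias = -0.5        # bias after the first update

derivative = -0.6  # gradient computed above
reward = -1        # Place B was a bad choice at low hunger

# flip the gradient's sign because the reward is negative
derivative = derivative * reward         # -0.6 x -1 = 0.6

step_size = learning_rate * derivative   # 1.0 x 0.6 = 0.6
bias = bias - step_size                  # -0.5 - 0.6 = -1.1
```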
Final Result

We plug the new bias back into the neural network.

Now the probability of going to Place B has decreased.

This means that when hunger is low, the model is more likely to choose Place A, which is the correct behavior.

This shows that the reinforcement learning algorithm, specifically policy gradients, is working as expected.
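To see both updates end to end, here is a sketch that assumes the model from the earlier parts is a single sigmoid neuron, P(Place B) = sigmoid(weight x hunger + bias); that assumption reproduces the numbers above up to rounding (a probability of about 0.38 where the article rounds to 0.4, and a final bias near -1.12 where the article rounds to -1.1):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# hypothetical single-neuron model: P(Place B) = sigmoid(weight * hunger + bias)
weight, bias = 1.0, 0.0
hunger = 0.0        # not hungry, so Place A is the right choice
learning_rate = 1.0

for step in range(2):
    p_b = sigmoid(weight * hunger + bias)
    derivative = p_b - 1.0   # treat the chosen action (Place B) as ideal = 1.0
    reward = -1.0            # ...but the reward says it was a bad choice
    derivative *= reward
    bias -= learning_rate * derivative
    print(f"step {step}: P(Place B) = {p_b:.2f}, bias -> {bias:.2f}")
```

With each pass, the negative reward pushes the bias further down, so the probability of Place B keeps shrinking, which is exactly the behavior described above.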


In the next article, we will explore how to further train the model using different input values.

Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! πŸš€


πŸ”— Explore Installerpedia here
