Rijul Rajesh

Understanding Reinforcement Learning with Neural Networks Part 4: Positive and Negative Rewards

In the previous article, we began the process of guessing the ideal output.

Let us continue with the same example.

Suppose we receive a small number of fries.

Since our hunger level is 0, this is actually a good outcome.

In this case, we should assign a reward of 1.


Now consider the opposite situation.

Suppose we receive a large order of fries.

Since we are not hungry enough to eat all the fries, this means we made a poor decision.

In that case, we assign a reward of -1.


In general:

  • Any positive reward indicates a good decision
  • Any negative reward indicates a bad decision
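The fries example above can be sketched as a tiny reward function. This is only an illustration of the rule described in the text; the `small_cutoff` threshold and the numbers are assumptions, not values from the series.

```python
def reward(fries_received, small_cutoff=10):
    """Reward for the hunger-level-0 scenario:
    a small order of fries is a good decision (+1),
    a large order is a bad decision (-1).

    small_cutoff is an assumed threshold for what counts as "small".
    """
    return 1 if fries_received <= small_cutoff else -1
```

For example, `reward(5)` gives 1 (a good decision), while `reward(50)` gives -1 (a poor one).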

Updating the Derivative with the Reward

We now use this reward to update the derivative.

To do this, we simply multiply the derivative by the reward.


Case 1: Correct Decision

If the reward is 1, then:

derivative × 1 = derivative

The derivative remains unchanged.

This means the derivative is already pointing in the correct direction.


Case 2: Incorrect Decision

If the reward is -1, then:

derivative × (-1) = -derivative

Now the derivative changes sign.

This causes the optimization process to move the bias in the opposite direction.

In other words, the negative reward flips the direction of the update so the neural network can learn from the bad decision.
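Both cases come down to a single multiplication. Here is a minimal sketch of that step (the derivative value 0.8 is just an example number, not from the series):

```python
def reward_scaled_derivative(derivative, reward):
    # Multiply the derivative by the reward:
    # reward = +1 leaves it unchanged (the update direction was correct),
    # reward = -1 flips its sign (so the network learns from the bad decision).
    return derivative * reward


# Case 1: correct decision, derivative keeps its direction.
unchanged = reward_scaled_derivative(0.8, 1)   # 0.8

# Case 2: incorrect decision, derivative flips sign,
# so optimization moves the bias the opposite way.
flipped = reward_scaled_derivative(0.8, -1)    # -0.8
```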

In the next article, we will explore how to calculate the step size for updating the parameters.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! πŸš€


πŸ”— Explore Installerpedia here
