In the previous article, we explored the reward system in reinforcement learning. In this article, we will calculate the step size and use it to update the bias.
First Update
In this example, the learning rate is 1.0. The step size is the learning rate multiplied by the reward-adjusted derivative from the previous article:

Step Size = 1.0 x 0.5 = 0.5
Next, we update the bias by subtracting the step size from the old bias value 0.0:

New Bias = 0.0 - 0.5 = -0.5
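To make this concrete, here is a minimal Python sketch of the first update. The variable names are illustrative, and the derivative of 0.5 is taken as given from the previous article.

```python
# First update: step size = learning rate * derivative,
# then subtract the step size from the old bias.
learning_rate = 1.0
derivative = 0.5   # reward-adjusted derivative from the previous article
bias = 0.0         # old bias value

step_size = learning_rate * derivative  # 1.0 * 0.5 = 0.5
bias = bias - step_size                 # 0.0 - 0.5 = -0.5
print(bias)  # -0.5
```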
After the Update
Now that the bias has been updated, we run the model again.
The new probability of going to Place B becomes 0.4.
This means the probability of going to Place A is:

1.0 - 0.4 = 0.6
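The article does not show the model itself, but the numbers are consistent with a single sigmoid unit whose output is treated as the probability of Place B; that is an assumption on my part. A sketch under that assumption:

```python
import math

def prob_place_b(bias: float) -> float:
    # Assumed model: sigmoid of the bias gives P(Place B).
    return 1.0 / (1.0 + math.exp(-bias))

p_b = prob_place_b(-0.5)  # about 0.38, which the article rounds to 0.4
p_a = 1.0 - p_b           # about 0.62, rounded to 0.6
print(round(p_b, 1), round(p_a, 1))  # 0.4 0.6
```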
Choosing Again
We now pick a random number between 0 and 1, and get 0.9.
Since 0.9 falls in the region representing Place B (the interval from 0.6 to 1.0), we choose Place B.
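The choice itself can be sketched like this; the 0.9 is hard-coded to mirror the article, whereas normally you would draw it with random.random():

```python
def choose_place(p_a: float, r: float) -> str:
    # Place A owns the interval [0, p_a); Place B owns [p_a, 1).
    return "Place A" if r < p_a else "Place B"

r = 0.9                      # the random number from the article
print(choose_place(0.6, r))  # Place B, since 0.9 >= 0.6
```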
Computing the Gradient Again
To update the bias, we again compute the derivative.
First, we assume that choosing Place B was the correct action.
So ideally:

P(Place B) = 1.0
Now we compute the difference between the ideal value 1.0 and the actual value 0.4.
Using this, we calculate the derivative with respect to the bias, which gives:

Derivative = -(1.0 - 0.4) = -0.6
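In code this is one subtraction: the actual probability minus the ideal one. That is the standard gradient for a sigmoid output trained with cross-entropy loss; the loss function is my assumption, since the article does not name it.

```python
ideal = 1.0   # we pretend choosing Place B was correct
actual = 0.4  # the model's probability of Place B

# Gradient of the (assumed) cross-entropy loss with respect to the bias:
derivative = actual - ideal  # 0.4 - 1.0 = -0.6
print(derivative)  # -0.6
```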
Checking the Reward
Now we check whether this was actually a good decision.
Place B gives a large portion of fries, but our hunger input is 0.0, meaning we are not hungry at all.
So this was not a good choice.
Therefore, the reward is:
Reward = -1
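A minimal sketch of this reward rule; the hunger threshold of 0.5 and the +1 for good choices are my own illustrative assumptions, not values from the article:

```python
def reward(choice: str, hunger: float) -> int:
    # Illustrative rule: a big portion of fries (Place B) is only
    # a good choice when we are actually hungry.
    if choice == "Place B" and hunger < 0.5:  # threshold is assumed
        return -1
    return 1

print(reward("Place B", hunger=0.0))  # -1
```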
Updating with Reward
We multiply the derivative by the reward:
-0.6 x -1 = 0.6
So the updated derivative becomes 0.6.
Second Step Update
Now we calculate the step size again:

Step Size = 1.0 x 0.6 = 0.6

Then we subtract it from the current bias:

New Bias = -0.5 - 0.6 = -1.1
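Putting the second update together in the same sketch:

```python
learning_rate = 1.0
derivative = -0.6
reward = -1
bias = -0.5  # bias after the first update

adjusted = derivative * reward        # -0.6 * -1 = 0.6
step_size = learning_rate * adjusted  # 1.0 * 0.6 = 0.6
bias = bias - step_size               # -0.5 - 0.6 = -1.1
print(bias)  # -1.1; under the sigmoid assumption, P(Place B) is now about 0.25
```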
Final Result
We plug the new bias back into the neural network.
Now the probability of going to Place B has decreased.
This means that when hunger is low, the model is more likely to choose Place A, which is the correct behavior.
This shows that the reinforcement learning algorithm, specifically policy gradients, is working as expected.
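All of the pieces above fit in one small loop. This is a sketch of the full update under the same assumptions as before (sigmoid model, illustrative reward rule), with hunger used only for the reward:

```python
import math
import random

def train_step(bias: float, hunger: float, learning_rate: float = 1.0) -> float:
    # One policy-gradient update of the bias, mirroring the steps above.
    p_b = 1.0 / (1.0 + math.exp(-bias))  # assumed sigmoid model for P(Place B)
    choice = "Place B" if random.random() >= 1.0 - p_b else "Place A"

    # Pretend the chosen action was correct, then differentiate:
    ideal = 1.0 if choice == "Place B" else 0.0
    derivative = p_b - ideal

    # Illustrative reward rule: fries only pay off when hungry.
    good_choice = (choice == "Place B") == (hunger >= 0.5)
    r = 1 if good_choice else -1

    return bias - learning_rate * derivative * r

bias = 0.0
for _ in range(20):
    bias = train_step(bias, hunger=0.0)
print(bias)  # grows more negative, so P(Place B) keeps shrinking when hunger is low
```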
In the next article, we will explore how to further train the model using different input values.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
ipm install repo-name
… and you're done!