Rijul Rajesh
Understanding Reinforcement Learning with Neural Networks Part 3: Guessing the Ideal Output

In the previous article, we explored the limitations of backpropagation and why it is not ideal when the correct output values are unknown.

In this article, we will begin exploring the core ideas behind reinforcement learning.

Starting Example

Let us begin by assuming that we are not hungry.

We will feed the value 0.0 into the neural network.

The neural network outputs a probability of 0.5 for going to Place B.

So:

  • Probability of going to Place B = p(B) = 0.5
  • Probability of going to Place A = 1 - p(B) = 0.5
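
As a sketch, we can model this with a single sigmoid neuron. The weight and bias values below are assumptions chosen so that a hunger of 0.0 produces exactly the 0.5 output described above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical single-neuron network; the weight and bias are made-up values.
weight = 1.5
bias = 0.0

hunger = 0.0                            # we are not hungry
p_B = sigmoid(weight * hunger + bias)   # probability of going to Place B
p_A = 1.0 - p_B                         # probability of going to Place A

print(p_A, p_B)  # 0.5 0.5
```

Because the input is 0.0 and the bias is 0.0, the sigmoid receives 0 and outputs 0.5, matching the example.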

Visualizing the Probabilities

We can represent these probabilities using a line.

First, we draw a line segment with length 0.5 to represent the probability of going to Place A.

Then, we append another line segment to represent the probability of going to Place B.

Together, these form a line ranging from 0 to 1.


Choosing an Action

To decide which place to go for a snack, we randomly pick a number between 0 and 1.

Let us pick 0.2.

Since 0.2 falls inside the region representing Place A, we choose to go to Place A.
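
The line-segment picture translates directly into code: the interval [0, 1 − p(B)) belongs to Place A and [1 − p(B), 1] to Place B, and we check where the random number lands. This is a minimal sketch; the function name and `r` parameter are illustrative:

```python
import random

def choose_place(p_B, r=None):
    """Pick Place A or B by sampling a point on the [0, 1] line.

    The segment [0, 1 - p_B) represents Place A,
    and [1 - p_B, 1] represents Place B.
    """
    if r is None:
        r = random.random()  # random number between 0 and 1
    return "A" if r < 1.0 - p_B else "B"

print(choose_place(0.5, r=0.2))  # A: 0.2 falls inside the Place A segment [0, 0.5)
```

Passing `r=0.2` reproduces the example above; omitting it samples a fresh random number each time, which is how the agent would actually act.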


Making a Guess About the Correct Action

Now, let us assume that going to Place A when hunger = 0 was the correct decision.

Ideally:

  • The probability of going to Place A, p(A), should be 1
  • The probability of going to Place B, p(B), should be 0

These ideal values are based on our guess about what the correct action should have been.


Moving Toward Optimization

Using these guessed ideal values, we can calculate the difference between:

  • the ideal probability, p(A) = 1
  • the actual probability, p(A) = 0.5, produced by the neural network

With this difference in hand, we can compute its derivative with respect to the bias we want to optimize.
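
Sticking with the single sigmoid-neuron sketch from before (the model, loss choice, and learning rate here are all assumptions, not the article's definitive method), one gradient step toward the guessed ideal looks like this:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

bias = 0.0    # the parameter we want to optimize
hunger = 0.0  # not hungry

p_B = sigmoid(bias + hunger)  # actual network output: 0.5
ideal_p_B = 0.0               # our guess: Place A was correct, so p(B) should be 0

# Squared difference between the guessed ideal output and the actual output.
loss = (ideal_p_B - p_B) ** 2

# d(loss)/d(bias) via the chain rule: d(loss)/d(p_B) * d(p_B)/d(bias)
d_loss_d_pB = 2.0 * (p_B - ideal_p_B)
d_pB_d_bias = p_B * (1.0 - p_B)  # derivative of the sigmoid
gradient = d_loss_d_pB * d_pB_d_bias

# One gradient-descent step nudges p(B) toward 0, and therefore p(A) toward 1.
learning_rate = 0.5
bias -= learning_rate * gradient

print(sigmoid(bias + hunger))  # new p(B), now below 0.5
```

After the update, the network assigns Place A a higher probability for hunger = 0, which is exactly the direction our guess about the correct action pushes it.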


In the next article, we will continue exploring how this optimization process works in reinforcement learning.

Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! πŸš€


πŸ”— Explore Installerpedia here
