DEV Community

Rijul Rajesh

Understanding Reinforcement Learning with Neural Networks Part 1: Learning Without Correct Answers

In this article, we will explore reinforcement learning with neural networks.

Let’s start with a simple example.

Choosing Between Two Snack Places

Suppose it is snack time and we want fries, so we have to choose between Place A and Place B.

To make a good decision, we need to consider how hungry we are.

Some days we may be very hungry, while on other days we may only want a small snack.

We also need to consider how many fries each place might serve.

For example:

  • Place B might give a large quantity of fries, which would be great if we were very hungry
  • But if we were not that hungry, getting too many fries might not be ideal

Similarly:

  • Getting a small amount of fries would not be good if we were extremely hungry
  • But it could be perfectly fine if we only wanted a light snack

So, it would be useful to have a system that helps decide which place to choose based on:

  • our hunger level
  • the possible quantity of fries we might receive
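To make this trade-off concrete, we can score how satisfying a meal is. The sketch below is a hypothetical illustration that is not part of the original example. It assumes hunger and portion size are both measured on a 0-to-1 scale, and the `satisfaction` function and its formula are made up so that a portion matching our hunger scores highest.

```python
def satisfaction(hunger: float, portion: float) -> float:
    """Hypothetical score in [0, 1]: 1.0 when the portion exactly matches hunger.

    Both arguments are assumed to lie in [0, 1], where 0 means "not hungry" or
    "tiny portion" and 1 means "starving" or "huge portion".
    """
    # The further the portion is from what we want, the lower the score.
    return 1.0 - abs(hunger - portion)


# Very hungry: compare a large portion with a small one.
print(satisfaction(hunger=0.9, portion=0.9), satisfaction(hunger=0.9, portion=0.2))
# Light snack: compare a large portion with a small one.
print(satisfaction(hunger=0.2, portion=0.9), satisfaction(hunger=0.2, portion=0.2))
```

With this scoring, a large portion wins when we are very hungry and a small portion wins when we only want a light snack, matching the bullet points above.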

Using a Neural Network

To solve this problem, we will use a neural network.

The neural network takes our hunger level as the input and outputs the probability of choosing Place B, written as p(B). Since there are only two options, the probability of choosing Place A is 1 − p(B).
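A network like this can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions the article does not state: the hidden layer size, the ReLU activation, the random initialization, and the names `SnackPolicy`, `p_b`, and `choose` are all illustrative. The final sigmoid keeps the output between 0 and 1 so it can be read as p(B).

```python
import math
import random


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class SnackPolicy:
    """Hypothetical tiny network: hunger -> hidden layer (ReLU) -> p(B)."""

    def __init__(self, hidden_size: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random starting weights; the network has not learned anything yet.
        self.w1 = [rng.uniform(-1, 1) for _ in range(hidden_size)]
        self.b1 = [rng.uniform(-1, 1) for _ in range(hidden_size)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden_size)]
        self.b2 = rng.uniform(-1, 1)

    def p_b(self, hunger: float) -> float:
        """Return the probability of choosing Place B for a given hunger level."""
        hidden = [max(0.0, w * hunger + b) for w, b in zip(self.w1, self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return sigmoid(logit)

    def choose(self, hunger: float, rng: random.Random) -> str:
        """Sample a place: "B" with probability p(B), otherwise "A"."""
        return "B" if rng.random() < self.p_b(hunger) else "A"


policy = SnackPolicy()
for hunger in (0.0, 0.5, 1.0):
    p = policy.p_b(hunger)
    print(f"hunger={hunger:.1f}  p(B)={p:.3f}  p(A)={1 - p:.3f}")
```

Because the weights are random, these probabilities are not yet useful; the open question is how to train them.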

The Challenge

Normally, when training a neural network, we start with a training dataset that contains:

  • input values
  • correct output values

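Using this data, we can train the network with standard backpropagation: we compare the network's outputs with the correct outputs and adjust the weights to reduce the difference.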
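For contrast, here is roughly what training looks like when correct answers are available. This sketch assumes a hypothetical labeled dataset (1 = Place B, 0 = Place A) that the snack example does not actually have, and to stay short it uses a single sigmoid neuron instead of a multi-layer network. For that case, backpropagating the binary cross-entropy loss gives the gradient (p − y) with respect to the neuron's pre-activation, which then drives a gradient-descent step.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical labeled data: (hunger level, correct choice), 1 = Place B, 0 = Place A.
# In the snack example we do NOT actually have labels like these.
dataset = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]

w, b = 0.0, 0.0
learning_rate = 1.0

for epoch in range(2000):
    grad_w, grad_b = 0.0, 0.0
    for hunger, target in dataset:
        p_b = sigmoid(w * hunger + b)
        # For sigmoid + binary cross-entropy, dLoss/dz simplifies to (p - y).
        error = p_b - target
        grad_w += error * hunger
        grad_b += error
    # Average the gradients and take one gradient-descent step.
    w -= learning_rate * grad_w / len(dataset)
    b -= learning_rate * grad_b / len(dataset)

for hunger in (0.1, 0.5, 0.9):
    print(f"hunger={hunger:.1f}  p(B)={sigmoid(w * hunger + b):.3f}")
```

The key ingredient is `target`: every update needs the correct answer for each input, which is exactly what the snack problem lacks.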

However, in this example, we do not know in advance whether Place A or Place B will serve a large or small quantity of fries.

Because of this, we do not know what the correct output should be: for a given hunger level, we cannot say in advance whether the network should favor Place A or Place B.

Reinforcement Learning

Even when the correct output values are unknown, we can still train a model using reinforcement learning.

Instead of learning from correct answers, the model learns by trying actions and receiving feedback based on how good the outcome was.
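To make "trying actions and receiving feedback" concrete, the sketch below runs a few rounds of trial and error. Everything in it is a hypothetical stand-in: the portion sizes each place serves (hidden from the agent), the `satisfaction` reward, and the fixed, untrained `p_b` policy. It only collects (hunger, choice, reward) experience; turning that feedback into weight updates is the job of an algorithm such as policy gradients.

```python
import math
import random

rng = random.Random(42)


def p_b(hunger: float) -> float:
    """Hypothetical untrained policy: probability of choosing Place B."""
    return 1.0 / (1.0 + math.exp(-(0.5 * hunger - 0.25)))


def serve_fries(place: str) -> float:
    """Hypothetical environment: portion size in [0, 1], unknown to the agent."""
    if place == "B":
        return rng.uniform(0.6, 1.0)  # assumed: Place B tends to serve large portions
    return rng.uniform(0.0, 0.4)      # assumed: Place A tends to serve small portions


def satisfaction(hunger: float, portion: float) -> float:
    """Hypothetical reward: 1.0 when the portion matches our hunger, lower otherwise."""
    return 1.0 - abs(hunger - portion)


experience = []
for _ in range(5):
    hunger = rng.random()
    # Sample an action from the policy instead of looking up a "correct" answer.
    place = "B" if rng.random() < p_b(hunger) else "A"
    # The only feedback is how good the outcome turned out to be.
    reward = satisfaction(hunger, serve_fries(place))
    experience.append((hunger, place, reward))
    print(f"hunger={hunger:.2f}  chose {place}  reward={reward:.2f}")
```

Nothing here says which choice was correct; the model only sees how good each outcome was.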

In the next article, we will explore a reinforcement learning algorithm called policy gradients.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
