DEV Community

Gokul S

Reinforcement Learning

Reinforcement learning is an important type of Machine Learning in which an agent learns how to behave in an environment by performing actions and observing the results. Reinforcement learning is about taking the most suitable action to maximize reward in a particular situation.

It is used by many kinds of software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning. In supervised learning, the training data comes with an answer key, so the model is trained on the correct answers themselves. In reinforcement learning there is no answer key; the agent itself decides what to do to perform the given task. Without a training dataset, it has to learn from its own experience.
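To make "learning from experience without an answer key" concrete, here is a minimal sketch of a multi-armed bandit, one of the simplest reinforcement learning settings. It assumes three hypothetical slot-machine arms whose rewards are drawn from Gaussians with means the agent never sees; the agent only observes the reward after each action. The epsilon-greedy strategy and the names `run_bandit`, `true_means` and `epsilon` are illustrative choices for this example, not something the post prescribes.

```python
import random


def run_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical multi-armed bandit.

    The agent is never told which arm is best. It only sees the reward
    returned after each action and updates its estimates from that.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # agent's running estimate of each arm's mean reward
    counts = [0] * n_arms       # how many times each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])

        # Environment (assumed): reward is Gaussian around the arm's hidden mean.
        reward = rng.gauss(true_means[action], 1.0)
        total_reward += reward

        # Incremental sample-average update for the chosen arm only.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = run_bandit([0.0, 0.5, 2.0], steps=2000)
    print("estimates:", [round(e, 2) for e in est])
    print("pulls per arm:", cnt)
```

Over many steps the agent ends up pulling the arm with the highest hidden mean most often, even though no labeled data ever told it which arm was best. The small exploration probability keeps it from locking onto an arm that merely looked good early on.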

Components:

Agent - the entity exhibiting certain behaviors (actions) based on its environment.

Actions - what the agent chooses to do at certain places in the environment. Actions can be discrete or continuous.

States - describe where in the environment the agent is (a specific location) or what is going on in the environment (for a robotic vacuum, perhaps whether its current location is already clean). By taking actions, the agent moves from one state to a new state. States can be partial (the agent observes only part of the environment) or absolute (the agent observes the environment's full state).

Discount factor - determines how much future rewards contribute to the overall sum of expected rewards; it is commonly written as γ. With a factor of zero, the agent cares only about the immediate reward of its very next action. With a factor of one, future rewards count just as much as immediate ones. Values in between weight a reward less the further in the future it arrives.

Policy - determines which action the agent takes in a given state. Policies are either stochastic or deterministic. Policy functions are often denoted by the symbol π.

i) Stochastic - assigns a probability to each action in a particular state (e.g. an 80% chance to go straight, a 20% chance to turn left).
ii) Deterministic - maps each state directly to a single action.

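Value Function - the expected long-term reward (the discounted sum of future rewards) when starting from a given state and following a particular policy. By comparing the values of the states that different actions lead to, the agent can tell which actions maximize reward over the long term. Value functions are often represented with the capital letter V.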
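The discount factor, policy and value function fit together in a small amount of code. The sketch below assumes a hypothetical four-state corridor (states 0 to 3, with state 3 terminal), two actions (`left`, `right`), and a reward of +1 only when the agent reaches the terminal state; none of these details come from the post. `discounted_return` adds up rewards weighted by powers of γ, the two policy functions show a stochastic and a deterministic policy, and `evaluate_policy` estimates V(s) with iterative policy evaluation, a standard technique named here plainly. All function names are illustrative.

```python
# Hypothetical corridor world: states 0..3, state 3 is terminal.
N_STATES = 4
TERMINAL = 3


def step(state, action):
    """Assumed deterministic dynamics: return (next_state, reward)."""
    if action == "right":
        next_state = min(state + 1, TERMINAL)
    else:  # "left"; the agent cannot move past the left wall
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def discounted_return(rewards, gamma):
    """G = r1 + gamma*r2 + gamma^2*r3 + ... computed back to front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def stochastic_policy(state):
    """pi(a|s): 80% right, 20% left in every state (state is unused here)."""
    return {"right": 0.8, "left": 0.2}


def deterministic_policy(state):
    """pi(s) -> a: always move right (expressed as probability 1)."""
    return {"right": 1.0}


def evaluate_policy(policy, gamma, tol=1e-10):
    """Iterative policy evaluation: V(s) = sum_a pi(a|s) * (r + gamma * V(s')).

    Sweeps over the states, updating V in place, until no value changes
    by more than tol. gamma < 1 guarantees convergence.
    """
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # the terminal state keeps value 0
            new_v = 0.0
            for action, prob in policy(s).items():
                s_next, r = step(s, action)
                new_v += prob * (r + gamma * V[s_next])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


if __name__ == "__main__":
    for gamma in (0.0, 0.9, 1.0):
        print("gamma", gamma, "return", round(discounted_return([0.0, 0.0, 1.0], gamma), 4))
    print("deterministic V:", [round(v, 3) for v in evaluate_policy(deterministic_policy, 0.9)])
    print("stochastic V:   ", [round(v, 3) for v in evaluate_policy(stochastic_policy, 0.9)])
```

For the reward sequence [0, 0, 1], the return is 0 with γ = 0 (only the immediate reward counts), about 0.81 with γ = 0.9, and 1 with γ = 1. Under the always-right policy with γ = 0.9, the values come out as V = [0.81, 0.9, 1.0, 0.0]. The stochastic policy yields lower values because its occasional left moves delay the reward, which is exactly how V lets the agent compare behaviors.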
