
TildAlice

Originally published at tildalice.io

RL Fundamentals: MDP, Bellman Equation, and Value Functions

This is the first article in a 5-part Reinforcement Learning series. By the end of this series, you'll understand and implement algorithms from basic Q-Learning to PPO and SAC.

Series Overview:

  • Part 1: RL Basics (You are here)
  • Part 2: From Q-Learning to DQN
  • Part 3: Policy Gradient Methods
  • Part 4: PPO — The Industry Standard
  • Part 5: SAC — Mastering Continuous Control

What is Reinforcement Learning?

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. Unlike supervised learning, where you have labeled data, RL agents learn from trial and error: they take actions, observe the results, and adjust their behavior to maximize cumulative reward.

Think of it like training a dog. You don't show the dog 10,000 labeled images of "sit." Instead, the dog tries different things, and you reward the behavior you want. Over time, the dog learns which actions lead to treats.

Agent → Action → Environment → (Next State, Reward) → Agent learns
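To make that loop concrete, here is a minimal sketch of the interaction cycle in Python. It assumes the gymnasium package and its CartPole-v1 environment, neither of which is part of this article; any environment with states, actions, and rewards would do, and the random action choice is just a stand-in for a real policy.

```python
import gymnasium as gym  # assumed dependency, used here only for illustration

env = gym.make("CartPole-v1")   # the Environment
obs, info = env.reset(seed=0)   # the initial State

total_reward = 0.0
done = False
while not done:
    # The Agent picks an Action (here: random, standing in for a learned policy).
    action = env.action_space.sample()
    # The Environment returns the Next State and a Reward.
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
env.close()
```

A learning agent would replace the random sample() call with a policy that it updates based on the rewards it observes, which is what the rest of this series builds toward.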

The MDP Framework

Almost every RL problem is modeled as a Markov Decision Process (MDP). An MDP has five components:

Symbol      Name              Description
S           States            All possible situations the agent can be in
A           Actions           All possible moves the agent can take
P(s'|s,a)   Transition model  Probability of moving to state s' after taking action a in state s
R(s,a)      Reward function   Immediate reward received for taking action a in state s
γ           Discount factor   How much future rewards count relative to immediate ones (0 ≤ γ ≤ 1)
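To see how these five pieces fit together, here is a tiny hand-written MDP expressed as plain Python data structures. The two states and two actions ("cold"/"hot", "wait"/"heat") are invented purely for illustration.

```python
# A toy MDP: two states and two actions, made up for illustration.
states = ["cold", "hot"]
actions = ["wait", "heat"]

# Transition model P(s' | s, a): for each (state, action) pair,
# a probability distribution over next states.
P = {
    ("cold", "wait"): {"cold": 1.0},
    ("cold", "heat"): {"cold": 0.2, "hot": 0.8},
    ("hot", "wait"):  {"hot": 0.7, "cold": 0.3},
    ("hot", "heat"):  {"hot": 1.0},
}

# Reward function R(s, a): immediate reward for taking action a in state s.
R = {
    ("cold", "wait"): 0.0,
    ("cold", "heat"): -1.0,   # heating costs energy
    ("hot", "wait"):  1.0,
    ("hot", "heat"):  0.5,
}

gamma = 0.9  # discount factor: weight of future rewards relative to immediate ones
```

Note that each entry of P is a distribution over next states, so the probabilities for a given (state, action) pair sum to 1.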

Continue reading the full article on TildAlice
