DEV Community

TildAlice

Posted on • Originally published at tildalice.io

RL Basics: MDP to Q-Learning in 5 Diagrams

The Problem: Every RL Tutorial Starts at the Wrong End

Most reinforcement learning guides throw you into DQN or PPO code before you understand why those algorithms exist. You end up copy-pasting hyperparameters without knowing what $\gamma=0.99$ actually does to your agent's behavior.
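To make that concrete, here's a quick sketch of what $\gamma$ actually controls. With a constant reward of 1 per step, the discounted return sums to roughly $1/(1-\gamma)$, which you can read as the agent's effective planning horizon (the function name below is my own, not from any library):

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Same rewards, very different valuations:
print(discounted_return([1.0] * 1000, gamma=0.9))   # ~10: myopic agent
print(discounted_return([1.0] * 1000, gamma=0.99))  # ~100: far-sighted agent
```

So bumping $\gamma$ from 0.9 to 0.99 roughly tenfolds how far into the future your agent cares about, which is why it changes behavior so drastically.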

Here's the thing: RL has a brutally logical progression from first principles. Markov Decision Process → Bellman Equation → Value Iteration → Q-Learning. Each step solves one specific limitation of the previous approach. Once you see that chain, the entire field clicks.

This post walks through that progression using 5 diagrams and minimal math. By the end, you'll know exactly why Q-learning exists and when it breaks.

Wooden letter tiles arranged to spell 'learn' on a background of scattered tiles.

Photo by Pixabay on Pexels

Diagram 1: The Markov Decision Process (MDP)

An MDP is just a formal way to describe a decision-making problem. Five components:

  • States $S$: where the agent can be (e.g., grid positions)
  • Actions $A$: what the agent can do (e.g., move up/down/left/right)
  • Transition function $P(s' \mid s, a)$: where an action actually takes you
  • Reward function $R(s, a)$: the immediate payoff for taking action $a$ in state $s$
  • Discount factor $\gamma$: how much future reward counts relative to immediate reward

Continue reading the full article on TildAlice
