The Problem: Every RL Tutorial Starts at the Wrong End
Most reinforcement learning guides throw you into DQN or PPO code before you understand why those algorithms exist. You end up copy-pasting hyperparameters without knowing what $\gamma=0.99$ actually does to your agent's behavior.
Here's the thing: RL has a brutally logical progression from first principles. Markov Decision Process → Bellman Equation → Value Iteration → Q-Learning. Each step solves one specific limitation of the previous approach. Once you see that chain, the entire field clicks.
This post walks through that progression using 5 diagrams and minimal math. By the end, you'll know exactly why Q-learning exists and when it breaks.
Diagram 1: The Markov Decision Process (MDP)
An MDP is just a formal way to describe a sequential decision-making problem. It has five components:
- States $S$: where the agent can be (e.g., grid positions)
- Actions $A$: what the agent can do (e.g., move up/down/left/right)
- Transition function $P(s' \mid s, a)$: the probability of landing in state $s'$ after taking action $a$ in state $s$
- Rewards $R(s, a, s')$: the scalar feedback the agent receives for each transition
- Discount factor $\gamma \in [0, 1)$: how much future rewards are worth relative to immediate ones
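To make the components concrete, here's a minimal sketch of a deterministic 2x2 gridworld MDP. All names (`transition`, `reward`, `GOAL`, etc.) are illustrative choices for this post, not part of any RL library:

```python
GAMMA = 0.99  # discount factor: how much future rewards count vs. immediate ones

# States: (row, col) grid positions; Actions: moves as (d_row, d_col) offsets
STATES = [(r, c) for r in range(2) for c in range(2)]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = (1, 1)  # hypothetical goal cell for this toy example

def transition(state, action):
    """Deterministic transition: move in the chosen direction, clamped at walls.
    A stochastic MDP would return a distribution over next states instead."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), 1)
    c = min(max(state[1] + dc, 0), 1)
    return (r, c)

def reward(state, action, next_state):
    """R(s, a, s'): +1 for reaching the goal, 0 otherwise."""
    return 1.0 if next_state == GOAL else 0.0
```

Even this toy version shows why the discount factor matters: with $\gamma = 0.99$, a reward 10 steps away is still worth $0.99^{10} \approx 0.90$ of an immediate one, so the agent plans far ahead; drop $\gamma$ to 0.5 and that same reward shrinks to about 0.001, making the agent effectively myopic.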