The Problem: Every RL Tutorial Starts at the Wrong End
Most reinforcement learning guides throw you into DQN or PPO code before you understand why those algorithms exist. You end up copy-pasting hyperparameters without knowing what $\gamma=0.99$ actually does to your agent's behavior.
Here's the thing: RL has a brutally logical progression from first principles. Markov Decision Process → Bellman Equation → Value Iteration → Q-Learning. Each step solves one specific limitation of the previous approach. Once you see that chain, the entire field clicks.
This post walks through that progression using 5 diagrams and minimal math. By the end, you'll know exactly why Q-learning exists and when it breaks.
Diagram 1: The Markov Decision Process (MDP)
An MDP is just a formal way to describe a sequential decision-making problem. It has five components:
- States $S$: where the agent can be (e.g., grid positions)
- Actions $A$: what the agent can do (e.g., move up/down/left/right)
- Transition function $P(s' \mid s, a)$: the probability of landing in state $s'$ after taking action $a$ in state $s$
- Rewards $R(s, a, s')$: the scalar feedback the agent receives for each transition
- Discount factor $\gamma \in [0, 1)$: how much future rewards are worth relative to immediate ones
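To make the components concrete, here's a minimal sketch of a deterministic 2x2 gridworld MDP. All names (`transition`, `reward`, `GOAL`, etc.) are illustrative choices for this post, not part of any RL library:

```python
GAMMA = 0.99  # discount factor: how much future rewards count vs. immediate ones

# States: (row, col) grid positions; Actions: moves as (d_row, d_col) offsets
STATES = [(r, c) for r in range(2) for c in range(2)]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = (1, 1)  # hypothetical goal cell for this toy example

def transition(state, action):
    """Deterministic transition: move in the chosen direction, clamped at walls.
    A stochastic MDP would return a distribution over next states instead."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), 1)
    c = min(max(state[1] + dc, 0), 1)
    return (r, c)

def reward(state, action, next_state):
    """R(s, a, s'): +1 for reaching the goal, 0 otherwise."""
    return 1.0 if next_state == GOAL else 0.0
```

Even this toy version shows why the discount factor matters: with $\gamma = 0.99$, a reward 10 steps away is still worth $0.99^{10} \approx 0.90$ of an immediate one, so the agent plans far ahead; drop $\gamma$ to 0.5 and that same reward shrinks to about 0.001, making the agent effectively myopic.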