This is the first article in a 5-part Reinforcement Learning series. By the end of this series, you'll understand and implement algorithms from basic Q-Learning to PPO and SAC.
Series Overview:
- Part 1: RL Basics (You are here)
- Part 2: From Q-Learning to DQN
- Part 3: Policy Gradient Methods
- Part 4: PPO — The Industry Standard
- Part 5: SAC — Mastering Continuous Control
What is Reinforcement Learning?
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. Unlike supervised learning, where you train on labeled data, RL agents learn by trial and error: they take actions, observe the results, and adjust their behavior to maximize cumulative reward.
Think of it like training a dog. You don't show the dog 10,000 labeled images of "sit." Instead, the dog tries different things, and you reward the behavior you want. Over time, the dog learns which actions lead to treats.
Agent → Action → Environment → (Next State, Reward) → Agent learns
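The loop above can be sketched in a few lines of Python. This is a minimal illustration, not code from the series: `CorridorEnv` is a hypothetical 5-cell corridor where the agent starts at cell 0 and receives a reward of 1 only for reaching the rightmost cell, and the "agent" simply picks left or right at random. The `reset`/`step` method names loosely follow the convention popularized by OpenAI Gym, but the class does not depend on any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""

    def __init__(self, length: int = 5):
        self.length = length
        self.state = 0

    def reset(self) -> int:
        # Start every episode at the leftmost cell.
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        # action is -1 (left) or +1 (right); the walls clip the position.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env: CorridorEnv, rng: random.Random, max_steps: int = 100) -> float:
    """Repeat Agent -> Action -> Environment -> (Next State, Reward) until done."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = rng.choice([-1, 1])       # random "agent": it does not learn yet
        state, reward, done = env.step(action)
        total_reward += reward             # the cumulative reward RL tries to maximize
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), random.Random(0)))
```

A learning agent would replace `rng.choice` with a policy that improves based on the rewards it observes; building such policies is what the algorithms later in the series are about.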
The MDP Framework
Almost every RL problem is modeled as a Markov Decision Process (MDP). An MDP has five components:
| Symbol | Name | Description |
|---|---|---|
| S | States | All possible situations the agent can be in |
| A | Actions | All possible moves the agent can take |
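| P(s' \| s,a) | Transition | Probability of reaching state s' after taking action a in state s |
| R(s,a) | Reward | Immediate reward the agent receives for taking action a in state s |
| γ | Discount factor | A number between 0 and 1 that weights future rewards relative to immediate ones |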
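One way to make the MDP tuple (S, A, P, R, γ) concrete is to store each component as plain Python data. The sketch below is illustrative only: the `MDP` dataclass, the two states `A`/`B`, the actions `stay`/`move`, and every probability and reward are made-up assumptions. It writes the reward as R(s, a) and the discount factor γ as `gamma`, and shows how the discount factor turns a sequence of rewards into a single return.

```python
import random
from dataclasses import dataclass

State = str
Action = str


@dataclass
class MDP:
    states: list[State]                                # S
    actions: list[Action]                              # A
    P: dict[tuple[State, Action], dict[State, float]]  # P(s' | s, a)
    R: dict[tuple[State, Action], float]               # R(s, a)
    gamma: float                                       # discount factor

    def __post_init__(self) -> None:
        # Each P(. | s, a) must be a probability distribution over next states.
        for (s, a), dist in self.P.items():
            total = sum(dist.values())
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"P(. | {s}, {a}) sums to {total}, not 1")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must be in [0, 1]")

    def step(self, s: State, a: Action, rng: random.Random) -> tuple[State, float]:
        """Sample s' ~ P(. | s, a) and return it together with R(s, a)."""
        dist = self.P[(s, a)]
        s_next = rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]
        return s_next, self.R[(s, a)]


def discounted_return(rewards: list[float], gamma: float) -> float:
    """G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical two-state MDP: "move" usually switches state, "stay" never does.
toy = MDP(
    states=["A", "B"],
    actions=["stay", "move"],
    P={
        ("A", "stay"): {"A": 1.0},
        ("A", "move"): {"B": 0.8, "A": 0.2},
        ("B", "stay"): {"B": 1.0},
        ("B", "move"): {"A": 0.8, "B": 0.2},
    },
    R={("A", "stay"): 0.0, ("A", "move"): 1.0, ("B", "stay"): 2.0, ("B", "move"): 0.0},
    gamma=0.9,
)

if __name__ == "__main__":
    rng = random.Random(0)
    print(toy.step("A", "move", rng))
    # 1 + 0.9 * 0 + 0.81 * 2 = 2.62 (up to floating-point rounding)
    print(discounted_return([1.0, 0.0, 2.0], toy.gamma))
```

Storing P as a dictionary of distributions only scales to small, discrete problems; it is chosen here for readability, so each MDP component maps onto one field.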
Continue reading the full article on TildAlice