This is Part 2 of our 5-part Reinforcement Learning series. We're moving from theory to our first real algorithms — Q-Learning and Deep Q-Networks.
Series Overview:
- Part 1: RL Basics — MDP, Bellman Equation, Value Functions
- Part 2: Q-Learning to DQN (You are here)
- Part 3: Policy Gradient Methods
- Part 4: PPO — The Industry Standard
- Part 5: SAC — Mastering Continuous Control
From Values to Actions: Q-Learning
In Part 1, we used value iteration — but that requires knowing the environment's transition probabilities P(s'|s,a). In most real problems, we don't have that. We need to learn from experience.
Q-Learning solves this by directly learning the action-value function Q(s,a) through interaction:
Q(s, a) ← Q(s, a) + α [ r + γ · max_a' Q(s', a') - Q(s, a) ]
Breaking this down:
- α (learning rate): How much each update moves Q toward the target (typically 0.01–0.1)
- r + γ · max_a' Q(s', a'): The TD target — what we currently think Q(s, a) should be
- target - Q(s, a): The TD error — how wrong our current estimate was
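The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full agent; the function and variable names are our own:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning update (illustrative sketch)."""
    td_target = r + gamma * np.max(Q[s_next])  # r + γ · max_a' Q(s', a')
    td_error = td_target - Q[s, a]             # how wrong the current estimate is
    Q[s, a] += alpha * td_error                # nudge Q toward the target
    return td_error

# Toy example: 4 states, 2 actions, all estimates start at zero
Q = np.zeros((4, 2))
err = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

With an all-zero table, the target is just the reward (1.0), so after one step Q[0, 1] moves to α · 1.0 = 0.1.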
The beauty of Q-Learning is that it's off-policy — it learns the optimal policy regardless of what exploration strategy you use.
Tabular Q-Learning Implementation
Let's implement Q-Learning on our GridWorld from Part 1:
```python
import numpy as np
import random

class GridWorld:
    def __init__(self, size=5):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.traps = [(1, 1), (2, 3)]
```
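To give a feel for how the pieces fit together, here is a hedged sketch of a tabular training loop. It assumes a hypothetical environment interface with `reset() -> state` and `step(action) -> (next_state, reward, done)` — not necessarily the exact API the full article uses:

```python
import numpy as np

def train(env, n_states, n_actions, episodes=500,
          alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-Learning loop (sketch; env interface is an assumption)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # ε-greedy behavior policy
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # TD target bootstraps from max Q, zeroed at terminal states
            target = r + gamma * np.max(Q[s_next]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

Note the off-policy split: actions come from the ε-greedy behavior policy, while the update's `max` always evaluates the greedy target policy.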
---
*Continue reading the full article on [TildAlice](https://tildalice.io/q-learning-to-dqn-deep-reinforcement-learning/)*