Q-Learning From Scratch: Reinforcement Learning in a Gridworld

#machinelearning #reinforcementlearning #ai #beginners

No labels, no "correct answer", just rewards. Reinforcement learning lets an agent figure out the right moves by trial and error. Here's tabular Q-learning (the foundation that DQN builds on) learning a gridworld live.

🎮 Watch the agent learn: https://dev48v.infy.uk/dl/day18-q-learning.html

The setup

An agent moves around a grid. Reaching the goal gives +1, falling into a pit gives −1, and every step costs a little (−0.04) so it learns to be quick. Nobody tells it the right path; it tries actions, observes the rewards, and learns from them.
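Here is a minimal sketch of such an environment. The 3x4 layout, the deterministic moves, and the choice that the goal and pits end the episode are assumptions for illustration; the grid on the demo page may differ.

```python
# Hypothetical 3x4 layout (an assumption, not necessarily the demo page's grid):
# S = start, G = goal (+1), P = pit (-1), # = wall, . = empty cell
GRID = [
    "...G",
    ".#.P",
    "S...",
]
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # actions 0..3 = up, right, down, left
STEP_REWARD = -0.04  # small cost per move so shorter paths score higher
START = (2, 0)


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    r = state[0] + MOVES[action][0]
    c = state[1] + MOVES[action][1]
    # Moving off the grid or into a wall leaves the agent where it was.
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == "#":
        r, c = state
    cell = GRID[r][c]
    if cell == "G":
        return (r, c), 1.0, True   # goal ends the episode
    if cell == "P":
        return (r, c), -1.0, True  # pit ends the episode
    return (r, c), STEP_REWARD, False
```

For example, `step(START, 1)` moves right and returns `((2, 1), -0.04, False)`.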

The Q-table

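Q(state, action) estimates the expected total discounted reward from taking that action in that state and acting optimally afterwards. Start every entry at zero and, after each step, apply the Q-learning update, which is derived from the Bellman equation: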

Q(s,a) += α · ( r + γ · max Q(s',·) − Q(s,a) )

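α (learning rate): how much each experience nudges the estimate toward the target
γ (discount, between 0 and 1): how much future reward counts relative to immediate reward
r is the reward just received and s' is the next state; when the step ends the episode, the max term is taken as 0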
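The update rule fits in a few lines of Python. This is a sketch: the dictionary-backed table, four actions, and the defaults α = 0.5 and γ = 0.9 are illustrative choices, not values taken from the demo.

```python
from collections import defaultdict

N_ACTIONS = 4

# Q-table: each state maps to one value per action, all starting at zero.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """Nudge Q(s, a) a fraction alpha of the way toward r + gamma * max Q(s', .)."""
    future = 0.0 if done else max(Q[s_next])  # no future after a terminal step
    target = r + gamma * future
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

With these settings, a first step into the goal (r = +1, terminal) sets that entry to 0 + 0.5 · (1 − 0) = 0.5. A later step with r = −0.04 into that same state, whose best value is now 0.5, gives 0.5 · (−0.04 + 0.9 · 0.5) = 0.205. Value flows backward from the goal one transition at a time.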

Explore vs exploit

Early on the agent must explore (try random actions) to discover rewards; later it should exploit what it learned. ε-greedy does both: act randomly with probability ε, otherwise take the best-known action.
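A sketch of ε-greedy action selection follows. The exponential decay schedule and its constants (start 1.0, floor 0.05, factor 0.995 per episode) are hypothetical; they are one common way to shift from exploring to exploiting over time.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else a best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    best = max(q_values)
    # Exploit; break ties randomly so an all-zero row doesn't always pick action 0.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def epsilon_schedule(episode, start=1.0, end=0.05, decay=0.995):
    """Hypothetical exponential decay: mostly random early, mostly greedy later."""
    return max(end, start * decay ** episode)
```

Passing a seeded `random.Random(...)` as `rng` makes runs reproducible.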

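Over many episodes the Q-values settle, the per-cell arrows (the highest-valued action in each cell) point toward the goal, and a greedy run follows the shortest safe path. Replace the table with a neural network that approximates Q(s, a) and you have the core idea of DQN, the starting point of deep RL.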
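Putting the pieces together, here is a self-contained sketch of the full loop: environment, Q-table, decaying ε-greedy, update, and episodes, plus a greedy rollout and per-cell arrows. The grid layout, hyperparameters (2000 episodes, α = 0.5, γ = 0.9, ε decaying to 0.05), and helper names such as `train` and `policy_arrows` are assumptions for illustration, not the demo page's code.

```python
import random

# Hypothetical 3x4 layout: S start, G goal (+1), P pit (-1), # wall.
GRID = ["...G", ".#.P", "S..."]
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left
ARROWS = "↑→↓←"
STEP_REWARD = -0.04
START, GOAL = (2, 0), (0, 3)


def step(state, action):
    r, c = state[0] + MOVES[action][0], state[1] + MOVES[action][1]
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == "#":
        r, c = state  # blocked: stay put
    cell = GRID[r][c]
    if cell == "G":
        return (r, c), 1.0, True
    if cell == "P":
        return (r, c), -1.0, True
    return (r, c), STEP_REWARD, False


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def train(episodes=2000, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Tabular Q-learning with a decaying epsilon-greedy policy."""
    rng = random.Random(seed)
    Q = {}  # state -> [Q(s, up), Q(s, right), Q(s, down), Q(s, left)]

    def row(s):
        # Lazily create an all-zero row the first time a state is seen.
        return Q.setdefault(s, [0.0] * 4)

    for ep in range(episodes):
        epsilon = max(0.05, 0.995 ** ep)  # mostly random early, mostly greedy later
        s = START
        for _ in range(max_steps):
            a = rng.randrange(4) if rng.random() < epsilon else argmax(row(s))
            s_next, r, done = step(s, a)
            # Bellman target; terminal transitions have no future term.
            target = r if done else r + gamma * max(row(s_next))
            row(s)[a] += alpha * (target - row(s)[a])
            s = s_next
            if done:
                break
    return Q


def greedy_path(Q, max_steps=50):
    """Follow the highest-valued action from START until the episode ends."""
    s, path = START, [START]
    for _ in range(max_steps):
        s, _, done = step(s, argmax(Q.get(s, [0.0] * 4)))
        path.append(s)
        if done:
            break
    return path


def policy_arrows(Q):
    """One arrow per open cell showing its greedy action; G, P, # shown as-is."""
    return [
        "".join(
            cell if cell in "GP#" else ARROWS[argmax(Q.get((r, c), [0.0] * 4))]
            for c, cell in enumerate(row)
        )
        for r, row in enumerate(GRID)
    ]
```

Usage: `Q = train()`, then `print("\n".join(policy_arrows(Q)))` shows the arrow map and `greedy_path(Q)` lists the cells visited from start to goal. In this layout both shortest routes take five moves, so the greedy path may take either one.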

🔨 Built from scratch (env+rewards → Q-table → ε-greedy → Bellman update → episodes) on the page: https://dev48v.infy.uk/dl/day18-q-learning.html

Part of DeepLearningFromZero. 🌐 https://dev48v.infy.uk