DEV Community

Keerthana

Spilling the beans on how I learn for exams: "Reinforcement Learning Cheat Sheet"

Reinforcement Learning Cheat Sheet (Exam Killer Version)
1. Core Idea (Write This in Any Answer Intro)

Reinforcement Learning is a learning paradigm where an agent interacts with an environment and learns to take actions that maximize cumulative reward over time.

Keywords to include:

Trial and error
Reward signal
Sequential decision making
2. RL Framework (Must Draw in Exam)

Agent → Action → Environment → Reward + New State → Agent (repeat)

Write:

Agent (decision maker)
Environment (external system)
State (current situation)
Action (choice)
Reward (feedback)

👉 Example (very important for marks):

Game playing / robot navigation
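Here is a minimal Python sketch of that loop. It uses a hypothetical 1-D corridor as a tiny "robot navigation" task. `Corridor`, `run_episode`, the two policies and the reward values (-0.01 per step, +1.0 at the goal) are all made up for illustration and do not come from any library.

```python
import random


class Corridor:
    """Hypothetical toy environment: a robot in a 1-D corridor of cells 0..length-1.
    Reaching the last cell ends the episode with reward +1.0; every other step gives -0.01."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: -1 = move left, +1 = move right; the walls clip the position
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent -> Action -> Environment -> Reward + New State, repeated until the episode ends."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment returns reward and new state
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    return +1


def random_walk(state):
    return random.choice([-1, +1])


random.seed(0)
print(run_episode(Corridor(), always_right))  # three steps at -0.01, then +1.0 at the goal
print(run_episode(Corridor(), random_walk))
```

The agent only ever sees the state and the reward. It never sees how `step` works inside, and that is the whole "trial and error" setting.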
3. Markov Decision Process (MDP)

Definition:
An MDP is the mathematical model used to describe RL problems.

Tuple:
(S, A, P, R, γ)

S → States
A → Actions
P → Transition probability P(s'|s,a)
R → Reward
γ → Discount factor

👉 Key concept:
Markov Property → the future depends only on the present state (and action), not on the full history
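The tuple (S, A, P, R, γ) maps directly onto plain Python data. The sketch below is a minimal version of that idea. The `MDP` class and the 2-state "docked/roaming" robot are hypothetical examples, and R is taken as the expected reward for each (s, a) pair. Notice that `step` only needs the current (s, a), which is the Markov property in code.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    states: list
    actions: list
    P: dict       # P[(s, a)] -> {s_next: probability}
    R: dict       # R[(s, a)] -> expected immediate reward
    gamma: float  # discount factor in [0, 1]

    def is_valid(self):
        """Check that every transition distribution P(.|s, a) sums to 1."""
        return all(
            abs(sum(self.P[(s, a)].values()) - 1.0) < 1e-9
            for s in self.states
            for a in self.actions
        )

    def step(self, s, a, rng=random):
        """Sample s' ~ P(.|s, a). Only the current (s, a) is needed: the Markov property."""
        dist = self.P[(s, a)]
        s_next = rng.choices(list(dist), weights=list(dist.values()))[0]
        return s_next, self.R[(s, a)]


# Hypothetical 2-state example: a robot that is either "docked" or "roaming".
toy = MDP(
    states=["docked", "roaming"],
    actions=["stay", "go"],
    P={
        ("docked", "stay"): {"docked": 1.0},
        ("docked", "go"): {"roaming": 0.9, "docked": 0.1},
        ("roaming", "stay"): {"roaming": 1.0},
        ("roaming", "go"): {"docked": 0.8, "roaming": 0.2},
    },
    R={
        ("docked", "stay"): 0.0,
        ("docked", "go"): 1.0,
        ("roaming", "stay"): 2.0,
        ("roaming", "go"): 0.0,
    },
    gamma=0.9,
)

print(toy.is_valid())               # True
print(toy.step("roaming", "stay"))  # ('roaming', 2.0): this transition is certain
```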

4. Return & Discount Factor

Return = total discounted future reward: G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + …

γ (between 0 and 1)
High γ → future rewards matter more
Low γ → immediate reward matters more
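The return formula is easy to check with a few lines of Python. This is a minimal sketch: `discounted_return` and the reward list are made-up names and numbers. Computing the sum backwards with G = r + γ·G avoids raising γ to powers.

```python
def discounted_return(rewards, gamma):
    """G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ...
    Computed backwards with G = r + gamma * G."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [0, 0, 0, 10]  # hypothetical episode: reward only at the very end
print(discounted_return(rewards, gamma=0.9))  # 10 * 0.9**3, about 7.29
print(discounted_return(rewards, gamma=0.1))  # 10 * 0.1**3, about 0.01
```

The same delayed reward is worth about 7.29 with a high γ but only about 0.01 with a low γ. That is the "future matters vs immediate reward matters" point in numbers.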
5. Value Functions (Very Important)
State Value: V(s) → how good a state is (expected return starting from s)
Action Value: Q(s,a) → how good taking action a in state s is

👉 Always mention:
“Expected cumulative reward”
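The two value functions are linked: under a policy π, V(s) = Σ_a π(a|s)·Q(s,a). The sketch below shows that link with hypothetical numbers for a single state. `Q_s`, `pi_s`, `state_value` and `best_action` are illustrative names only.

```python
# Hypothetical Q-values for one state s: Q(s, a) for each action a.
Q_s = {"left": 1.0, "right": 3.0}
# A stochastic policy pi(a|s) for the same state.
pi_s = {"left": 0.25, "right": 0.75}


def state_value(q_s, pi_s):
    """V(s) = sum over a of pi(a|s) * Q(s, a): the expected return of s under pi."""
    return sum(pi_s[a] * q_s[a] for a in q_s)


def best_action(q_s):
    """The action with the highest Q(s, a)."""
    return max(q_s, key=q_s.get)


print(state_value(Q_s, pi_s))  # 0.25 * 1.0 + 0.75 * 3.0 = 2.5
print(best_action(Q_s))        # right
```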

6. Bellman Equation (CORE CONCEPT)

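👉 Key idea:

Breaks the problem into smaller subproblems: value of a state = immediate reward + discounted value of the next state
Recursive nature: V(s) is defined in terms of V(s')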
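For reference, these are the standard Bellman equations, written with the MDP symbols from section 3. They assume R(s,a) is the expected immediate reward for taking action a in state s. The first is the expectation equation for a fixed policy π; the other two are the optimality equations.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{\pi}(s') \Big]

V^{*}(s) = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s') \Big]

Q^{*}(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q^{*}(s',a')
```

V appears on both sides of each equation, which is the recursive nature in one line.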
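7. Policy

Policy = the agent's strategy (which action to take in each state)

Deterministic → one fixed action per state
Stochastic → a probability distribution over actions
👉 Write:
π(a|s) = probability of taking action a in state s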
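A minimal sketch of the two policy types in Python. The states "s0"/"s1", the actions and the probabilities are hypothetical. A deterministic policy is just a lookup, while a stochastic policy samples an action from π(·|s).

```python
import random

# Deterministic policy: exactly one action per state (hypothetical states/actions).
deterministic_policy = {"s0": "right", "s1": "left"}

# Stochastic policy pi(a|s): a probability distribution over actions for each state.
stochastic_policy = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}


def act_deterministic(state):
    return deterministic_policy[state]


def act_stochastic(state, rng=random):
    """Sample a ~ pi(.|s)."""
    probs = stochastic_policy[state]
    return rng.choices(list(probs), weights=list(probs.values()))[0]


rng = random.Random(42)
print(act_deterministic("s0"))  # always "right"
samples = [act_stochastic("s0", rng) for _ in range(10_000)]
print(samples.count("right") / len(samples))  # close to 0.8
```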

8. Q-Learning (Most Important Algorithm)

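Off-policy (learns the greedy policy while behaving exploratively)
Uses the max Q-value of the next state: Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]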
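A minimal tabular Q-learning sketch in Python, on a hypothetical 5-cell corridor (goal at the right end, -0.01 per step, +1.0 at the goal). `step`, `q_learning` and all hyperparameter values (α = 0.1, γ = 0.9, ε = 0.1, 500 episodes) are assumptions for illustration. The key line is the target: it uses the max over next actions, regardless of what the agent actually does next.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, goal at cell 4
ACTIONS = [-1, +1]  # move left / move right


def step(s, a):
    s_next = min(max(s + a, 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else -0.01), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Behaviour policy: epsilon-greedy (exploratory)
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s_next, r, done = step(s, a)
            # Target uses the max over next actions -> learns the greedy policy (off-policy)
            best_next = 0.0 if done else max(Q[(s_next, x)] for x in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q


Q = q_learning()
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)  # expected: [1, 1, 1, 1], i.e. always move right
```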
9. SARSA

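On-policy (learns the value of the policy it actually follows)
Uses the actual next action a': Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)] (the name comes from the tuple s, a, r, s', a')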
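The same hypothetical corridor, learned with SARSA instead. This is a minimal sketch; names and hyperparameters are assumptions for illustration. The difference from Q-learning is that the next action a' is chosen by the same ε-greedy policy, its Q-value goes into the target, and then that exact action is executed.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, goal at cell 4
ACTIONS = [-1, +1]  # move left / move right


def step(s, a):
    s_next = min(max(s + a, 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else -0.01), done


def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])


def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            # Pick the NEXT action with the same epsilon-greedy policy (on-policy) ...
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)
            # ... and use its Q-value in the target: (s, a, r, s', a')
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next  # the action used in the target is the one taken next
    return Q


Q = sarsa()
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)  # expected: [1, 1, 1, 1] in this risk-free corridor
```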
10. Q-Learning vs SARSA (Exam Favorite)
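The whole difference fits in two lines. Below, both targets are computed for the same hypothetical transition. The action names "safe"/"risky" and all numbers are made up, and the exploring policy happens to pick "safe" in s'.

```python
# One hypothetical transition (s, a, r, s') plus the Q-values of the next state s'.
r, gamma = -1.0, 0.9
Q_next = {"safe": 2.0, "risky": 5.0}  # Q(s', a') for each a'
a_next_taken = "safe"                 # action the exploring policy actually picks in s'

q_learning_target = r + gamma * max(Q_next.values())  # off-policy: best next action
sarsa_target = r + gamma * Q_next[a_next_taken]       # on-policy: action actually taken

print(q_learning_target)  # -1 + 0.9 * 5.0
print(sarsa_target)       # -1 + 0.9 * 2.0
```

Because SARSA's target includes the actions its exploring policy really takes, it tends to learn more cautious behaviour. Q-learning learns the values of the greedy policy even while it explores.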

11. Exploration vs Exploitation
Exploration → try new actions
Exploitation → use the best-known action

👉 Method:
Epsilon-greedy (ε-greedy): take a random action with probability ε, otherwise the best-known action
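A minimal ε-greedy sketch in Python. `epsilon_greedy` and the Q-values are hypothetical. With probability ε it explores uniformly (so the best action can also be picked while exploring); otherwise it exploits. In practice ε is often decayed over training.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values: list of Q(s, a) for each action index a in the current state."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploitation


rng = random.Random(0)
q = [0.1, 0.7, 0.2]  # hypothetical Q-values; action 1 is the best-known
picks = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(10_000)]
print(picks.count(1) / len(picks))  # roughly 0.9 + 0.1 / 3, about 0.93
```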
12. Monte Carlo vs TD Learning
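The core difference is when and toward what V(s) is updated. Monte Carlo waits until the episode ends and moves V(s) toward the actual return G_t. TD(0) updates after every single step, toward r + γ·V(s'), using its own current estimate of the next state (bootstrapping). A minimal sketch follows; the function names, the 2-state episode and α = 0.5 are hypothetical.

```python
def mc_update(V, episode, alpha, gamma):
    """Every-visit Monte Carlo: after the episode, move each V(s) toward the actual return G_t.
    episode: list of (state, reward received after leaving that state)."""
    g = 0.0
    for s, r in reversed(episode):
        g = r + gamma * g
        V[s] += alpha * (g - V[s])
    return V


def td0_update(V, s, r, s_next, alpha, gamma, done=False):
    """TD(0): update right after one step, bootstrapping from the current estimate V(s')."""
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V


# Hypothetical episode: A -> B -> end, reward 0 when leaving A and 1 when leaving B.
episode = [("A", 0.0), ("B", 1.0)]
V_mc = mc_update({"A": 0.0, "B": 0.0}, episode, alpha=0.5, gamma=0.9)

V_td = {"A": 0.0, "B": 0.0}
td0_update(V_td, "A", 0.0, "B", alpha=0.5, gamma=0.9)
td0_update(V_td, "B", 1.0, None, alpha=0.5, gamma=0.9, done=True)

print(V_mc)  # A already gets credit: 0.5 * 0.9 * 1.0
print(V_td)  # A is still 0.0: TD needs another episode to pass the reward back
```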

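13. Policy Iteration vs Value Iteration
Policy Iteration:
Evaluate the current policy → improve it greedily → repeat until the policy stops changing
Value Iteration:
Directly update values with the Bellman optimality equation, then read off the greedy policy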
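Both methods can be sketched on a tiny hypothetical MDP (a robot that is "docked" or "roaming"). All names, probabilities and rewards here are made up. `q_value` is the one-step Bellman lookahead R(s,a) + γ Σ P(s'|s,a) V(s'). Value iteration applies the max directly to the values; policy iteration alternates full evaluation and greedy improvement. Both should agree on the optimal policy.

```python
GAMMA = 0.9
STATES = ["docked", "roaming"]
ACTIONS = ["stay", "go"]
# Hypothetical MDP: P[(s, a)] = {s': probability}, R[(s, a)] = expected reward
P = {
    ("docked", "stay"): {"docked": 1.0},
    ("docked", "go"): {"roaming": 0.9, "docked": 0.1},
    ("roaming", "stay"): {"roaming": 1.0},
    ("roaming", "go"): {"docked": 0.8, "roaming": 0.2},
}
R = {("docked", "stay"): 0.0, ("docked", "go"): 1.0,
     ("roaming", "stay"): 2.0, ("roaming", "go"): 0.0}


def q_value(V, s, a):
    """One-step lookahead: R(s, a) + gamma * sum over s' of P(s'|s, a) * V(s')."""
    return R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)].items())


def greedy(V):
    return {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}


def value_iteration(tol=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        # Bellman optimality update applied directly to the values
        new_V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return greedy(new_V), new_V
        V = new_V


def policy_evaluation(policy, tol=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: q_value(V, s, policy[s]) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V


def policy_iteration():
    policy = {s: ACTIONS[0] for s in STATES}  # arbitrary start: always "stay"
    while True:
        V = policy_evaluation(policy)  # 1. Evaluate
        new_policy = greedy(V)         # 2. Improve (act greedily w.r.t. V)
        if new_policy == policy:       # policy stopped changing
            return policy, V
        policy = new_policy


print(value_iteration()[0])   # {'docked': 'go', 'roaming': 'stay'}
print(policy_iteration()[0])  # same policy
```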
14. Common Exam Mistakes (Avoid These)
Writing definitions without examples
Skipping diagrams
Not explaining formulas
Leaving out comparison tables
15. 1-Minute Revision Strategy

Before the exam, revise:
Bellman Equation
Q-Learning & SARSA
MDP

👉 These alone can cover most of the paper.
This is Part 1. If you want Part 2 of the cheat sheet, just comment below or visit. End of the session.
