Before machines could learn through experience, three powerful ideas from psychology, control theory, and learning algorithms laid the foundation. This post unpacks the origins of reinforcement learning — not in code, but in concept.
Brief historical background of reinforcement learning:
Let’s explore the three main threads that led to reinforcement learning:
• Trial-and-error Learning
• Optimal Control
• Temporal-Difference Learning
Trial-and-error Learning: This thread comes from psychology, where scientists studied how animals learn by trying different actions and seeing which ones work best. Edwin Thorndike (Thorndike, 1911, p. 244) formulated the “Law of Effect”, which says that actions followed by good outcomes are more likely to be repeated.
Optimal Control: This term describes the problem of designing a controller that minimizes a measure (such as a cost function) of a dynamic system’s behavior over time. One approach to this problem, dynamic programming, was developed in the mid-1950s by the mathematician Richard Bellman. It solves a complex problem by breaking it down into smaller subproblems.
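To make the "breaking it down into smaller parts" idea concrete, here is a minimal sketch of dynamic programming on a made-up toy problem. Everything in it is an assumption for illustration, not something from Bellman's original work: a line of five positions, a goal at the last one, and a cost of 1 per move. The cost of reaching the goal from any position is computed from the cost of its neighbours, which is the smaller subproblem.

```python
# Hypothetical toy problem: 5 positions on a line, goal at position 4.
# Every move costs 1; once at the goal, no further cost is paid.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Deterministic dynamics: move, then clip to the ends of the line."""
    return min(max(state + action, 0), N_STATES - 1)


def cost_to_go(n_sweeps=10):
    """Repeatedly apply the backup J(s) = min over actions of [1 + J(next state)]."""
    J = [0.0] * N_STATES
    for _ in range(n_sweeps):
        J = [
            0.0 if s == GOAL else min(1.0 + J[step(s, a)] for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return J


J = cost_to_go()
print(J)  # [4.0, 3.0, 2.0, 1.0, 0.0]
```

Each entry is the minimum total cost from that position to the goal. Notice that the cost for position 0 is never worked out from scratch: it is just one move plus the already-known cost for position 1.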
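Temporal-Difference Learning: Temporal-difference methods learn from the difference between successive predictions of the same quantity over time. This thread has its origins in part in animal learning psychology, in particular in the notion of secondary reinforcers. Reinforcers are things that follow a behavior (action) and make it more likely to occur again.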
- Primary Reinforcers: These are things that are naturally rewarding or pleasing, e.g. food and water.
- Secondary Reinforcers: These are things that become rewarding because they are associated with things that are naturally rewarding, e.g. money motivates you because you can use it to get food, comfort, and so on.
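The way a secondary reinforcer picks up its value can be sketched with a simple temporal-difference update. This is only an illustration under stated assumptions: the two-step episode ("money" then "food"), the learning rate `ALPHA`, the discount `GAMMA`, and the TD(0) update rule itself are choices made for this sketch and are not taken from the text above.

```python
# Hypothetical episode: "money" is followed by "food", and only food is rewarding.
ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 1.0  # no discounting (assumed value)

# Each transition: (state, reward received on leaving it, next state)
EPISODE = [("money", 0.0, "food"), ("food", 1.0, "end")]


def td0(n_episodes=20):
    """Estimate state values with the TD(0) update over repeated episodes."""
    V = {"money": 0.0, "food": 0.0, "end": 0.0}
    for _ in range(n_episodes):
        for state, reward, next_state in EPISODE:
            # Difference between the new estimate and the old prediction.
            td_error = reward + GAMMA * V[next_state] - V[state]
            V[state] += ALPHA * td_error
    return V


V = td0()
print(V["food"], V["money"])
```

After a single episode only "food" has any value. With repetition, the value flows backwards to "money", even though money itself never delivers a reward in this sketch, which mirrors how a secondary reinforcer becomes rewarding by association.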