Debojyoti Chakraborty

Posted on Dec 31, 2020

Understanding Reinforcement Learning: first lecture of the Stanford CS234 course

#machinelearning #ai #computerscience

1. Intelligent agent ->
2. learns to make good sequential decisions
3. Optimality
4. Utility

5. An agent needs to be intelligent to make good decisions.

Atari: learning to play the game directly from pixels

video game playing

robotics: grasping clothes

educational games to amplify human intelligence

NLP and vision can also be seen as a kind of optimization process

Key aspects:

- optimization: find the best decision, or at least a good strategy
- delayed consequences: a decision made now can matter much later, so it is hard to tell whether it is good immediately or only pays off in the long run
- exploration: the agent learns about the world by trying things. The data is censored: the agent only sees a reward for the decision it actually made.
- generalization: a policy is a mapping from past experience to actions; pre-programming it is not practical because the search space is too large

Good question: why not pre-program a policy?

- the search space is huge
- the code base would be enormous
- Atari means learning what to do next from the space of images, so some sort of generalization is needed

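Which key aspects each setting involves (o = optimization, g = generalization, e = exploration, d = delayed consequences):

- AI planning: o, g, d. Why does a game like Go not need exploration? Because the rules of the world are already known.
- supervised learning: o, g. The experience is already given in the form of a labeled dataset.
- unsupervised learning: o, g. There is data but no labels.
- RL: o, g, e, d.
- imitation learning: o, g, d. Learning from others' experience.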

Imitation learning assumes that the input comes from demonstrations of a good policy.

It reduces RL to supervised learning.
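
To make "reduces RL to supervised learning" concrete, here is a minimal sketch of tabular behavior cloning. It assumes demonstrations arrive as (state, action) pairs from a good policy; the function name `behavior_cloning` and the toy states and actions are hypothetical, not something from the lecture.

```python
from collections import Counter, defaultdict


def behavior_cloning(demonstrations):
    """Learn a state -> action table from (state, action) demonstration pairs.

    This is plain supervised learning: for each state, predict the action
    the demonstrator chose most often. No rewards or exploration are used.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical demonstrations: in state 0 the expert mostly goes right.
demos = [(0, "right"), (0, "right"), (0, "left"), (1, "left")]
cloned_policy = behavior_cloning(demos)
```

The learned table is only as good as the demonstrations, which is why the assumption about a good demonstrating policy matters.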

Explore the world and use that experience to guide decisions.

End-of-class goal:

sequential decision making under uncertainty

An interactive closed-loop process: the agent takes an action, the world returns an observation and a reward, and the agent's goal is to maximize the total expected future reward.
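
The closed loop can be sketched in a few lines of Python. Everything here is an illustrative assumption: `CoinFlipEnv` is a made-up toy world (guess a biased coin), and `majority_policy` is just one simple way to act on the history so far.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy world: the agent guesses the outcome of a biased coin."""

    def __init__(self, p_heads=0.7, seed=0):
        self.p_heads = p_heads
        self.rng = random.Random(seed)

    def step(self, action):
        # The world produces an outcome; a correct guess earns reward 1.
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        return outcome, reward  # (observation, reward)


def majority_policy(history):
    """Guess the outcome seen most often so far (guess 1 on a tie or at the start)."""
    if not history:
        return 1
    heads = sum(observation for _, observation, _ in history)
    return 1 if 2 * heads >= len(history) else 0


def run_episode(env, policy, num_steps=100):
    """Agent acts, world answers with (observation, reward), repeat."""
    history = []  # list of (action, observation, reward) tuples
    total_reward = 0.0
    for _ in range(num_steps):
        action = policy(history)
        observation, reward = env.step(action)
        history.append((action, observation, reward))
        total_reward += reward
    return total_reward, history


total, hist = run_episode(CoinFlipEnv(seed=0), majority_policy, num_steps=50)
```

In this toy world actions do not change what happens next, so it only illustrates the loop itself, not delayed consequences.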

The reward is an expectation over a stochastic process, so getting a high reward needs strategic behaviour.

balancing immediate and long-term reward

The agent may have to make a long sequence of decisions during which it gets no reward for a long time.

If the agent has an easy option that maximises the reward, it will take it.

So the choice of reward function is an important one.

sub-discipline: machine teaching

---------------------|------------

+ (0)+ - -

only a constant number of points (two) is needed

Important things for sequential decision making:
history, state space, world state, discrete time steps

The agent's state is a small subset of the real world state.

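Markov assumption:

current state (observation): $s_t$

history: everything seen up to time $t$

$$h_t = (s_0, s_1, \dots, s_t)$$

A state is Markov if the future is independent of the past given the present:

$$p(s_{t+1} \mid s_t, a_t) = p(s_{t+1} \mid h_t, a_t)$$

Using the whole history as the state, $s_t = h_t$, always makes the process Markov.

POMDP: partially observable Markov decision process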

Bandits: actions have no influence on the next observations.

MDPs and POMDPs: actions influence future observations.
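
A small sketch of the difference, under made-up assumptions: `bandit_step` is a hypothetical two-armed bandit with invented win probabilities, and `chain_mdp_step` is a hypothetical chain of states where the action decides where the agent ends up next.

```python
import random


def bandit_step(action, rng):
    """Bandit: the reward depends on the chosen arm only.

    There is no state for the action to change, so the next observation
    carries no information (returned as None).
    """
    win_prob = {0: 0.2, 1: 0.8}[action]  # assumed arm payoffs
    reward = 1.0 if rng.random() < win_prob else 0.0
    return None, reward


def chain_mdp_step(state, action, num_states=5):
    """MDP: the action (-1 = left, +1 = right) determines the next state.

    Reward is given only when the agent arrives at the last state of the chain.
    """
    next_state = min(max(state + action, 0), num_states - 1)
    reward = 1.0 if next_state == num_states - 1 else 0.0
    return next_state, reward


observation, bandit_reward = bandit_step(1, random.Random(0))
next_state, mdp_reward = chain_mdp_step(3, +1)
```

In the chain, moving right from state 0 pays nothing for several steps, which is the delayed-consequences problem that bandits do not have.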

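Types of sequential decision processes (SDP):

Deterministic: given the history and an action, there is a single possible observation and reward.

Stochastic: given the history and an action, there are many possible observations and rewards.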

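Components of an RL algorithm:

model: the agent's representation of how the world changes and what rewards it gives

policy: a mapping function from states to actions

A policy can be stochastic or deterministic.

value function: the expected discounted sum of future rewards, with discount factor gamma

Reward model example: Mars Rover, a stochastic Markov model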
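
As a rough illustration of the policy and value function ideas, the sketch below shows a deterministic policy, a stochastic policy, and the discounted sum of one reward sequence. The state numbering, the action names and the probabilities are all invented for the example. Note that the value function is the expectation of this discounted sum; the code only computes it for a single sampled sequence.

```python
import random


def deterministic_policy(state):
    """Exactly one action per state (hypothetical rule)."""
    return "right" if state < 3 else "left"


def stochastic_policy(state, rng):
    """A distribution over actions for each state; one action is sampled."""
    if state < 3:
        probs = {"left": 0.1, "right": 0.9}
    else:
        probs = {"left": 0.9, "right": 0.1}
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], computed backwards from the last reward."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A reward of 10 arriving two steps in the future is worth 10 * 0.5**2 now.
g = discounted_return([0.0, 0.0, 10.0], gamma=0.5)  # 2.5
sampled_action = stochastic_policy(0, random.Random(0))
```

With gamma close to 0 the agent only cares about immediate reward; with gamma close to 1 long-term reward counts almost as much as immediate reward.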

RL agents:

Model-based: keeps an explicit model of the world.

Model-free: has no model, only a policy and/or a value function.

Key challenges:

Planning

The horizon is the time span of the system's operation over which you care about the defined performance measures. If you want to control the system so that it meets the performance measures for a finite time T, the problem is finite horizon; if you care about optimality over the whole time span, i.e. until t=∞, it is an infinite horizon problem.

The problem of deriving a control $u(t)$, $t \in [0, T]$, for the system

$$\dot{x}(t) = A x(t) + B u(t)$$

such that the performance index

$$PM = \int_0^T \left( x(t)^\top Q x(t) + u(t)^\top R u(t) \right) dt$$

is minimised is a finite horizon problem.

The problem of deriving a control $u(t)$, $t \in [0, \infty)$, for the system

$$\dot{x}(t) = A x(t) + B u(t)$$

such that the performance index

$$PM = \int_0^\infty \left( x(t)^\top Q x(t) + u(t)^\top R u(t) \right) dt$$

is minimised is an infinite horizon problem.
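
To get a feel for the performance index, here is a minimal numerical sketch for the scalar case, where A, B, Q and R are plain numbers `a`, `b`, `q`, `r`. It does not derive the optimal control: it assumes a fixed, hand-picked feedback gain `k` (so u = -k x) and uses a simple Euler discretisation with step `dt`. All of these names and values are illustrative assumptions.

```python
def finite_horizon_cost(a, b, q, r, k, x0, T, dt=1e-3):
    """Approximate PM = integral over [0, T] of (q*x^2 + r*u^2) dt.

    The system is x' = a*x + b*u with the assumed feedback u = -k*x,
    integrated with forward Euler steps of size dt.
    """
    x = x0
    cost = 0.0
    for _ in range(int(round(T / dt))):
        u = -k * x
        cost += (q * x * x + r * u * u) * dt
        x += (a * x + b * u) * dt
    return cost


# With a = b = q = r = 1 and k = 2 the closed loop is x' = -x, which is stable.
short_cost = finite_horizon_cost(1.0, 1.0, 1.0, 1.0, 2.0, x0=1.0, T=1.0)
long_cost = finite_horizon_cost(1.0, 1.0, 1.0, 1.0, 2.0, x0=1.0, T=5.0)
```

For this stable closed loop the cost keeps growing with T but levels off, which is why the infinite horizon index can still be finite.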

Evaluation and control

Top comments (1)

Debojyoti Chakraborty • Jan 1 '21 • Edited

Links for further understanding:

infinite horizon problem over optimal control: math.stackexchange.com/questions/2...

a simple example on it:math.stackexchange.com/questions/2...
