### Markov Decision Process



Lesson 2: Markov Decision Process

.5 Markov Decision Process - 1: An MDP models the decision-making process of a single agent. STATES s: a set of tokens that represents every state the agent can be in. In the grid world, the state is which cell I am in, so the states are the entire grid minus the blocked cells, written as (x, y) coordinates. MODEL: T(s, a, s') ~ Pr(s' | s, a).
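A minimal sketch of the state set: every (x, y) cell of the grid except the blocked ones. The 4x3 grid size and the blocked cell at (1, 1) (0-indexed) are assumptions chosen for illustration.

```python
# Hypothetical grid-world dimensions (assumption for illustration).
WIDTH, HEIGHT = 4, 3
# Hypothetical blocked cell: the agent can never occupy it.
BLOCKED = {(1, 1)}


def make_states(width, height, blocked):
    """Return every (x, y) coordinate in the grid except the blocked cells."""
    return {
        (x, y)
        for x in range(width)
        for y in range(height)
        if (x, y) not in blocked
    }


STATES = make_states(WIDTH, HEIGHT, BLOCKED)
print(len(STATES))  # 12 cells minus 1 blocked cell
```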

.6 Markov Decision Process - 2:

Actions: the things you can do in a particular STATE, e.g. UP, DOWN, LEFT, RIGHT.

The set of actions can be a function of the state, A(s), or simply one set of actions, A, that is available in every state.
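A minimal sketch of both forms, assuming the four grid moves above. The goal and trap cells at (3, 2) and (3, 1), and the rule that they offer no actions, are hypothetical; they are there only to show an A(s) that really depends on s.

```python
# One fixed action set A, available in every ordinary state.
ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT")

# Hypothetical goal and trap cells where the game ends (assumption).
TERMINALS = {(3, 2), (3, 1)}


def available_actions(state):
    """A(s): the actions allowed in `state`.

    Ordinary states get the full set A; the hypothetical terminal
    states allow no actions at all.
    """
    if state in TERMINALS:
        return ()
    return ACTIONS


print(available_actions((0, 0)))
print(available_actions((3, 2)))
```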

The model, aka the transition model, describes the rules of the world: how to play the game.

The transition model is a function of three variables: the state s, the action a, and the next state s' (s-prime).

s' can equal s, which means the agent stays where it is.

The transition model outputs the probability of ending up in s' given that the agent is in s and takes action a.

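Deterministic case: there is no noise, so every action is carried out with certainty (100%). Nondeterministic case: an action executes faithfully 80% of the time (0.8), and the remaining probability is split 0.1 and 0.1 between the two directions at right angles to the intended one.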

The model describes the rules of the game: it captures what happens when you do something. It is the physics of the world.

Pr(s' | s, a)

Transition models are probabilistic in general; the deterministic case is the special case where a single next state has probability 1.
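A minimal sketch of T(s, a, s') for the grid, returned as a dictionary {s': Pr(s' | s, a)}. The 4x3 grid and blocked cell (1, 1) are assumptions; the 0.8 / 0.1 / 0.1 split follows the nondeterministic case above, and bumping into a wall or the blocked cell leaves the agent in place (s' = s). Setting `p_intended=1.0` gives the deterministic case.

```python
# Hypothetical 4x3 grid with one blocked cell (assumptions for illustration).
WIDTH, HEIGHT = 4, 3
BLOCKED = {(1, 1)}

# Unit displacement for each action.
MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
# The two directions at right angles to each action.
PERPENDICULAR = {
    "UP": ("LEFT", "RIGHT"),
    "DOWN": ("LEFT", "RIGHT"),
    "LEFT": ("UP", "DOWN"),
    "RIGHT": ("UP", "DOWN"),
}


def move(state, action):
    """Apply one move; hitting a wall or blocked cell means staying put."""
    x, y = state
    dx, dy = MOVES[action]
    nxt = (x + dx, y + dy)
    off_grid = not (0 <= nxt[0] < WIDTH and 0 <= nxt[1] < HEIGHT)
    if off_grid or nxt in BLOCKED:
        return state  # s' == s: the agent stays
    return nxt


def transition(state, action, p_intended=0.8):
    """Return T(s, a, .) as a dict mapping s' to Pr(s' | s, a).

    The intended move happens with probability p_intended; the rest is
    split equally between the two perpendicular moves.
    """
    side = (1.0 - p_intended) / 2
    outcomes = [(action, p_intended)] + [(a, side) for a in PERPENDICULAR[action]]
    dist = {}
    for a, p in outcomes:
        if p == 0:
            continue  # deterministic case: skip zero-probability outcomes
        s_prime = move(state, a)
        # Different moves can land in the same s', so accumulate.
        dist[s_prime] = dist.get(s_prime, 0.0) + p
    return dist


print(transition((0, 0), "UP"))
print(transition((0, 0), "UP", p_intended=1.0))
```

From the corner (0, 0), moving UP reaches (0, 1) with probability 0.8; the slip to the LEFT hits the wall, so 0.1 of the probability stays at (0, 0), and the slip to the RIGHT reaches (1, 0).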

.7 Markov Decision Process - 3: The Markovian property. Markov means you don't have to condition on anything past the most recent state: only the present matters. The next state depends only on the current state s, so Pr(s' | s, a) has a single dependency on s, not on s1, s2, s3, ...

You can turn almost any process into a Markovian one by making sure the current state remembers everything relevant from the past.
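A toy sketch of that trick, using a hypothetical deterministic rule where the next value depends on the last two values. Viewed as a process over single values it is not Markovian; packing both values into the state (prev, curr) makes the next state a function of the current state alone. The same idea applies to probabilistic transitions.

```python
def next_value(prev, curr):
    """A toy rule (hypothetical) where the next value needs the last TWO values."""
    return prev + curr


# Non-Markovian view: if the state were just `curr`, predicting the next
# value would also require `prev`, i.e. history beyond the current state.


def markov_step(state):
    """Markovian view: the state (prev, curr) carries the needed history,
    so the next state depends on the current state only."""
    prev, curr = state
    return (curr, next_value(prev, curr))


state = (0, 1)
for _ in range(5):
    state = markov_step(state)
print(state)
```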

The second property of an MDP: nothing ever changes. Things are stationary; these rules don't change over time.

Reward: R(s) is the reward for being in a state; R(s, a) is the reward for being in a state and taking an action; R(s, a, s') is the reward for being in a state, taking an action, and ending up in s'. All three are mathematically equivalent. Intuition:

Green, the goal, is great: you want to be there. Red is a punishment, a restricted area. The reward encompasses the domain knowledge: it encodes the usefulness of entering a state.
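A minimal sketch of a state reward R(s) for the grid, plus the easy direction of the equivalence: any R(s) can be written in the R(s, a, s') form by ignoring a and s'. The cell coordinates and the values +1 (green), -1 (red) and -0.04 (every other state) are assumptions for illustration. The reverse direction needs more work, such as folding the action and next state into the state itself.

```python
# Hypothetical green (goal) and red (trap) cells.
GOAL, TRAP = (3, 2), (3, 1)


def reward_state(s):
    """R(s): reward for being in state s (values are assumptions)."""
    if s == GOAL:
        return 1.0
    if s == TRAP:
        return -1.0
    return -0.04  # small cost for being in any other state


def as_state_action_next(r_s):
    """Lift an R(s) into the R(s, a, s') form by ignoring a and s'."""
    return lambda s, a, s_prime: r_s(s)


r_sas = as_state_action_next(reward_state)
print(r_sas((3, 1), "UP", (3, 2)))
```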

.8 Markov Decision Process - 4: The MDP describes a problem; the solution is described by Pi, the policy. Pi(s) --> a takes in a state and outputs the action to take in that state. A policy is a solution to the MDP.
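A minimal sketch of a policy as a lookup table from states to actions. The particular entries are hypothetical, and `follow` assumes deterministic moves purely to show that the policy prescribes an action for each state rather than a fixed sequence of actions.

```python
# A hypothetical policy for a few grid cells: pi(s) -> a.
policy = {
    (0, 0): "UP",
    (0, 1): "UP",
    (0, 2): "RIGHT",
    (1, 2): "RIGHT",
    (2, 2): "RIGHT",
}

MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def pi(state):
    """Return the action the policy prescribes in `state`."""
    return policy[state]


def follow(start, steps):
    """Follow the policy for `steps` moves, assuming deterministic
    transitions, and return the list of visited states."""
    path = [start]
    s = start
    for _ in range(steps):
        dx, dy = MOVES[pi(s)]  # ask the policy what to do here
        s = (s[0] + dx, s[1] + dy)
        path.append(s)
    return path


print(follow((0, 0), 5))
```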

Pi*, the optimal policy, is the policy that maximizes your long-term expected reward over time.
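These notes define Pi* but not yet how to compute it. One standard method is value iteration; the sketch below is an illustration under several assumptions, not the course's code. It uses a discount factor GAMMA = 0.9 (not introduced in this section), a hypothetical 5-state corridor with a trap at state 0 and a goal at state 4, R(s) = -0.04 elsewhere, and deterministic LEFT/RIGHT moves. With the noisy 0.8 / 0.1 / 0.1 model, the max over next-state utilities would become a max over expected utilities.

```python
# Hypothetical 1-D corridor: state 0 is a trap, state 4 is the goal.
STATES = range(5)
TERMINAL_REWARD = {0: -1.0, 4: 1.0}
STEP_REWARD = -0.04  # assumed R(s) for non-terminal states
ACTIONS = ("LEFT", "RIGHT")
GAMMA = 0.9  # assumed discount factor


def step(s, a):
    """Deterministic move, clamped to the ends of the corridor."""
    return max(0, s - 1) if a == "LEFT" else min(4, s + 1)


def value_iteration(tol=1e-9):
    """Iterate U(s) = R(s) + GAMMA * max_a U(step(s, a)) until it converges."""
    U = {s: 0.0 for s in STATES}
    while True:
        new_U = {}
        for s in STATES:
            if s in TERMINAL_REWARD:
                new_U[s] = TERMINAL_REWARD[s]  # the episode ends here
            else:
                best = max(U[step(s, a)] for a in ACTIONS)
                new_U[s] = STEP_REWARD + GAMMA * best
        if max(abs(new_U[s] - U[s]) for s in STATES) < tol:
            return new_U
        U = new_U


def greedy_policy(U):
    """Pi*(s): pick the action whose next state has the highest utility."""
    return {
        s: max(ACTIONS, key=lambda a: U[step(s, a)])
        for s in STATES
        if s not in TERMINAL_REWARD
    }


U = value_iteration()
print(greedy_policy(U))
```

Every non-terminal state should point RIGHT, toward the goal and away from the trap.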
