Saturday, March 2, 2019

Udacity Machine Learning Nanodegree Mega Review Part 8

Markov Decision Process

See more in the series visit the main course outline page

Lesson 2 Markov Decision Process
.5 Markov Decision Process - 1: single agent, there are STATES s - a set of tokens that represent every state one can be in - which part of the grid I am in - the entire grid minus blocked states, (x,y) coordinates, process for making decisions, MODEL T(s,a,s')~Pr(s'|s,a)

.6 Markov Decision Process - 2:

Action things you can do in a particular STATE: UP DOWN LEFT RIGHT
Action is also a function of state A(s), or a set of actions - A

Model aka the transition model describes the rule of the world. How to play the game.

The transition Model is a function of two variables state, action, next state aka state_prime.

S' can equal to S : means to stay.

The transition model outputs the probability one would end up at S' given that person is transitioning from S with action a

Deterministic case: means there is no noise. Take every action with certainty: 100%. In nondeterministic, action execute faithfully 80% of time, 0.8, 0.1, 0.1,

Model describes the rule of the game. Also captures what happens if you do something. Physics of the world.
Transition models are probablistic by nature
.7 Markov Decision Process - 3: Markovian property, Markov means you don't have to condition on everything pass the most recent state - Markov only the present matters. Only depends on current state s. Pr(s'|s,a) there's only one dependency on s not s1 s2 s3.

You can turn anything into markovian process by making sure the current state remembers anything from the past.

Second property of MDP: nothing ever changes, things are stationery, these rules don't change over time.

Reward : R(s) for being in a state, R(s,a) reward for being in a state and take an action,  R(s,a,s') being in a state take an action and end up in s'. All mathematically equivalent. Intuition:
Green or goal is great. Want to be there. Red is punishment, restricted area. Encompasses the domain knowledge. Usefulness of entering that state.

.8 Markov Decision Process - 4: MDP describes a problem, the solution is described in Pi or policy. Pi(s) --> a takes in a state, and outputs the action to take. It's a solution to the MDP.

Pi* or policy star is the optimal policy that maximizes your long term reward across time.

No comments:

Post a Comment

Developing apps for airtable using Airtable Blocks

The airtable smart sheets now has an app platform called Airtable Blocks, which allows developers to add custom code, and build apps quickly...