
Thursday, July 19, 2018

Reinforcement Learning Q Learning

Explore <s, a> ---> s', which reads: move from the current state s to s' via action a. Taking the action yields a reward, which can be positive (reinforcement) or negative (punishment or discouragement). As the agent explores the environment, it updates a Q-table that tracks the accumulated scores.
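A minimal sketch of this tabular update, assuming a generic environment with hashable states and a small discrete action set; alpha, gamma, and epsilon below are assumed hyperparameters, not values from the post:

import random
from collections import defaultdict

alpha = 0.1   # learning rate
gamma = 0.9   # discount factor
Q = defaultdict(float)  # Q[(s, a)] defaults to 0.0 for unseen state-action pairs

def update(s, a, reward, s_prime, actions):
    # Best estimated value reachable from the next state s'
    best_next = max(Q[(s_prime, a2)] for a2 in actions)
    # Move Q(s, a) toward reward + gamma * max_a' Q(s', a')
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

def epsilon_greedy(s, actions, epsilon=0.1):
    # Explore with probability epsilon, otherwise exploit the current Q-table
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])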

The Bellman equation is one of the utility equations used to track these scores.
U(s) = R(s) + γ max_a Σ_{s'} T(s, a, s') U(s')
The function is nonlinear because of the max. It says the current utility of a state is its immediate reward plus a discounted fraction (γ) of the best expected utility of the states that future actions can reach.
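As an illustrative example with assumed numbers: if R(s) = -0.04, γ = 1, and the best action reaches a state with U = 1 with probability 0.8 and a state with U = 0 with probability 0.2, then U(s) = -0.04 + 1 · (0.8 · 1 + 0.2 · 0) = 0.76.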

Start with arbitrary utilities, explore, and update each state's utility based on the allowed neighboring moves, i.e. the states it can reach. Update at every iteration.
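A rough sketch of that procedure (value iteration on a tiny made-up MDP; the transition table T, rewards R, and γ below are illustrative assumptions, not from the course):

# States 0..2, actions "left"/"right"; T[s][a] is a list of (probability, next_state) pairs.
T = {
    0: {"left": [(1.0, 0)], "right": [(0.8, 1), (0.2, 0)]},
    1: {"left": [(0.8, 0), (0.2, 1)], "right": [(0.8, 2), (0.2, 1)]},
    2: {"left": [(1.0, 2)], "right": [(1.0, 2)]},  # absorbing goal state
}
R = {0: -0.04, 1: -0.04, 2: 1.0}  # reward for being in each state
gamma = 0.9

U = {s: 0.0 for s in T}  # start with arbitrary (zero) utilities

for _ in range(100):  # update every state at every iteration
    U = {
        s: R[s] + gamma * max(
            sum(p * U[s2] for p, s2 in T[s][a]) for a in T[s]
        )
        for s in T
    }

print(U)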

