- Gradient descent can become slow when the dataset is large
- That is, when the number of training examples m is large
- Every gradient descent update requires summing over all m examples
- e.g. US census data, with a population of roughly 300 million
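- Stochastic Gradient Descent (SGD) is a modification of gradient descent
- The cost is written differently: SGD looks at the cost of a single training example, rather than the average cost over all m examples
- Each stochastic iteration is faster, because it touches only one example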
- Steps: randomly shuffle the dataset, then update the parameters using one training example at a time. The parameters start improving right away, instead of waiting to look at all the examples together as a batch
- Weakness: it generally moves towards the global minimum but does not always settle there; it tends to wander around in the general vicinity of the global minimum. It does not converge as smoothly as batch gradient descent.
- In practical data science, parameters close to the global minimum are usually good enough, so in real life this works out.
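The notes above can be sketched in code. This is a minimal illustration, not the lecture's implementation: it assumes linear regression with a single feature, and the function names, learning rates, epoch counts, and toy data are all hypothetical choices made for this example.

```python
import random


def batch_gradient_descent(xs, ys, lr=0.5, epochs=500):
    """Batch version: every update sums the error over all m examples."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1


def stochastic_gradient_descent(xs, ys, lr=0.05, epochs=50, seed=0):
    """Stochastic version: shuffle, then update on one example at a time."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # random order on every pass over the data
        for x, y in data:
            error = (theta0 + theta1 * x) - y  # error on this single example
            theta0 -= lr * error
            theta1 -= lr * error * x
    return theta0, theta1


# Toy data (assumed for illustration): y = 1 + 2x plus a little noise.
noise = random.Random(42)
xs = [i / 50 for i in range(100)]
ys = [1.0 + 2.0 * x + noise.gauss(0, 0.1) for x in xs]

batch_theta = batch_gradient_descent(xs, ys)
sgd_theta = stochastic_gradient_descent(xs, ys)
print("batch:", batch_theta)
print("sgd:  ", sgd_theta)
```

Both runs should land near the true parameters (1, 2). The batch result is the more stable of the two, while the SGD result keeps jittering slightly around the minimum because the learning rate is held constant.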
Saturday, March 25, 2017
Difference between Batch Gradient Descent and Stochastic Gradient Descent - Udacity Machine Learning Nanodegree Coursera
I recommend this great, crystal-clear 13-minute video by Andrew Ng on Coursera explaining the differences between batch gradient descent (aka gradient descent, aka normal gradient descent) and Stochastic Gradient Descent: https://www.coursera.org/learn/machine-learning/lecture/DoRHJ/stochastic-gradient-descent It's clear, simple, and easy to understand without prerequisites. Andrew Ng shows how the formulas differ, how the step-by-step training strategy differs, and a visualization of the trajectory each method takes to find the global minimum (the center of all the ellipses in his graph).
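For reference, here is a sketch of how the two formulas differ. This post does not write them out, so the notation below is an assumption: the usual linear regression setup with hypothesis $h_\theta$, learning rate $\alpha$, and $m$ training examples $(x^{(i)}, y^{(i)})$.

```latex
% Batch gradient descent: one update uses all m examples.
J_{\text{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
\qquad
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

% Stochastic gradient descent: the cost is defined per example,
% and one update uses a single example i (after shuffling the data).
\text{cost}\left(\theta, (x^{(i)}, y^{(i)})\right) = \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
\qquad
\theta_j := \theta_j - \alpha \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

The only change in the update rule is that the average over all $m$ examples is replaced by the term for one example, which is why each stochastic step is cheap but noisy.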