Silicon Vanity | Tech lifestyle in Silicon Valley: Difference between Batch Gradient Descent and Stochastic Gradient Descent - Udacity Machine Learning Nanodegree Coursera

Saturday, March 25, 2017

Difference between Batch Gradient Descent and Stochastic Gradient Descent - Udacity Machine Learning Nanodegree Coursera

I recommend this great, crystal-clear 13-minute video by Andrew Ng on Coursera explaining the differences between batch gradient descent (also known as plain or "normal" gradient descent) and stochastic gradient descent: https://www.coursera.org/learn/machine-learning/lecture/DoRHJ/stochastic-gradient-descent It's clear, simple, and easy to understand without prerequisites. Andrew Ng shows how the formulas differ, how the step-by-step training strategies differ, and a visualization of the trajectory each algorithm takes towards the global minimum (the center of all the ellipses in his graph).

Summary

Batch gradient descent can become impractical when the dataset is large

If the number of training samples m is large
Gradient descent requires summing over all m examples for every single parameter update

e.g. US census data, with a population of roughly 300 million
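
To make the cost of that sum concrete, here is a minimal sketch of batch gradient descent in plain Python. It assumes a one-feature linear regression with squared-error cost; the function name, the toy data, the learning rate `alpha`, and the number of epochs are illustrative choices of mine, not something taken from the video.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, epochs=1000):
    """Fit y ~ theta0 + theta1 * x using batch gradient descent (illustrative sketch)."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        # A single parameter update needs a full pass over all m examples.
        grad0 = 0.0
        grad1 = 0.0
        for x, y in zip(xs, ys):
            error = (theta0 + theta1 * x) - y
            grad0 += error
            grad1 += error * x
        # Only after summing over the whole dataset do the parameters move.
        theta0 -= alpha * grad0 / m
        theta1 -= alpha * grad1 / m
    return theta0, theta1


# Toy data generated from y = 1 + 2x, with no noise (an assumption for the demo).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)
```

With five examples the inner loop is trivial, but with m around 300 million every one of those updates would mean reading the entire dataset once.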

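Stochastic Gradient Descent is a modification of gradient descent

In other words, the cost used for each update is different: stochastic gradient descent uses the cost of a single training example, while batch gradient descent uses the cost averaged over all m examples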
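
Written out, the two update rules might look like the following. This is a sketch that assumes linear regression with a squared-error cost, hypothesis h_θ(x), learning rate α, and m training examples (x^(i), y^(i)); the notation is my assumption about the setting rather than a transcription of the lecture slides.

```latex
% Assumed setting: linear regression, h_\theta(x) = \theta^{T} x, squared-error cost.
\begin{align*}
\intertext{Batch gradient descent (every update sums over all $m$ examples):}
  J_{\text{train}}(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \\
  \theta_j &:= \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}
\intertext{Stochastic gradient descent (every update uses a single example $i$):}
  \operatorname{cost}\bigl(\theta, (x^{(i)}, y^{(i)})\bigr) &= \frac{1}{2} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \\
  \theta_j &:= \theta_j - \alpha \, \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}
\end{align*}
```

The only structural difference is that the sum over i disappears from the stochastic update, which is why each step is so much cheaper.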

Each stochastic iteration is much faster, because it looks at only one training example
Steps: randomly shuffle the dataset, then update the parameters using one training example at a time. The parameters start improving right away, instead of only after looking at all the examples together as a batch
Weakness: it generally moves towards the global minimum, but not along a direct path, and it tends to wander around the general vicinity of the global minimum rather than settling exactly on it. It does not converge as smoothly as batch gradient descent.
In practical data science, parameters that are close to the global minimum are usually good enough, so in real life it works out.
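
The steps above can be sketched in plain Python as follows. As before, this assumes a one-feature linear regression with squared-error cost; the function name, the toy data, the learning rate `alpha`, the epoch count, and the fixed `seed` are hypothetical choices for illustration only.

```python
import random


def stochastic_gradient_descent(xs, ys, alpha=0.05, epochs=1000, seed=0):
    """Fit y ~ theta0 + theta1 * x using stochastic gradient descent (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    data = list(zip(xs, ys))
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        # Step 1: randomly shuffle the dataset.
        rng.shuffle(data)
        # Step 2: update the parameters after every single example.
        for x, y in data:
            error = (theta0 + theta1 * x) - y
            theta0 -= alpha * error
            theta1 -= alpha * error * x
    return theta0, theta1


# Toy data generated from y = 1 + 2x, with no noise (an assumption for the demo).
sgd_xs = [0.0, 1.0, 2.0, 3.0, 4.0]
sgd_ys = [1.0 + 2.0 * x for x in sgd_xs]
sgd_theta0, sgd_theta1 = stochastic_gradient_descent(sgd_xs, sgd_ys)
print(sgd_theta0, sgd_theta1)
```

Because this toy data has no noise, the parameters settle very close to the true values. On noisy real-world data with a constant learning rate, the parameters would instead keep wandering around the minimum, which is the weakness described above.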
