Add mini-batch gradient descent section (#32313)

Mini-batch is more popular than SGD, especially for extremely large data sets.
commit 95f589a9f6
parent 26671da96c
Author: Jason Yum
Date: 2019-07-07 19:15:19 -04:00
Committed by: Tom

@@ -22,6 +22,9 @@ Machine learning problems usually requires computations over a sample size in th
In stochastic gradient descent you update the parameters using the cost gradient of each example rather than the sum of the cost gradients over all the examples. You can arrive at a set of good parameters after only a few passes through the training examples, so the learning is faster as well.
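As a rough, hypothetical illustration (not part of the original text), a single SGD pass for linear regression with squared-error loss could look like the NumPy sketch below; the function name, learning rate, and data layout are assumptions made for the example:

```python
import numpy as np

def sgd_epoch(w, b, X, y, lr=0.01):
    """One stochastic gradient descent pass: parameters are updated
    after each individual example (squared-error loss, linear model)."""
    indices = np.random.permutation(len(X))  # visit examples in random order
    for i in indices:
        pred = X[i] @ w + b        # prediction for a single example
        error = pred - y[i]
        w -= lr * error * X[i]     # gradient of 0.5 * error**2 w.r.t. w
        b -= lr * error            # gradient w.r.t. b
    return w, b
```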
### Mini-batch Gradient Descent
Stochastic Gradient Descent computes the gradient on a single instance. As described above, each update is far cheaper than evaluating the entire training set. However, updating on one example at a time is an extreme choice and can cause the parameters to oscillate noisily around the optimal solution. Mini-batch gradient descent offers a middle ground: the gradient is computed on a small random sample of the training set (a "mini" batch). The larger your mini-batch (the chunkier your selection from the training set), the less noisy each gradient estimate is and the less parameter volatility you can expect around the optimal solution, at the cost of more computation per update.
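To make the middle ground concrete, here is a minimal NumPy sketch of one mini-batch epoch for the same hypothetical linear-regression setup as above; the function name, learning rate, and `batch_size` default are illustrative assumptions, not part of the original guide:

```python
import numpy as np

def minibatch_epoch(w, b, X, y, lr=0.01, batch_size=32):
    """One mini-batch gradient descent pass: each update averages the
    gradient over `batch_size` randomly chosen examples."""
    indices = np.random.permutation(len(X))        # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]  # a random "mini" batch
        preds = X[batch] @ w + b
        errors = preds - y[batch]
        # Averaging over the batch: larger batches give a less noisy
        # gradient estimate, hence smoother parameter updates.
        w -= lr * (X[batch].T @ errors) / len(batch)
        b -= lr * errors.mean()
    return w, b
```

With `batch_size=1` this reduces to the per-example SGD update sketched earlier, and with `batch_size=len(X)` it becomes ordinary full-batch gradient descent, which is exactly the trade-off described above.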
### Further Reading
* [A guide to Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/)