Add mini-batch gradient descent section (#32313)
Mini-batch is more popular than SGD, especially for extremely large data sets.
@@ -22,6 +22,9 @@ Machine learning problems usually require computations over a sample size in th
In stochastic gradient descent you update the parameters using the cost gradient of each example rather than the sum of the cost gradients over all the examples. You can arrive at a good set of parameters after only a few passes through the training examples, so learning is faster as well.
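
To make the per-example update concrete, here is a minimal NumPy sketch of stochastic gradient descent for a linear model with a squared-error cost. The function name, learning rate, epoch count, and synthetic data below are illustrative assumptions, not part of this repository.

```python
import numpy as np

def sgd(X, y, lr=0.05, epochs=50):
    """Stochastic gradient descent for a linear model with squared-error cost.

    The parameters are updated once per training example rather than once per
    full pass over the data.
    """
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(epochs):
        # Visit the training examples in a random order on each pass.
        for i in np.random.permutation(n_samples):
            error = X[i] @ theta - y[i]   # prediction error for this one example
            gradient = error * X[i]       # gradient of 1/2 * error**2 for this example
            theta -= lr * gradient        # update the parameters immediately
    return theta

# Illustrative usage on synthetic data where y is roughly 4 + 3x.
X = np.c_[np.ones(100), np.random.rand(100)]
y = 4 + 3 * X[:, 1] + 0.1 * np.random.randn(100)
print(sgd(X, y))   # the estimates should approach roughly [4, 3]
```

Because every example triggers its own update, the parameters already move a long way during a single pass over the data.
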
### Mini-batch Gradient Descent
Stochastic Gradient Descent computes the gradient on a single instance. As described above, this is indeed faster than evaluating the entire training set when computing a gradient. However, a single instance gives a noisy estimate of the true gradient, so this is somewhat extreme and can lead to parameter volatility around the optimal solution. Mini-batch gradient descent offers a middle ground where the gradient is computed on a small random sample of the training set (a "mini" batch). The larger your mini-batch (the chunkier your selection from the training set), the less parameter volatility you can expect around the optimal solution.
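
As a rough sketch of that middle ground, the inner loop below averages the gradient over a small random batch instead of using a single example. The helper name `minibatch_gd`, the batch size of 32, and the other hyperparameters are illustrative assumptions.

```python
import numpy as np

def minibatch_gd(X, y, lr=0.05, epochs=50, batch_size=32):
    """Mini-batch gradient descent for a linear model with squared-error cost."""
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(epochs):
        # Shuffle once per epoch, then walk through the data in chunks.
        indices = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            errors = X[batch] @ theta - y[batch]          # errors over the mini-batch
            gradient = X[batch].T @ errors / len(batch)   # gradient averaged over the batch
            theta -= lr * gradient                        # one update per mini-batch
    return theta
```

Calling `minibatch_gd` on the same synthetic `X`, `y` as in the sketch above should give similar estimates. Growing `batch_size` toward the full training-set size recovers batch gradient descent (smoother but slower updates), while shrinking it to 1 recovers the stochastic update described earlier.
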
### Further Reading
* [A guide to Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/)