Provides ISolver implementations based on [Stochastic Gradient Descent]. : https://en.wikipedia.org/wiki/Stochastic_gradient_descent
One of the steps during backpropagation is determining the gradient for each weight. In theory this can be achieved using Gradient Descent (GD). In practice, however, applying GD to minimize an objective function over a large dataset quickly becomes computationally infeasible.
Luckily, GD can be approximated by taking a small random subset of the training dataset, commonly referred to as a mini-batch. We then compute the gradients only for the samples in the mini-batch and average over them, resulting in an estimate of the global gradient. This method is referred to as Stochastic Gradient Descent.
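The averaging and update described above can be sketched as follows. This is a minimal illustration, not this crate's actual `ISolver` API; the function and parameter names are assumptions.

```rust
/// Average per-sample gradients over a mini-batch, then take one SGD step:
/// w <- w - learning_rate * mean_gradient. (Illustrative sketch only.)
fn sgd_step(weights: &mut [f32], sample_gradients: &[Vec<f32>], learning_rate: f32) {
    let batch_size = sample_gradients.len() as f32;
    for (i, w) in weights.iter_mut().enumerate() {
        // Estimate of the global gradient: average over the mini-batch.
        let mean_grad: f32 =
            sample_gradients.iter().map(|g| g[i]).sum::<f32>() / batch_size;
        *w -= learning_rate * mean_grad;
    }
}

fn main() {
    let mut weights = vec![1.0_f32, -0.5];
    // Two hard-coded per-sample gradients standing in for a mini-batch.
    let batch = vec![vec![0.2_f32, -0.4], vec![0.4_f32, -0.2]];
    sgd_step(&mut weights, &batch, 0.1);
    // mean gradient = [0.3, -0.3], so weights become [0.97, -0.47]
    println!("{:?}", weights);
}
```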
pub use self::momentum::Momentum;
A [Stochastic Gradient Descent with Momentum] solver. : https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum
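The momentum variant keeps a velocity term per weight so updates accumulate along consistent gradient directions. A minimal sketch, assuming the classic update `v = momentum * v + lr * grad; w = w - v`; the names here are illustrative, not the `Momentum` solver's actual fields.

```rust
/// One SGD-with-momentum step. `velocity` persists between calls and
/// smooths the update direction across mini-batches. (Illustrative sketch.)
fn momentum_step(
    weights: &mut [f32],
    velocity: &mut [f32],
    gradients: &[f32],
    learning_rate: f32,
    momentum: f32,
) {
    for ((w, v), g) in weights.iter_mut().zip(velocity.iter_mut()).zip(gradients) {
        // Decay the old velocity and add the new (scaled) gradient.
        *v = momentum * *v + learning_rate * g;
        *w -= *v;
    }
}

fn main() {
    let mut weights = vec![1.0_f32];
    let mut velocity = vec![0.0_f32];
    // Two steps with the same gradient: the second step is larger
    // because the velocity has built up.
    momentum_step(&mut weights, &mut velocity, &[0.5], 0.1, 0.9);
    // v = 0.05, w = 0.95
    momentum_step(&mut weights, &mut velocity, &[0.5], 0.1, 0.9);
    // v = 0.9 * 0.05 + 0.05 = 0.095, w = 0.855
    println!("{:?}", weights);
}
```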