Mini-Batch SGD

Mini-batch SGD combines the benefits of both Gradient Descent and Stochastic Gradient Descent (SGD): instead of computing the gradient over the full dataset (as in batch Gradient Descent) or over a single sample (as in SGD), each update uses a small batch of samples (see the sketch after the lists below).

Pros:

  1. Faster than standard (full-batch) Gradient Descent, especially for large datasets
  2. Can escape local minima more easily than full-batch Gradient Descent, since the noise from sampling mini-batches perturbs the updates
  3. Produces less noisy updates than pure SGD, leading to more stable convergence

Cons:

  1. Sensitive to the choice of mini-batch size: too small reintroduces SGD-like noise, too large approaches the cost of full-batch Gradient Descent
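
As a concrete illustration, here is a minimal sketch of mini-batch SGD applied to linear regression with an MSE loss. The model, loss, and hyperparameter defaults (`lr`, `batch_size`, `epochs`) are illustrative assumptions, not prescribed by this note; the point is the update loop over shuffled mini-batches.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=32, epochs=100):
    """Mini-batch SGD for linear regression with MSE loss.

    Illustrative sketch only: the linear model and hyperparameter
    defaults are assumptions chosen for this example.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Shuffle once per epoch so each epoch sees different batches.
        order = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the MSE loss on this mini-batch only.
            error = Xb @ w + b - yb
            grad_w = 2 * Xb.T @ error / len(idx)
            grad_b = 2 * error.mean()
            # Parameter update with a fixed learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Usage on synthetic data (true weights [1.5, -2.0, 0.5], bias 0):
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)
w, b = minibatch_sgd(X, y)
```

With `batch_size=1` this loop degenerates to pure SGD, and with `batch_size=n_samples` to full-batch Gradient Descent, which is why the batch size controls the trade-off listed above.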

Related Notes