Mini-Batch SGD
Mini-batch SGD combines the benefits of standard (full-batch) Gradient Descent and Stochastic Gradient Descent (SGD).
- At each update step, the model is shown a small subset of the training data (e.g., 64, 128, or 256 examples); see the sketch after this list
- The number of examples in this subset is known as the Batch Size
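
To make the update loop concrete, here is a minimal sketch of mini-batch SGD applied to linear regression with a mean-squared-error loss. The model, loss, and hyperparameter values are illustrative assumptions, not something prescribed by these notes.

```python
# Minimal mini-batch SGD sketch (assumed example: linear regression + MSE).
import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=64, epochs=10):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # model weights
    b = 0.0                   # bias term

    for epoch in range(epochs):
        # Shuffle once per epoch so each mini-batch is a random sample
        perm = np.random.permutation(n_samples)
        X_shuf, y_shuf = X[perm], y[perm]

        for start in range(0, n_samples, batch_size):
            # Take one mini-batch (the last one may be smaller)
            xb = X_shuf[start:start + batch_size]
            yb = y_shuf[start:start + batch_size]

            # Gradient of the MSE loss computed on this batch only
            error = xb @ w + b - yb
            grad_w = xb.T @ error / len(xb)
            grad_b = error.mean()

            # One parameter update per mini-batch
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Usage on synthetic data (illustrative values)
X = np.random.randn(1000, 5)
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1
w, b = minibatch_sgd(X, y, batch_size=128)
```

Note that, unlike full-batch Gradient Descent (one update per pass over the data) or pure SGD (one update per example), this loop performs one update per mini-batch, which is where the speed/stability trade-off below comes from.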
Pros:
- Faster per epoch than standard (full-batch) Gradient Descent, especially on large datasets
- The noise in batch gradients can help the optimizer escape shallow local minima
- Updates are less noisy than in pure (single-example) SGD, leading to more stable convergence
Cons:
- Convergence is sensitive to the choice of mini-batch size, which adds another hyperparameter to tune