Stochastic Gradient Descent (SGD)

Pros:

  1. Frequent updates of model parameters, so progress is made after every single example
  2. Requires very little memory, since it only looks at one example at a time
  3. Can handle large datasets

Cons:

  1. The frequent updates give noisy gradient estimates, so convergence can be slow and, in the worst case, the optimizer gets trapped in a local minimum
  2. High variance in the parameter updates
  3. Frequent updates are computationally expensive compared to batched updates
  4. May overshoot even after reaching the global minimum (e.g. when an outlier example produces a large gradient)
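Below is a minimal sketch of the per-example update loop on a toy linear-regression problem. The data, learning rate, and epoch count are illustrative assumptions, not part of the note; it only shows how each example triggers its own parameter update, which keeps memory low but makes the gradient estimate noisy.

```python
import numpy as np

# Toy data: y = X @ true_w + noise (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # 200 examples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(3)   # model parameters
lr = 0.01         # learning rate (assumed)

for epoch in range(10):
    # One update per example: frequent and low-memory, but noisy
    for i in rng.permutation(len(X)):
        x_i, y_i = X[i], y[i]
        grad = 2 * (x_i @ w - y_i) * x_i   # gradient of (x_i . w - y_i)^2
        w -= lr * grad

print(w)   # should end up close to true_w
```

Because each step uses a single example, the gradient direction jumps around from update to update, which is exactly the noise/variance trade-off listed in the cons above.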
