Gradient Descent

Gradient Descent Flow

$$w_{\text{new}} = w_{\text{old}} - \alpha \, \frac{\partial \, \text{loss}}{\partial w}$$
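As a rough illustration of the update rule, here is a minimal sketch in plain Python. The names `gradient_descent` and `loss_grad`, and the toy loss (w - 3)^2, are assumptions made up for this example and are not part of the notes. The loop simply applies the update a fixed number of times.

```python
def gradient_descent(grad, w_init, alpha, steps):
    """Repeatedly apply w_new = w_old - alpha * d(loss)/dw."""
    w = w_init
    for _ in range(steps):
        w = w - alpha * grad(w)  # one gradient descent update
    return w


# Hypothetical toy loss: loss(w) = (w - 3)^2, so d(loss)/dw = 2 * (w - 3).
def loss_grad(w):
    return 2.0 * (w - 3.0)


w_final = gradient_descent(loss_grad, w_init=0.0, alpha=0.1, steps=100)
print(w_final)  # should be very close to 3.0, the minimizer of the toy loss
```

Here `alpha` is the learning rate from the formula. In this toy setting each step shrinks the distance to the minimum by a constant factor, which is why a moderate value converges quickly.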

Pros:

  1. Simple to implement
  2. Can work well with a well-tuned learning rate

Cons:

  1. Can be very slow, especially for complex models or large datasets
  2. Requires a large amount of memory
  3. Computationally inefficient
  4. Sensitive to the choice of learning rate
  5. May get trapped in local minima, as we saw before
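
The last two cons can be seen on small toy problems. The sketch below is only an illustration under assumed settings: the losses w^2 and w^4 - 3w^2 + w, the helper names, and the specific learning rates are all made up for this example. The first part runs the same update with three learning rates; the second part starts from two different points on a loss that has two minima.

```python
def gradient_descent(grad, w_init, alpha, steps):
    """Apply w_new = w_old - alpha * d(loss)/dw for a fixed number of steps."""
    w = w_init
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w


# Part 1: learning rate sensitivity on the toy loss(w) = w^2 (gradient 2w).
def quad_grad(w):
    return 2.0 * w


w_slow = gradient_descent(quad_grad, 1.0, alpha=0.01, steps=50)     # too small: still far from 0
w_good = gradient_descent(quad_grad, 1.0, alpha=0.4, steps=50)      # reasonable: essentially 0
w_diverged = gradient_descent(quad_grad, 1.0, alpha=1.1, steps=50)  # too large: |w| grows each step


# Part 2: local minima on the toy loss(w) = w^4 - 3w^2 + w,
# which has a shallower minimum at positive w and a deeper one at negative w.
def bumpy_loss(w):
    return w**4 - 3.0 * w**2 + w


def bumpy_grad(w):
    return 4.0 * w**3 - 6.0 * w + 1.0


w_right = gradient_descent(bumpy_grad, 2.0, alpha=0.01, steps=500)   # settles in the shallower minimum
w_left = gradient_descent(bumpy_grad, -2.0, alpha=0.01, steps=500)   # settles in the deeper minimum

print(w_slow, w_good, w_diverged)
print(w_right, bumpy_loss(w_right))
print(w_left, bumpy_loss(w_left))
```

Both runs in the second part use the same learning rate and number of steps. Only the starting point differs, and that alone decides which minimum the update ends up in.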
