Nesterov Accelerated Gradient (NAG)

$$v = \beta \, v + (1 - \beta) \, \nabla_\theta L(\theta - \beta v)$$
$$\theta = \theta - \alpha \, v$$
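A minimal sketch of this update rule in NumPy, assuming the illustrative names `nag_step` and `grad_fn` and example values for α and β (none of these are fixed by the note):

```python
import numpy as np

def nag_step(theta, v, grad_fn, alpha=0.1, beta=0.9):
    # Evaluate the gradient at the lookahead point theta - beta*v,
    # then update the velocity and parameters as in the note's formulas.
    lookahead = theta - beta * v
    v = beta * v + (1 - beta) * grad_fn(lookahead)
    theta = theta - alpha * v
    return theta, v

# Usage: minimize f(x) = x^2, whose gradient is 2x.
theta, v = np.array([5.0]), np.zeros(1)
for _ in range(200):
    theta, v = nag_step(theta, v, lambda x: 2 * x)
print(theta)
```

The only difference from classical momentum is the lookahead point: the gradient is taken at `theta - beta * v` instead of at `theta`, which lets the update anticipate where the momentum is already carrying the parameters.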

Pros:

  1. Less likely to overshoot a local minimum, since the gradient is evaluated at the lookahead point θ − βv rather than at the current parameters
  2. Slows down as it approaches a minimum, because the lookahead gradient anticipates the upcoming change in slope

Cons:

  1. Hyperparameters (the learning rate α and momentum coefficient β) still need to be tuned manually

References


Related Notes