Nesterov Accelerated Gradient (NAG)

v = β⋅v + (1−β)⋅∇θ L(θ − β⋅v)
θ = θ − α⋅v
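The two update rules above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the objective f(θ) = θ², its gradient, and the values of β and α are all hypothetical choices made for the example.

```python
# Gradient of the toy objective f(theta) = theta^2 (hypothetical example).
def grad_L(theta):
    return 2.0 * theta

theta, v = 5.0, 0.0      # initial parameter and velocity
beta, alpha = 0.9, 0.1   # illustrative momentum and learning rate

for _ in range(200):
    # Look-ahead step: evaluate the gradient at (theta - beta * v),
    # i.e. where the momentum is about to carry us, not at theta itself.
    v = beta * v + (1 - beta) * grad_L(theta - beta * v)
    theta = theta - alpha * v

print(theta)  # close to 0, the minimum of f
```

The look-ahead gradient is the only difference from plain momentum; evaluating ∇L at θ − β⋅v lets the velocity start correcting before the parameters overshoot.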

Pros:

  1. Less prone to overshooting a local minimum, because the gradient is evaluated at the look-ahead position θ − β⋅v
  2. Effectively slows down as a minimum approaches: the look-ahead gradient starts pointing back toward it, damping the velocity

Cons:

  1. Hyperparameters (learning rate α and momentum coefficient β) still need to be tuned manually

References


Related Notes