Nesterov Accelerated Gradient (NAG)
- It is an improvement on Stochastic Gradient Descent (SGD) with Momentum.
- In plain momentum, the ball picks up speed over time, but it follows the accumulated velocity blindly; it does not look at where it is about to go.
- NAG instead evaluates the gradient at the look-ahead position (the point the momentum step is about to reach) and moves in the direction where the loss will be lower.
- Compared to SGD with Momentum, the only change is the point at which the gradient is evaluated: the look-ahead point instead of the current weights (see the sketch after this list).
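A minimal NumPy sketch of the two update rules, to make the difference concrete. The names `loss_grad`, `momentum_step`, and `nesterov_step`, and the toy quadratic loss, are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def momentum_step(w, v, loss_grad, lr=0.01, momentum=0.9):
    """Classical momentum: gradient is evaluated at the current weights w."""
    v = momentum * v - lr * loss_grad(w)
    return w + v, v

def nesterov_step(w, v, loss_grad, lr=0.01, momentum=0.9):
    """NAG: gradient is evaluated at the look-ahead point w + momentum * v."""
    v = momentum * v - lr * loss_grad(w + momentum * v)
    return w + v, v

# Toy example (hypothetical): minimize f(w) = w^2, whose gradient is 2w.
grad = lambda w: 2 * w
w, v = np.array([5.0]), np.zeros(1)
for _ in range(100):
    w, v = nesterov_step(w, v, grad, lr=0.1, momentum=0.9)
print(w)  # close to the minimum at 0
```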
Pros:
- Less likely to overshoot a local minimum than plain momentum
- Slows down when a minimum is nearby, because the look-ahead gradient counteracts the momentum (the learning rate itself is not adaptive)
Cons:
- Hyperparameters (learning rate and momentum coefficient) still need to be tuned manually (see the example below)
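For reference, PyTorch's `torch.optim.SGD` exposes NAG via `nesterov=True`; the model here is a hypothetical placeholder, and the `lr` and `momentum` values are just example settings of the hyperparameters mentioned above.

```python
import torch

model = torch.nn.Linear(10, 1)  # placeholder model

# lr and momentum are the hyperparameters that need tuning;
# nesterov=True switches the momentum update to the NAG variant.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
```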