RMSProp

$$
\begin{aligned}
g &= \nabla_\theta L(\theta) \\
G &= \beta G + (1 - \beta)\, g \odot g \\
\theta &= \theta - \frac{\alpha}{\sqrt{G + \epsilon}} \odot g
\end{aligned}
$$
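
A minimal NumPy sketch of this update rule, assuming the hyperparameter names `alpha` (learning rate), `beta` (decay rate), and `eps` match the symbols in the equations above; the function name `rmsprop_step` is illustrative, not a library API:

```python
import numpy as np

def rmsprop_step(theta, grad, G, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSProp step: returns updated parameters and accumulator."""
    # Exponentially decaying average of squared gradients
    G = beta * G + (1 - beta) * grad ** 2
    # Scale each parameter's step by the RMS of its recent gradients
    theta = theta - alpha * grad / np.sqrt(G + eps)
    return theta, G

# Example: minimize f(theta) = theta^2 starting from theta = 5
theta, G = np.array([5.0]), np.zeros(1)
for _ in range(200):
    theta, G = rmsprop_step(theta, 2 * theta, G, alpha=0.1)
print(theta)  # close to 0 (dithers within roughly alpha of the minimum)
```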

Pros:

  1. Works well with sparse data
  2. Automatically adapts a per-parameter learning rate based on recent gradient magnitudes
  3. Can converge faster than AdaGrad, since the decaying average keeps the accumulated squared gradient from growing without bound

Cons:

  1. Can still converge slowly on some problems
  2. Requires tuning of the decay rate hyperparameter $\beta$
