- RMSProp = Root Mean Square Propagation
- RMSProp modifies AdaGrad by accumulating an exponentially decaying average of squared gradients rather than their running sum (see the update rule below)
- This prevents the learning rate from decaying monotonically toward zero, as it does in AdaGrad
- Converges faster
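
Concretely, the update at step $t$ is usually written as follows, where $g_t$ is the gradient, $\beta$ is the decay rate, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability (these symbol names are the conventional choices, not taken from the bullets above; common defaults are $\beta \approx 0.9$, $\epsilon \approx 10^{-8}$):

```math
E[g^2]_t = \beta \, E[g^2]_{t-1} + (1 - \beta) \, g_t^2
```
```math
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
```
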
Pros:
- Works well with sparse data
- Automatically adapts a per-parameter learning rate based on recent gradient magnitudes
- Can converge faster than AdaGrad
Cons:
- Can still converge slowly on some problems
- Requires tuning of the decay rate hyperparameter
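
A minimal NumPy sketch of the update rule above; the function name `rmsprop_update` and the quadratic toy objective are illustrative, not from any particular library:

```python
import numpy as np

def rmsprop_update(theta, grad, avg_sq_grad, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp step: decay the running average of squared gradients,
    then scale each parameter's step by the root of that average."""
    avg_sq_grad = beta * avg_sq_grad + (1 - beta) * grad ** 2
    theta = theta - lr * grad / np.sqrt(avg_sq_grad + eps)
    return theta, avg_sq_grad

# Toy example: minimize f(theta) = sum(theta^2), whose gradient is 2 * theta
theta = np.array([5.0, -3.0])
avg_sq_grad = np.zeros_like(theta)
for _ in range(1000):
    grad = 2 * theta
    theta, avg_sq_grad = rmsprop_update(theta, grad, avg_sq_grad)
print(theta)  # settles near [0, 0]
```

Note how each parameter gets its own effective step size: the division by `np.sqrt(avg_sq_grad + eps)` shrinks steps for parameters with consistently large gradients and enlarges them where gradients are small.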