- RMSProp = Root Mean Square Propagation
 
- RMSProp modifies the AdaGrad algorithm by using an exponentially decaying average of the squared gradients rather than their cumulative sum
- This prevents the learning rate from decaying monotonically toward zero, as it does in AdaGrad
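
The update is commonly written as follows (a standard formulation; here $g_t$ is the gradient, $\beta$ the decay rate, $\eta$ the learning rate, and $\epsilon$ a small stability constant):

$$
s_t = \beta\, s_{t-1} + (1-\beta)\, g_t^2, \qquad
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{s_t} + \epsilon}\, g_t
$$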
 
- As a result, it typically converges faster
 
 
Pros:
- Works well with sparse data
 
- Automatically adapts a per-parameter learning rate based on recent gradient magnitudes
 
- Can converge faster than AdaGrad
 
Cons:
- Can still converge slowly on some problems
 
- Requires tuning of the decay rate hyperparameter (see the sketch after this list)
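
A minimal NumPy sketch of one RMSProp step, assuming the equations above; the hyperparameter names (`lr`, `decay`, `eps`) and their default values are illustrative, not fixed by these notes:

```python
import numpy as np

def rmsprop_update(params, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    # cache holds the exponentially decaying average of squared gradients (s_t)
    cache = decay * cache + (1 - decay) * grads ** 2
    # Divide the step by the root mean square; eps avoids division by zero
    params = params - lr * grads / (np.sqrt(cache) + eps)
    return params, cache

# Example: one step on a toy quadratic loss f(w) = sum(w^2), gradient = 2w
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
w, cache = rmsprop_update(w, 2 * w, cache)
```

Note how `decay` controls how quickly old squared gradients are forgotten: a value near 1 gives a long memory (closer to AdaGrad's behavior), while a smaller value adapts faster but is noisier, which is why this hyperparameter needs tuning.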
 