RMSProp

$$
\begin{aligned}
g &= \nabla_\theta L(\theta) \\
G &= \beta G + (1 - \beta)\, g \odot g \\
\theta &= \theta - \frac{\alpha}{\sqrt{G + \epsilon}} \odot g
\end{aligned}
$$
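
A minimal NumPy sketch of this update rule, assuming the hyperparameter names `alpha` (learning rate), `beta` (decay rate), and `eps` match the symbols in the equations above; the function name `rmsprop_step` is illustrative, not a library API:

```python
import numpy as np

def rmsprop_step(theta, grad, G, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSProp step: returns updated parameters and accumulator."""
    # Exponentially decaying average of squared gradients
    G = beta * G + (1 - beta) * grad ** 2
    # Scale each parameter's step by the RMS of its recent gradients
    theta = theta - alpha * grad / np.sqrt(G + eps)
    return theta, G

# Example: minimize f(theta) = theta^2 starting from theta = 5
theta, G = np.array([5.0]), np.zeros(1)
for _ in range(200):
    theta, G = rmsprop_step(theta, 2 * theta, G, alpha=0.1)
print(theta)  # close to 0 (dithers within roughly alpha of the minimum)
```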

Pros:

  1. Works well with sparse data
  2. Automatically adapts a per-parameter learning rate based on recent gradient magnitudes
  3. Can converge faster than AdaGrad, since the decaying average keeps the accumulated squared gradient from growing without bound

Cons:

  1. Can still converge slowly on some problems
  2. Requires tuning of the decay rate hyperparameter $\beta$
