Exploding Gradient

Why Do Exploding Gradients Occur?

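Backpropagation computes the gradient for the early layers by multiplying per-layer gradient terms together. If those terms are consistently greater than 1.0 and the network is very deep, the product grows exponentially with depth and the gradient becomes a very large number.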
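
The growth with depth can be illustrated with a toy calculation. This is only a sketch: it assumes every layer contributes the same scalar factor, which real networks do not, and the function name `backpropagated_gradient` is made up for illustration.

```python
def backpropagated_gradient(per_layer_factor: float, depth: int) -> float:
    """Multiply a starting gradient of 1.0 by the same factor once per layer."""
    gradient = 1.0
    for _ in range(depth):
        gradient *= per_layer_factor
    return gradient


# Same per-layer factor (assumed value 1.5), two different depths.
shallow = backpropagated_gradient(1.5, 5)   # 1.5 ** 5
deep = backpropagated_gradient(1.5, 50)     # 1.5 ** 50

print(f"depth 5:  {shallow:.3e}")
print(f"depth 50: {deep:.3e}")
```

With a factor only slightly above 1.0, the result stays modest for a shallow network but grows by many orders of magnitude once the depth increases.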

How to Identify Exploding Gradients?

  • The model weights quickly become very large during training
  • The model weights become NaN
  • The error gradient stays above 1.0 for every node and layer during training
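
The symptoms above can be turned into a simple check on the gradients collected after a backward pass. The sketch below is framework-independent and rests on assumptions: gradients are given as a list of per-layer lists of floats, the helper names `gradient_norm` and `looks_exploding` are hypothetical, and the threshold of 1e3 is an arbitrary illustrative value, not a recommended setting.

```python
import math


def gradient_norm(gradients):
    """L2 norm over all gradient values, given as a list of per-layer lists."""
    return math.sqrt(sum(g * g for layer in gradients for g in layer))


def looks_exploding(gradients, threshold=1e3):
    """Return True if any value is NaN/inf or the overall norm exceeds threshold."""
    flat = [g for layer in gradients for g in layer]
    if any(math.isnan(g) or math.isinf(g) for g in flat):
        return True
    return gradient_norm(gradients) > threshold


# Made-up gradient values for illustration only.
healthy = [[0.1, -0.2], [0.05]]
too_large = [[1e4, 2.0]]
broken = [[float("nan")]]

print(looks_exploding(healthy), looks_exploding(too_large), looks_exploding(broken))
```

Logging the norm every few training steps, rather than only testing it against a threshold, makes it easier to see whether it is trending upward.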

How to Solve Exploding Gradients?
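
These notes do not list a specific remedy. One widely used technique is gradient clipping, which rescales the gradient whenever its norm exceeds a chosen limit. The sketch below is a minimal plain-Python illustration, not any library's implementation; the function name `clip_by_norm` and the `max_norm` value are assumptions.

```python
import math


def clip_by_norm(gradients, max_norm):
    """Rescale a flat list of gradients so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return list(gradients)
    scale = max_norm / norm
    return [g * scale for g in gradients]


# Made-up gradient with norm 5.0, clipped to an assumed limit of 1.0.
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```

Clipping by norm keeps the direction of the gradient and only shortens its length, so the update still points the same way but cannot become arbitrarily large.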

Related Notes