Vanishing Gradient
- Vanishing gradient means the gradient becomes so small that, to the computer, it is effectively 0
- Why does it arise?
  - Because of finite floating-point precision: very small values eventually underflow to 0
  - Because the chain rule multiplies the local gradients of all layers, so in a very deep network a product of many small factors shrinks toward 0 (see the underflow sketch after this list)
- How to identify it?
  - Parameters of the top layers keep changing, while those of the bottom layers barely change (see the gradient-norm check after this list)
  - The model learns at a very slow pace
  - Training can stall at a very early stage, after only a few iterations
- What to do?
  - The right fix depends on the architecture and on the reason the gradient vanishes
  - A few common remedies are (see the skip-connection sketch after this list)
    - LSTM
    - ReLU - which can in turn introduce the exploding-gradient problem
    - Batch Normalization
    - Weight Initialization (e.g. Xavier/He)
    - Skip Connections
    - GRU
    - Reduce network depth
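
A quick way to see both reasons at once (small local derivatives multiplied across many layers, plus finite float precision): the sigmoid derivative is at most 0.25, so its product over the depth shrinks geometrically and eventually underflows. A minimal sketch, with the 0.25 bound and the depths picked purely for illustration:

```python
import numpy as np

# Illustrative numbers, not from the notes: the sigmoid derivative is at most
# 0.25, so the chain-rule product over many layers shrinks geometrically and
# eventually underflows to exactly 0.0 in float32.
max_sigmoid_grad = np.float32(0.25)

for depth in (10, 50, 100, 200):
    grad = np.float32(1.0)
    for _ in range(depth):
        grad *= max_sigmoid_grad   # one layer's local derivative (upper bound)
    print(f"depth={depth:4d}  gradient factor <= {grad}")
# By depth=100 the factor has already underflowed to 0.0 in float32.
```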
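
To check the "top layers change, bottom layers don't" symptom in practice, one option is to inspect per-layer gradient norms after a backward pass. A minimal PyTorch sketch, assuming a made-up deep sigmoid MLP and fake data (the layer sizes, depth, and loss are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy deep MLP with sigmoid activations (hypothetical example network).
layers = []
for _ in range(30):
    layers += [nn.Linear(64, 64), nn.Sigmoid()]
layers += [nn.Linear(64, 1)]
model = nn.Sequential(*layers)

x = torch.randn(16, 64)           # fake batch
loss = model(x).pow(2).mean()     # dummy loss, just to produce gradients
loss.backward()

# Symptom check: gradient norm per Linear layer, from bottom (input side)
# to top (output side). Bottom layers should report norms near 0.
for i, m in enumerate(model):
    if isinstance(m, nn.Linear):
        print(f"layer {i:3d}  grad norm = {m.weight.grad.norm().item():.3e}")
```

Running this, the Linear layers near the input typically report norms orders of magnitude smaller than the final layer, which is the fingerprint described above.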
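
For the remedies, one illustration of skip connections (combined here with ReLU): the identity shortcut gives gradients a path that bypasses the long chain of multiplications. A minimal, assumed PyTorch sketch, not a prescription; the block layout, width, and depth are invented for the example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ResidualBlock(nn.Module):
    """Hypothetical residual block: y = x + F(x)."""
    def __init__(self, width: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(width, width),
            nn.ReLU(),
            nn.Linear(width, width),
        )

    def forward(self, x):
        # The identity path lets gradients flow straight through,
        # avoiding the product of many small per-layer factors.
        return x + self.body(x)

model = nn.Sequential(*[ResidualBlock(64) for _ in range(15)], nn.Linear(64, 1))

x = torch.randn(16, 64)
model(x).pow(2).mean().backward()

bottom = model[0].body[0].weight.grad.norm().item()
top = model[-1].weight.grad.norm().item()
print(f"bottom-layer grad norm = {bottom:.3e}, top-layer grad norm = {top:.3e}")
```

With the shortcut in place, the bottom block's gradient norm stays on a similar order of magnitude as the top layer's; the same per-layer check can be reused to compare the other remedies (batch normalization, Xavier/He initialization, etc.).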
TODO:
- Read how each of these remedies actually solves the problem
- Create flash cards