LSTM
- LSTM = Long Short-Term Memory
- An LSTM unit shares its weights across all time steps
- This weight sharing is the main reason an LSTM can work with variable-length inputs and outputs (see the sketch after this list)
- It can handle longer sequences than a plain RNN because the gated cell state mitigates vanishing gradients
- LSTM uses both the sigmoid and tanh activation functions
- In an encoder-decoder setup, the context vector passed to the decoder has a fixed size regardless of the input length, so information gets lost on long inputs
- Solution: Transformer
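A minimal sketch of the weight sharing mentioned above, assuming PyTorch and its `nn.LSTM` module (not part of these notes): the same weights are reused at every time step, so one module handles sequences of different lengths.

```python
# Minimal sketch, assuming PyTorch's nn.LSTM; sizes are illustrative.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# The same weights are applied at every time step, so the same module
# can process sequences of different lengths.
short_seq = torch.randn(1, 5, 8)   # 5 time steps
long_seq = torch.randn(1, 50, 8)   # 50 time steps

out_short, (h_short, c_short) = lstm(short_seq)
out_long, (h_long, c_long) = lstm(long_seq)

print(out_short.shape)  # torch.Size([1, 5, 16])
print(out_long.shape)   # torch.Size([1, 50, 16])
```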
LSTM Unit Steps
- Forget Gate: what percentage of the previous long-term memory (cell state) to keep (sigmoid)
- Input Gate:
    - Calculate the potential long-term memory for this step (tanh)
    - Decide what percentage of that potential long-term memory to add to the cell state (sigmoid)
- Output Gate:
    - Calculate the potential short-term memory (tanh of the updated long-term memory)
    - Decide what percentage of it to output (sigmoid)
- The unit passes on the updated long-term memory (cell state) and the new short-term memory (hidden state); the hidden state is also the unit's output (the steps are sketched in code below)
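The gate steps above can be written out directly. Below is a minimal NumPy sketch of a single LSTM unit step; the weight matrices `W`, `U`, biases `b`, and the toy sizes are illustrative assumptions, not anything from these notes.

```python
# Minimal sketch of one LSTM unit step; weights and sizes are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step.
    x: input vector, h_prev: previous short-term memory (hidden state),
    c_prev: previous long-term memory (cell state).
    W, U, b: dicts of weights/biases for gates 'f', 'i', 'g', 'o'.
    """
    # Forget gate: what fraction of the previous long-term memory to keep
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])
    # Input gate: what fraction of the potential long-term memory to add
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])
    # Potential (candidate) long-term memory for this step
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])
    # Updated long-term memory (cell state)
    c = f * c_prev + i * g
    # Output gate: what fraction of the potential short-term memory to output
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])
    # New short-term memory (hidden state) = the unit's output
    h = o * np.tanh(c)
    return h, c

# Toy usage with random weights (hidden size 4, input size 3)
rng = np.random.default_rng(0)
H, D = 4, 3
W = {k: rng.normal(size=(H, D)) for k in 'figo'}
U = {k: rng.normal(size=(H, H)) for k in 'figo'}
b = {k: np.zeros(H) for k in 'figo'}
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```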