LSTM
- LSTM = Long Short-Term Memory
- An LSTM unit shares its weights across all time steps
- This weight sharing is the main reason an LSTM can work with variable-length inputs and outputs (see the sketch after this list)
- It can handle longer sequences than a plain RNN because the gated cell state mitigates vanishing gradients
- LSTM uses both the sigmoid and tanh activation functions
- In an encoder-decoder setup, the context vector passed to the decoder has a fixed size regardless of the input length, so information gets lost on long inputs
- Solution: Transformer
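A minimal sketch of the weight sharing mentioned above, assuming PyTorch and its `nn.LSTM` module (not part of these notes): the same weights are reused at every time step, so one module handles sequences of different lengths.

```python
# Minimal sketch, assuming PyTorch's nn.LSTM; sizes are illustrative.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# The same weights are applied at every time step, so the same module
# can process sequences of different lengths.
short_seq = torch.randn(1, 5, 8)   # 5 time steps
long_seq = torch.randn(1, 50, 8)   # 50 time steps

out_short, (h_short, c_short) = lstm(short_seq)
out_long, (h_long, c_long) = lstm(long_seq)

print(out_short.shape)  # torch.Size([1, 5, 16])
print(out_long.shape)   # torch.Size([1, 50, 16])
```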
LSTM Unit Steps
- Forget Gate: what percentage of the previous long-term memory (cell state) to keep (sigmoid)
- Input Gate:
    - Calculate the potential long-term memory for this step (tanh)
    - Decide what percentage of that potential long-term memory to add to the cell state (sigmoid)
- Output Gate:
    - Calculate the potential short-term memory (tanh of the updated long-term memory)
    - Decide what percentage of it to output (sigmoid)
- The unit passes on the updated long-term memory (cell state) and the new short-term memory (hidden state); the hidden state is also the unit's output (the steps are sketched in code below)
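The gate steps above can be written out directly. Below is a minimal NumPy sketch of a single LSTM unit step; the weight matrices `W`, `U`, biases `b`, and the toy sizes are illustrative assumptions, not anything from these notes.

```python
# Minimal sketch of one LSTM unit step; weights and sizes are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step.
    x: input vector, h_prev: previous short-term memory (hidden state),
    c_prev: previous long-term memory (cell state).
    W, U, b: dicts of weights/biases for gates 'f', 'i', 'g', 'o'.
    """
    # Forget gate: what fraction of the previous long-term memory to keep
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])
    # Input gate: what fraction of the potential long-term memory to add
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])
    # Potential (candidate) long-term memory for this step
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])
    # Updated long-term memory (cell state)
    c = f * c_prev + i * g
    # Output gate: what fraction of the potential short-term memory to output
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])
    # New short-term memory (hidden state) = the unit's output
    h = o * np.tanh(c)
    return h, c

# Toy usage with random weights (hidden size 4, input size 3)
rng = np.random.default_rng(0)
H, D = 4, 3
W = {k: rng.normal(size=(H, D)) for k in 'figo'}
U = {k: rng.normal(size=(H, H)) for k in 'figo'}
b = {k: np.zeros(H) for k in 'figo'}
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```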