As we backpropagate through more and more RNN time steps, the gradient becomes weaker and weaker (the vanishing gradient problem).
To solve this: LSTM! Long Short-Term Memory, whose cell state works like a computer's RAM.
For each time step t, there is a hidden state h(t) and a cell state c(t)
- both are vectors of the same length n
- cell stores long-term information
- the LSTM can write, erase, or read information from the cell state
- Whether information is kept or discarded is controlled by three gates, each a vector of length n whose elements lie between 0 (closed) and 1 (open), or anywhere in between
- Forget gate: controls what is kept vs. forgotten from the previous cell state
- Input gate: controls which parts of the new candidate cell content get written to the cell
- Output gate: controls which parts of the cell state are output to the hidden state
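The three gates above can be sketched as one LSTM time step in NumPy. This is a minimal illustration, not a production implementation: the parameter names (`W_f`, `b_f`, etc.) and the dictionary layout are assumptions for this sketch, not tied to any library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps illustrative names W_f/W_i/W_o/W_c (shape (n, n + n_x))
    and b_f/b_i/b_o/b_c (shape (n,)) to the gate weights and biases.
    """
    z = np.concatenate([h_prev, x])                       # previous hidden state stacked with input
    f = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate: what to erase from the cell
    i = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate: what new content to write
    o = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate: what to expose as the hidden state
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate cell content
    c = f * c_prev + i * c_tilde                          # new cell state (long-term memory)
    h = o * np.tanh(c)                                    # new hidden state, same length n as the cell state
    return h, c

# tiny usage example with random weights
rng = np.random.default_rng(0)
n, n_x = 4, 3
params = {}
for g in ("f", "i", "o", "c"):
    params[f"W_{g}"] = rng.standard_normal((n, n + n_x)) * 0.1
    params[f"b_{g}"] = np.zeros(n)
h, c = lstm_step(rng.standard_normal(n_x), np.zeros(n), np.zeros(n), params)
print(h.shape, c.shape)  # hidden and cell state have the same length n
```

Note that each gate is a vector of the same length n as the states, and its sigmoid output falls between 0 and 1, so the element-wise products implement a soft "keep this much" decision per component.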