As we backpropagate through more and more RNN time steps, the gradient becomes weaker and weaker (the vanishing gradient problem), so it is hard to learn long-range dependencies.
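A toy sketch of why the gradient vanishes (assumption: a scalar RNN where each backward step multiplies the gradient by the same factor |w| < 1; real RNNs multiply by Jacobians, but the effect is the same):

```python
# Gradient signal reaching a state 20 steps back, when each
# backprop step multiplies in a factor of w = 0.5 (< 1).
w = 0.5
grad = 1.0
for step in range(20):
    grad *= w  # one extra time step = one extra factor of w

print(grad)  # ~9.5e-7: the signal from 20 steps back is almost gone
```

With |w| > 1 the same loop would instead blow up (exploding gradients), which is the mirror-image problem.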

To solve this: LSTM! Long Short-Term Memory, which works like a computer's RAM.

For each step t, there is a hidden state h(t) and a cell state c(t)

  • both are vectors of the same length n

  • cell stores long-term information

  • LSTM can insert, delete or read information from cell state

  • Whether information is kept, erased, or read is controlled by three gates, each a vector of length n whose elements range between 0 (closed) and 1 (open), and can take values in between

    • Forget gate: controls what is kept vs. forgotten from the previous cell state
    • Input gate: controls which parts of the new cell content get written to the cell
    • Output gate: controls which parts of the cell are read out into the hidden state
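The gates above can be sketched as a single LSTM step. This is a minimal illustration, not a trained model: `W` is a hypothetical dict of scalar weights (so n = 1 for brevity), and the gate names follow the list above.

```python
import math

def sigmoid(x):
    # Squashes any real number into (0, 1): a "soft" open/closed gate.
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step for scalar states (n = 1), with
    hypothetical weight keys like 'f_x', 'f_h', 'f_b'."""
    # Forget gate: how much of the old cell state to keep.
    f = sigmoid(W["f_x"] * x + W["f_h"] * h_prev + W["f_b"])
    # Input gate: how much of the new candidate content to write.
    i = sigmoid(W["i_x"] * x + W["i_h"] * h_prev + W["i_b"])
    # Candidate new cell content.
    c_tilde = math.tanh(W["c_x"] * x + W["c_h"] * h_prev + W["c_b"])
    # New cell state: erase via f, then write via i.
    c = f * c_prev + i * c_tilde
    # Output gate: which parts of the cell are read into the hidden state.
    o = sigmoid(W["o_x"] * x + W["o_h"] * h_prev + W["o_b"])
    h = o * math.tanh(c)
    return h, c
```

Because the cell state is updated additively (`f * c_prev + i * c_tilde`) rather than being repeatedly squashed through a nonlinearity, gradients can flow through many steps without vanishing as fast as in a plain RNN.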