As we backpropagate through more and more RNN time steps, the gradient becomes weaker and weaker (the vanishing gradient problem).
To solve this: the LSTM! Long Short-Term Memory, whose cell state works something like a computer's RAM.
For each step t, there is a hidden state h_t and a cell state c_t
- both are vectors of the same length n
- the cell state stores long-term information
- the LSTM can insert, delete, or read information from the cell state
Which information is kept or erased is managed by three gates, each a vector of length n whose elements lie between 0 and 1 (0 = fully closed, 1 = fully open, values in between are partially open):
- Forget gate: controls what is kept vs. forgotten from the previous cell state
- Input gate: controls which parts of the new candidate content get written to the cell state
- Output gate: controls which parts of the cell state are read out into the hidden state
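
The three gates above can be sketched as one LSTM step in NumPy. This is a minimal illustration, not a production implementation; the stacking of all four weight blocks into single matrices W, U, b is an assumed convention for compactness:

```python
import numpy as np

def sigmoid(x):
    # squashes each element into (0, 1): a "soft" gate value
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b, n):
    """One LSTM time step.

    W (4n x d), U (4n x n), b (4n,) stack the parameters of the
    forget gate, input gate, output gate, and candidate content
    (assumed layout, for compactness).
    """
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:n])            # forget gate: what to erase from c_prev
    i = sigmoid(z[n:2*n])          # input gate: what new content to write
    o = sigmoid(z[2*n:3*n])        # output gate: what to read out of the cell
    c_tilde = np.tanh(z[3*n:4*n])  # candidate new cell content
    c_t = f * c_prev + i * c_tilde # cell state: long-term memory
    h_t = o * np.tanh(c_t)         # hidden state: what this step exposes
    return h_t, c_t
```

Note that c_t is a sum of the (gated) old cell state and the (gated) new content; this additive path is what lets gradients flow back over many steps without vanishing as quickly as in a plain RNN.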