NMT = Neural Machine Translation
Encoder-decoder model ⇒ conditional LM
- conditional: its predictions are also conditioned on the source sentence x
- LM: the decoder predicts the next word of the target sentence y.
Introducing attention: a technique used in seq2seq models that solves the bottleneck problem of compressing the whole source sentence into a single hidden state. The general idea is to compute a score for each encoder hidden state (dot product of the encoder RNN hidden state with the current decoder hidden state), then turn the scores into a probability distribution using softmax. Taking the weighted sum of the encoder hidden states with these weights gives the attention output. We combine this attention output with the decoder hidden state to compute ṽ, which is used to predict the next word.
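The steps above can be sketched as basic dot-product attention. This is a minimal NumPy illustration, not a full seq2seq model; the function and variable names are illustrative:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array of scores
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dot_product_attention(encoder_states, decoder_state):
    """encoder_states: (T, d) hidden states, decoder_state: (d,)."""
    # 1. score each encoder hidden state against the decoder state
    scores = encoder_states @ decoder_state        # shape (T,)
    # 2. softmax turns scores into a probability distribution
    alpha = softmax(scores)                        # shape (T,), sums to 1
    # 3. weighted sum of encoder states -> attention output
    attn_output = alpha @ encoder_states           # shape (d,)
    return attn_output, alpha

# toy example: 4 encoder time steps, hidden size 3
enc = np.random.randn(4, 3)
dec = np.random.randn(3)
attn_output, alpha = dot_product_attention(enc, dec)
```

In a full model, `attn_output` would then be concatenated with the decoder hidden state before the output projection that predicts the next word.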