NMT = Neural Machine Translation

The encoder-decoder model is a conditional LM

  • conditional: its predictions are also conditioned on the source sentence x
  • LM: the decoder predicts the next word of the target sentence y.
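The two bullets combine into the usual factorization (a standard identity, not stated explicitly in these notes):

P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \ldots, y_{t-1}, x)

Each factor is the decoder's next-word prediction (the LM part), conditioned on the source sentence x (the conditional part).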

Introducing attention, a technique used in seq2seq models that solves the bottleneck problem of compressing all the encoder hidden states into a single vector. The general idea, at each decoder step:

  • compute a score for each encoder RNN hidden state (dot product with the decoder hidden state, which begins at “start”)
  • turn the scores into a probability distribution using softmax
  • take the weighted sum of the encoder hidden states with these weights to get the attention output
  • combine this output with the decoder hidden state to compute V~.
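The steps above can be sketched in NumPy. The dimensions, random states, and variable names are illustrative assumptions, not from the notes:

```python
import numpy as np

def softmax(v):
    # numerically stable softmax
    e = np.exp(v - v.max())
    return e / e.sum()

hidden = 4   # hypothetical hidden-state size
T = 3        # hypothetical source-sentence length

rng = np.random.default_rng(0)
enc_states = rng.standard_normal((T, hidden))  # encoder RNN hidden states, one per source word
dec_state = rng.standard_normal(hidden)        # current decoder hidden state

# 1. score: dot product of the decoder state with each encoder state
scores = enc_states @ dec_state                # shape (T,)

# 2. probability distribution over source positions via softmax
alpha = softmax(scores)                        # sums to 1

# 3. weighted sum of encoder states = attention output
context = alpha @ enc_states                   # shape (hidden,)

# 4. combine with the decoder hidden state (here: concatenation) to compute V~
combined = np.concatenate([context, dec_state])
```

Dot-product scoring is the simplest choice; other variants (multiplicative or additive attention) replace step 1 but keep the same softmax-and-weighted-sum structure.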