1. Take k negative samples (random words) for each center word
2. Maximize the probability that the real outside words appear
3. Minimize the probability that random words appear around the center word
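
The three steps above can be sketched as a loss for one (center, outside) pair. This is a minimal sketch, assuming the usual sigmoid form of the negative-sampling objective. The names `v_center`, `u_outside`, and `u_negatives` are made up for illustration, and the vectors are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(v_center, u_outside, u_negatives):
    # Step 2: reward a high score for the real outside word.
    pos = np.log(sigmoid(u_outside @ v_center))
    # Step 3: reward a low score for each of the k random words.
    neg = np.sum(np.log(sigmoid(-(u_negatives @ v_center))))
    # Minimizing this loss does both at once.
    return -(pos + neg)

# Step 1: k random "negative" vectors (random here, purely for illustration).
rng = np.random.default_rng(0)
dim, k = 8, 5
v_center = rng.normal(size=dim)
u_outside = rng.normal(size=dim)
u_negatives = rng.normal(size=(k, dim))

loss = negative_sampling_loss(v_center, u_outside, u_negatives)
print(loss)
```

In practice the k negatives would be drawn from a word-frequency distribution over the vocabulary; the uniform random vectors here only stand in for that step.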

If we build a full matrix counting every co-occurrence in a corpus, it becomes too expensive: with a vocabulary of V words, the matrix has V × V entries.

So the idea is to store just the important information instead.

We can use GloVe: it works from how often each pair of words co-occurs in the corpus.
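
Counting pair frequencies can be sketched as below. This is only an illustration, assuming a symmetric window and plain unweighted counts; `cooccurrence_counts` and `window` are hypothetical names, not part of any GloVe library. Only pairs that actually occur are stored, so no full V × V matrix is needed.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    # Maps (center word, context word) -> number of times seen together.
    counts = Counter()
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                counts[(center, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=1)
print(counts[("the", "cat")], counts[("cat", "on")])
```

Pairs that never appear within the window, such as ("cat", "on") here, simply have no entry and read back as zero.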