- We take k negative samples (random words)
- maximize the probability that the real outside words appear around the center word
- minimize the probability that the random words appear around the center word
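The three points above can be sketched as a loss for a single (center, outside) pair. This is a minimal sketch, assuming the usual sigmoid-of-dot-product form of negative sampling; the function and argument names (`negative_sampling_loss`, `center_vec`, `outside_vec`, `negative_vecs`) are made up for illustration, and vectors are plain Python lists.

```python
import math


def sigmoid(x):
    """Logistic function, maps a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(center_vec, outside_vec, negative_vecs):
    """Loss for one center word, one real outside word, and k random words.

    Illustrative names; negative_vecs is a list of k vectors for the
    randomly sampled words.
    """
    # Real outside word: a high score with the center word lowers the loss.
    loss = -math.log(sigmoid(dot(outside_vec, center_vec)))
    # Random words: a low score with the center word lowers the loss.
    for neg_vec in negative_vecs:
        loss -= math.log(sigmoid(-dot(neg_vec, center_vec)))
    return loss
```

Minimizing this loss pushes the real outside word's vector toward the center word's vector and pushes the k random words' vectors away from it.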
If we built a full matrix counting all co-occurrences in a corpus, it would be far too expensive: the matrix has one row and one column for every word in the vocabulary.
So the idea is to store only the important information.
We can use GloVe: we store how often each pair of words appears together in the corpus.
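The pair-frequency idea can be illustrated with a small counting routine. This is a rough sketch of the counting step only, not GloVe's training; the name `cooccurrence_counts` and the `window` parameter are assumptions made for the example, since the notes do not say how "appearing together" is defined.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered pair of words appears within `window` positions.

    Only pairs that actually occur are stored, so no full
    vocabulary-by-vocabulary matrix is ever allocated.
    """
    counts = Counter()
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                counts[(center, tokens[j])] += 1
    return counts
```

For example, `cooccurrence_counts("i like deep learning i like nlp".split(), window=1)` counts the pair `("i", "like")` twice, and a pair that never occurs simply has no entry.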