Word2Vec is a framework for learning word vectors: an alternative way of representing words.
- A corpus: a long list of words
- Every word in a fixed vocabulary is represented by a vector
- Go through each position $t$ in the text, which has a center word $c$ and outside (context) words $o$
- Use the similarity of the word vectors for $c$ and $o$ to calculate the probability of $o$ given $c$ (or vice versa)
- Keep adjusting the word vectors to maximize this probability
With a sliding window of size 2:

...problems turning **into** banking crises...

Here **into** is the center word $w_t$, and the outside words are *problems* ($w_{t-2}$), *turning* ($w_{t-1}$), *banking* ($w_{t+1}$), and *crises* ($w_{t+2}$). The model computes $P(w_{t-2} \mid w_t)$, $P(w_{t-1} \mid w_t)$, $P(w_{t+1} \mid w_t)$, and $P(w_{t+2} \mid w_t)$.
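As an illustrative sketch (the toy corpus and variable names are my own), this is how the (center, outside) training pairs can be enumerated in Python:

```python
# A toy sketch of enumerating the (center, outside) pairs that
# Word2Vec trains on, with a window of size 2.
corpus = "problems turning into banking crises".split()
window = 2

pairs = []
for t, center in enumerate(corpus):
    for j in range(-window, window + 1):
        # Skip the center itself and positions outside the corpus.
        if j != 0 and 0 <= t + j < len(corpus):
            pairs.append((center, corpus[t + j]))

print(pairs[:4])
# [('problems', 'turning'), ('problems', 'into'),
#  ('turning', 'problems'), ('turning', 'into')]
```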
So the overall likelihood is the product of the conditional probabilities of all the words close to each center word, over every position in the corpus:

$$L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \ne 0}} P(w_{t+j} \mid w_t; \theta)$$

where $\theta$ is all the word vectors, $T$ is the length of the corpus, and $m$ is the window size. This quantity is overall: it covers every context word of every center word in the dataset.
But we also need a loss function to minimize. The objective $J(\theta)$ is the average negative log-likelihood:

$$J(\theta) = -\frac{1}{T} \log L(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t; \theta)$$

Minimizing this loss function is equivalent to maximizing predictive accuracy.
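A minimal sketch of this objective, assuming a hypothetical placeholder function `prob(o, c)` standing in for $P(o \mid c)$ (the real definition follows below), and averaging per pair rather than per corpus position for simplicity:

```python
import math

# Average negative log-probability of each outside word given its center.
# `prob` is a hypothetical stand-in for P(o|c); in Word2Vec it is the
# softmax over the word vectors, defined in the next section.
def objective(pairs, prob):
    total = 0.0
    for center, outside in pairs:
        total += math.log(prob(outside, center))
    return -total / len(pairs)

# With a uniform dummy distribution over a 10,000-word vocabulary, the loss
# is log(10000) ≈ 9.21: the starting point before any learning happens.
print(objective([("into", "banking")], lambda o, c: 1 / 10000))
```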
For a single pair of words, with $c$ as the center word and $o$ as the context word, the probability is

$$P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)}$$

where $v_c$ is the vector of $c$ used as a center word and $u_o$ is the vector of $o$ used as an outside word. This is the basic "unit" of prediction, unlike $L(\theta)$, which spans the whole dataset. The formula is the softmax function applied to the dot-product similarity scores.
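A minimal numpy sketch of this softmax, with randomly initialized toy matrices `U` (outside vectors) and `V` (center vectors) standing in for parameters that would normally be learned:

```python
import numpy as np

# U holds each word's "outside" vector u_w, V each word's "center"
# vector v_w; both are random toy parameters here, for illustration only.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
U = rng.normal(size=(vocab_size, dim))
V = rng.normal(size=(vocab_size, dim))

def p_o_given_c(o, c):
    scores = U @ V[c]                    # dot product u_w . v_c for every w
    exps = np.exp(scores - scores.max()) # shift by max for numerical stability
    return exps[o] / exps.sum()          # normalize over the whole vocabulary

print(p_o_given_c(o=3, c=7))  # a single probability in (0, 1)
```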
We now need to calculate the gradient: how much each vector needs to change in order to reduce the error.
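For one pair, the gradient of $-\log P(o \mid c)$ with respect to $v_c$ works out to $\sum_{w \in V} P(w \mid c)\, u_w - u_o$: the model's expected outside vector minus the observed one. A minimal sketch under the same toy setup as above (the learning rate and word indices are illustrative):

```python
import numpy as np

# Same random toy parameters as in the softmax sketch.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
U = rng.normal(size=(vocab_size, dim))
V = rng.normal(size=(vocab_size, dim))

def grad_v_c(o, c):
    scores = U @ V[c]
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()           # softmax: P(w|c) for every word w
    return U.T @ probs - U[o]      # expected u_w minus the observed u_o

# One gradient-descent step nudges v_c so the model assigns more
# probability to the observed pair (o=3, c=7).
V[7] -= 0.1 * grad_v_c(o=3, c=7)
```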