Word2Vec is a framework for learning word vectors: an alternative way of representing words.
- A corpus: a long list of words
- Every word in a fixed vocabulary is represented by a vector
- Go through each position $t$ in the text, which has a center word $c$ and outside (context) words $o$
- Use the similarity of the word vectors for $c$ and $o$ to calculate the probability of $o$ given $c$ (or vice versa)
- Keep adjusting the word vectors to maximize this probability
With a sliding window of size 2:

...problems turning **into** banking crises...

Here **into** is the center word $w_t$, and the outside words are *problems* ($w_{t-2}$), *turning* ($w_{t-1}$), *banking* ($w_{t+1}$), and *crises* ($w_{t+2}$). The model computes $P(w_{t-2} \mid w_t)$, $P(w_{t-1} \mid w_t)$, $P(w_{t+1} \mid w_t)$, and $P(w_{t+2} \mid w_t)$.
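As an illustrative sketch (the toy corpus and variable names are my own), this is how the (center, outside) training pairs can be enumerated in Python:

```python
# A toy sketch of enumerating the (center, outside) pairs that
# Word2Vec trains on, with a window of size 2.
corpus = "problems turning into banking crises".split()
window = 2

pairs = []
for t, center in enumerate(corpus):
    for j in range(-window, window + 1):
        # Skip the center itself and positions outside the corpus.
        if j != 0 and 0 <= t + j < len(corpus):
            pairs.append((center, corpus[t + j]))

print(pairs[:4])
# [('problems', 'turning'), ('problems', 'into'),
#  ('turning', 'problems'), ('turning', 'into')]
```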
So the overall likelihood is the product of the conditional probabilities of all the words close to each center word, over every position in the corpus:

$$L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \ne 0}} P(w_{t+j} \mid w_t; \theta)$$

where $\theta$ is all the word vectors, $T$ is the length of the corpus, and $m$ is the window size. This quantity is overall: it covers every context word of every center word in the dataset.
But we also need a loss function to minimize. The objective $J(\theta)$ is the average negative log-likelihood:

$$J(\theta) = -\frac{1}{T} \log L(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t; \theta)$$

Minimizing this loss function is equivalent to maximizing predictive accuracy.
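A minimal sketch of this objective, assuming a hypothetical placeholder function `prob(o, c)` standing in for $P(o \mid c)$ (the real definition follows below), and averaging per pair rather than per corpus position for simplicity:

```python
import math

# Average negative log-probability of each outside word given its center.
# `prob` is a hypothetical stand-in for P(o|c); in Word2Vec it is the
# softmax over the word vectors, defined in the next section.
def objective(pairs, prob):
    total = 0.0
    for center, outside in pairs:
        total += math.log(prob(outside, center))
    return -total / len(pairs)

# With a uniform dummy distribution over a 10,000-word vocabulary, the loss
# is log(10000) ≈ 9.21: the starting point before any learning happens.
print(objective([("into", "banking")], lambda o, c: 1 / 10000))
```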
For a single pair of words, with $c$ as the center word and $o$ as the context word, the probability is

$$P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)}$$

where $v_c$ is the vector of $c$ used as a center word and $u_o$ is the vector of $o$ used as an outside word. This is the basic "unit" of prediction, unlike $L(\theta)$, which spans the whole dataset. The formula is the softmax function applied to the dot-product similarity scores.
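A minimal numpy sketch of this softmax, with randomly initialized toy matrices `U` (outside vectors) and `V` (center vectors) standing in for parameters that would normally be learned:

```python
import numpy as np

# U holds each word's "outside" vector u_w, V each word's "center"
# vector v_w; both are random toy parameters here, for illustration only.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
U = rng.normal(size=(vocab_size, dim))
V = rng.normal(size=(vocab_size, dim))

def p_o_given_c(o, c):
    scores = U @ V[c]                    # dot product u_w . v_c for every w
    exps = np.exp(scores - scores.max()) # shift by max for numerical stability
    return exps[o] / exps.sum()          # normalize over the whole vocabulary

print(p_o_given_c(o=3, c=7))  # a single probability in (0, 1)
```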
We now need to calculate the gradient: how much each vector needs to change in order to reduce the error.
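For one pair, the gradient of $-\log P(o \mid c)$ with respect to $v_c$ works out to $\sum_{w \in V} P(w \mid c)\, u_w - u_o$: the model's expected outside vector minus the observed one. A minimal sketch under the same toy setup as above (the learning rate and word indices are illustrative):

```python
import numpy as np

# Same random toy parameters as in the softmax sketch.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
U = rng.normal(size=(vocab_size, dim))
V = rng.normal(size=(vocab_size, dim))

def grad_v_c(o, c):
    scores = U @ V[c]
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()           # softmax: P(w|c) for every word w
    return U.T @ probs - U[o]      # expected u_w minus the observed u_o

# One gradient-descent step nudges v_c so the model assigns more
# probability to the observed pair (o=3, c=7).
V[7] -= 0.1 * grad_v_c(o=3, c=7)
```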