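PPMI (positive pointwise mutual information) measures how much more often a word and a context co-occur than chance would predict, so that strong associations receive high weights. The resulting values can be used to build word embeddings.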
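If PPMI = 0, the two words co-occur no more often than expected by chance (their PMI is zero or negative). It does not mean that they can never occur together.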
Exercise
- Count how often each word occurs with each context word, and compute the total number of occurrences for each row and column
| | computer | data | pinch | result | sugar | Tot |
|---|---|---|---|---|---|---|
| apricot | 0 | 0 | 1 | 0 | 1 | 2 |
| pineapple | 0 | 0 | 1 | 0 | 1 | 2 |
| digital | 2 | 1 | 0 | 1 | 0 | 4 |
| information | 1 | 6 | 0 | 4 | 0 | 11 |
| Tot | 3 | 7 | 2 | 5 | 2 | 19 |
- Make another table with the joint probability of each cell, given by the cell count divided by the grand total of all counts (19). The total row and column are not included as cells. This gives 2/19, 1/19, … written as real numbers
| | computer | data | pinch | result | sugar |
|---|---|---|---|---|---|
| apricot | 0/19 = 0.000 | 0/19 = 0.000 | 1/19 ≈ 0.0526 | 0/19 = 0.000 | 1/19 ≈ 0.0526 |
| pineapple | 0/19 = 0.000 | 0/19 = 0.000 | 1/19 ≈ 0.0526 | 0/19 = 0.000 | 1/19 ≈ 0.0526 |
| digital | 2/19 ≈ 0.1053 | 1/19 ≈ 0.0526 | 0/19 = 0.000 | 1/19 ≈ 0.0526 | 0/19 = 0.000 |
| information | 1/19 ≈ 0.0526 | 6/19 ≈ 0.3158 | 0/19 = 0.000 | 4/19 ≈ 0.2105 | 0/19 = 0.000 |
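
The probability step can be sketched in plain Python. This is a minimal sketch that assumes the counts from the first table; the names `counts`, `joint`, `p_word`, and `p_context` are illustrative and not part of the exercise. It also computes the marginal probabilities from the row and column totals.

```python
# Co-occurrence counts from the exercise table (rows: words, columns: contexts).
contexts = ["computer", "data", "pinch", "result", "sugar"]
counts = {
    "apricot":     [0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 1, 0, 1],
    "digital":     [2, 1, 0, 1, 0],
    "information": [1, 6, 0, 4, 0],
}

# Grand total of all counts (the "Tot" row and column are not cells).
total = sum(sum(row) for row in counts.values())

# Joint probability P(w, c) = count(w, c) / total.
joint = {word: [n / total for n in row] for word, row in counts.items()}

# Marginals: P(w) is a row total / total, P(c) is a column total / total.
p_word = {word: sum(row) / total for word, row in counts.items()}
p_context = [
    sum(row[j] for row in counts.values()) / total
    for j in range(len(contexts))
]

for word, row in joint.items():
    print(f"{word:12s}", [round(p, 4) for p in row])
```

The joint probabilities sum to 1 over all cells, which is a quick sanity check on the table.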
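- For each cell, compute $\mathrm{PMI}(w, c) = \log_2 \frac{P(w, c)}{P(w)\,P(c)}$, where $P(w)$ and $P(c)$ are the row and column totals divided by the grand total (19). If the value is positive and high, the word and the context are strongly associated. Then take $\mathrm{PPMI}(w, c) = \max(\mathrm{PMI}(w, c), 0)$: if the PMI is negative, put 0; otherwise keep the value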
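
As a worked example, assuming the standard base-2 definition $\mathrm{PMI}(w, c) = \log_2 \frac{P(w, c)}{P(w)\,P(c)}$ and the counts from the first table, one positive cell and one negative cell come out as follows:

$$
\mathrm{PMI}(\text{apricot}, \text{pinch}) = \log_2 \frac{1/19}{(2/19)(2/19)} = \log_2 \frac{19}{4} \approx 2.25
$$

$$
\mathrm{PMI}(\text{digital}, \text{data}) = \log_2 \frac{1/19}{(4/19)(7/19)} = \log_2 \frac{19}{28} \approx -0.56 \;\Rightarrow\; \mathrm{PPMI} = 0
$$

Cells with a count of 0 have $\mathrm{PMI} = -\infty$, so their PPMI is also 0.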
The resulting table would be:
| | computer | data | pinch | result | sugar |
|---|---|---|---|---|---|
| apricot | 0 | 0 | 2.25 | 0 | 2.25 |
| pineapple | 0 | 0 | 2.25 | 0 | 2.25 |
| digital | 1.66 | 0 | 0 | 0 | 0 |
| information | 0 | 0.57 | 0 | 0.47 | 0 |
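
The whole exercise can be checked with a short script. This is a minimal sketch that assumes base-2 logarithms and the counts from the first table; the function name `ppmi_table` is hypothetical and chosen only for illustration.

```python
import math

# Co-occurrence counts from the exercise table (rows: words, columns: contexts).
contexts = ["computer", "data", "pinch", "result", "sugar"]
counts = {
    "apricot":     [0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 1, 0, 1],
    "digital":     [2, 1, 0, 1, 0],
    "information": [1, 6, 0, 4, 0],
}


def ppmi_table(counts):
    """Return {word: [PPMI per context]} using log base 2."""
    total = sum(sum(row) for row in counts.values())
    n_ctx = len(next(iter(counts.values())))
    col_totals = [sum(row[j] for row in counts.values()) for j in range(n_ctx)]

    table = {}
    for word, row in counts.items():
        row_total = sum(row)
        values = []
        for j, n in enumerate(row):
            if n == 0:
                # log2(0) is minus infinity, so PPMI clips the cell to 0.
                values.append(0.0)
                continue
            # PMI = log2(P(w,c) / (P(w) P(c)))
            #     = log2(n * total / (row_total * col_total))
            pmi = math.log2(n * total / (row_total * col_totals[j]))
            # PPMI keeps positive values and replaces negative ones with 0.
            values.append(max(pmi, 0.0))
        table[word] = values
    return table


ppmi = ppmi_table(counts)
for word, row in ppmi.items():
    print(f"{word:12s}", [round(v, 2) for v in row])
```

Handling zero counts explicitly avoids calling `math.log2(0)`, which raises a `ValueError` in Python.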