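PPMI (positive pointwise mutual information) measures how strongly a word is associated with a context and keeps only the positive associations. It is used to weight co-occurrence counts when building word embeddings.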

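If PPMI = 0, the two words are not positively associated: they co-occur no more often than would be expected if they were independent, or they never co-occur in the data at all. It does not mean that the two words cannot go together.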

Exercise

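  1. Count how many times each word co-occurs with each context word, then compute the total number of occurrences for each row and each column.

| | computer | data | pinch | result | sugar | Tot |
|---|---|---|---|---|---|---|
| apricot | 0 | 0 | 1 | 0 | 1 | 2 |
| pineapple | 0 | 0 | 1 | 0 | 1 | 2 |
| digital | 2 | 1 | 0 | 1 | 0 | 4 |
| information | 1 | 6 | 0 | 4 | 0 | 11 |
| Tot | 3 | 7 | 2 | 5 | 2 | 19 |
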
  1. Make another table holding the probability of each cell: divide each count by the grand total of all counts, 19. The Tot row and column are not included in this table. For example, 2/19 ≈ 0.1053 and 1/19 ≈ 0.0526, written as real numbers.

| | computer | data | pinch | result | sugar |
|---|---|---|---|---|---|
| apricot | 0/19 = 0.000 | 0/19 = 0.000 | 1/19 ≈ 0.0526 | 0/19 = 0.000 | 1/19 ≈ 0.0526 |
| pineapple | 0/19 = 0.000 | 0/19 = 0.000 | 1/19 ≈ 0.0526 | 0/19 = 0.000 | 1/19 ≈ 0.0526 |
| digital | 2/19 ≈ 0.1053 | 1/19 ≈ 0.0526 | 0/19 = 0.000 | 1/19 ≈ 0.0526 | 0/19 = 0.000 |
| information | 1/19 ≈ 0.0526 | 6/19 ≈ 0.3158 | 0/19 = 0.000 | 4/19 ≈ 0.2105 | 0/19 = 0.000 |

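  1. For each cell, compute $\mathrm{PPMI}(w, c) = \max\left(\log_2 \frac{P(w, c)}{P(w)\,P(c)},\ 0\right)$, where $P(w)$ and $P(c)$ are the row and column totals divided by 19. A positive, high value means the word and the context are strongly associated, and it is kept as is. A negative value is replaced by 0, and a cell with a zero count (where the logarithm is undefined) is also set to 0.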
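
As a worked instance of this step, here are two cells from the count table. This assumes the standard definition of PMI with base-2 logarithms, which the notes do not state explicitly. The pair (apricot, pinch) occurs more often than chance predicts and keeps its score. The pair (digital, result) occurs slightly less often than chance predicts, so its negative score is replaced by 0:

$$
\begin{aligned}
\mathrm{PMI}(\text{apricot}, \text{pinch}) &= \log_2 \frac{1/19}{(2/19)(2/19)} = \log_2 4.75 \approx 2.25 \\
\mathrm{PMI}(\text{digital}, \text{result}) &= \log_2 \frac{1/19}{(4/19)(5/19)} = \log_2 0.95 \approx -0.07 \;\Rightarrow\; \mathrm{PPMI} = 0
\end{aligned}
$$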

The resulting table would be:

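| | computer | data | pinch | result | sugar |
|---|---|---|---|---|---|
| apricot | 0 | 0 | 2.25 | 0 | 2.25 |
| pineapple | 0 | 0 | 2.25 | 0 | 2.25 |
| digital | 1.66 | 0 | 0 | 0 | 0 |
| information | 0 | 0.57 | 0 | 0.47 | 0 |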
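
The three steps of the exercise can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes base-2 logarithms and the standard PMI definition, and the names `counts`, `contexts` and `ppmi_table` are made up for this example. Only the standard library is used.

```python
import math

# Co-occurrence counts copied from the exercise table
# (rows: words, columns: context words, in the order given by `contexts`).
contexts = ["computer", "data", "pinch", "result", "sugar"]
counts = {
    "apricot":     [0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 1, 0, 1],
    "digital":     [2, 1, 0, 1, 0],
    "information": [1, 6, 0, 4, 0],
}


def ppmi_table(counts):
    """Return {word: [PPMI per context]} for a dict of count rows."""
    # Step 1: row totals, column totals and the grand total.
    row_totals = {word: sum(row) for word, row in counts.items()}
    col_totals = [sum(col) for col in zip(*counts.values())]
    total = sum(row_totals.values())

    table = {}
    for word, row in counts.items():
        p_word = row_totals[word] / total
        scores = []
        for n, col_total in zip(row, col_totals):
            if n == 0:
                # log2(0) is undefined (PMI tends to -infinity), so PPMI is 0.
                scores.append(0.0)
                continue
            # Step 2: joint and marginal probabilities.
            p_joint = n / total
            p_context = col_total / total
            # Step 3: PMI, with negative values clipped to 0.
            pmi = math.log2(p_joint / (p_word * p_context))
            scores.append(max(pmi, 0.0))
        table[word] = scores
    return table


if __name__ == "__main__":
    print(" " * 12, " ".join(c.rjust(8) for c in contexts))
    for word, scores in ppmi_table(counts).items():
        print(word.ljust(12), " ".join(f"{s:8.2f}" for s in scores))
```

Zero counts are handled before the logarithm is taken, because `math.log2(0)` raises a `ValueError` instead of returning negative infinity.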