Here are all the terms used in the notes, with their meanings according to the slides.

  • Corpora: plural of corpus; a computer-readable collection of text or speech
  • Utterance: the spoken equivalent of a sentence; it may contain disfluencies such as “uhm”, “ehm”…
  • Word Types: the set of distinct words in a corpus (e.g. “I like fish and I like chips”: word types = 5, because duplicates are counted only once)
  • Tokens: the number of running words in a text, duplicates included (e.g. “I like fish and I like chips”: tokens = 7)
  • Lemmatization: reducing a word to its lemma, i.e. its root or dictionary form (e.g. “ran”, “running” → “run”)
  • Minimization: see Loss Function
  • Loss Function: a measure of how good a model is; the lower its value, the more accurate the model’s predictions. Training aims to minimize it
  • Softmax: “soft” because it assigns every value a non-zero probability, “max” because it amplifies the largest values; it maps a vector of arbitrary real values to a probability distribution
  • Gradient: the vector of partial derivatives of the loss with respect to the model’s parameters; it tells how much (and in which direction) the parameters must change to reduce the error
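
The type/token distinction from the list above can be checked directly in Python. This is a minimal sketch: the sentence and the counts come from the slides' own example, and splitting on whitespace is assumed to be a good-enough tokenizer here.

```python
# Count word types and tokens for the example sentence from the slides.
sentence = "I like fish and I like chips"

tokens = sentence.split()   # every running word, duplicates included
types = set(tokens)         # distinct words only

print(len(tokens))  # 7
print(len(types))   # 5
```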
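
Loss, minimization, and gradient all come together in gradient descent. A toy sketch, with a made-up one-parameter model and squared-error loss (the data point, learning rate, and step count are all illustrative assumptions, not from the slides):

```python
# Minimise the squared-error loss L(w) = (w*x - y)**2 by gradient descent.
x, y = 2.0, 6.0    # toy data point: the true answer is w = 3
w = 0.0            # parameter, deliberately started far from the answer
lr = 0.05          # learning rate (step size)

for _ in range(100):
    grad = 2 * (w * x - y) * x   # dL/dw: how the loss changes as w changes
    w -= lr * grad               # step *against* the gradient to reduce the loss

print(round(w, 3))  # 3.0
```

Each step moves `w` opposite to the gradient, so the loss shrinks; after enough iterations `w` settles near the value that makes the loss (and the prediction error) smallest.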