04  Probability in ML
Overview
 Description:: revising general probability concepts
Revising
 we want to learn a function f: X → Y
 we have a dataset D = {(…)}
 we have a hypothesis space H in which we search for the best hypothesis
Probability
 we model the probability of each action's outcomes to determine what the best choice is
 this is done because we want to avoid decisions that are unsuitable for our situation
 e.g., how long before a flight should we leave for the airport?
Elements of probability
Probability space
 $Ω$: the sample space
 $ω∈Ω$ is a sample point / possible world / outcome of a random process…
A probability space (or probability model) is a function $P:Ω→R$ such that
 $0 ≤ P(ω) ≤ 1$
 $∑_{ω∈Ω} P(ω) = 1$
Example: rolling a die. $Ω=\{1,2,3,4,5,6\}$, $P(ω)=1/6$ for each $ω$.
An event is any subset of $Ω$. The probability of an event A is a function assigning to A a value in [0,1].
examples:
 A1 = “die roll < 4”: A1 = ${1,2,3}⊂Ω$
 P(A1) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
 (the outcomes are mutually exclusive, not independent: a single die roll can produce only one of them, which is why we add the probabilities instead of multiplying)
Random variable
A random variable is a function that maps the sample space $Ω$ to some range (the reals, the Booleans…): $X:Ω→B$
Example: $Odd:Ω→B$
X is a variable and a function!
X = $x_{i}$ means the random variable X takes the value $x_{i}∈B$; X = $x_{i}$ is shorthand for the event $\{ω∈Ω∣X(ω)=x_{i}\}$
Example: Odd = true → {1,3,5}
We can compute the probability that a random variable takes a given value by summing P(ω) over all sample points in the corresponding subset.
In the Odd example it is 1/2, because 1/6 + 1/6 + 1/6 = 1/2.
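As a minimal sketch of this computation in Python (the die model and `Odd` follow the example above; the function names are illustrative):

```python
from fractions import Fraction

# Uniform model for a fair die: P(ω) = 1/6 for each ω in Ω.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def odd(w):
    """The random variable Odd: Ω → Booleans."""
    return w % 2 == 1

def prob(X, value):
    """P(X = value): sum P(ω) over the sample points where X(ω) = value."""
    return sum(P[w] for w in omega if X(w) == value)

print(prob(odd, True))  # 1/2, i.e. 1/6 + 1/6 + 1/6 over {1, 3, 5}
```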
Propositions
We can use logical operations to create propositions, using $∧$, $∨$, $¬$
We assume any event in a proposition is asserted as true unless we apply the NOT operator.
We have three operators:
 $∧$ (and): intersection of events; it turns into × only when the events are independent
 $∨$ (or): union of events; it turns into + only when the events are disjoint
 $¬$ (not): complement; $P(¬a)=1−P(a)$
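A small sketch over the same die model, treating events as subsets of $Ω$ (the variable names are illustrative):

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def p(event):
    return sum(P[w] for w in event)

a = {w for w in omega if w < 4}       # "roll < 4"
b = {w for w in omega if w % 2 == 1}  # "roll is odd"

p_and = p(a & b)   # a ∧ b: intersection of events -> {1, 3}, P = 1/3
p_or = p(a | b)    # a ∨ b: union of events -> {1, 2, 3, 5}, P = 2/3
p_not = 1 - p(a)   # ¬a: complement -> P = 1/2
assert p_or == p(a) + p(b) - p_and  # inclusion-exclusion, not plain +
```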
Distributions
A probability distribution is a function assigning a probability value to each possible assignment of a random variable.
Important
Sum of all values MUST BE 1
Joint probability distribution: given two fair dice, what are the odds of getting 3 on the first and 3 on the second? The first is 1/6, the second is 1/6, and the two rolls are independent, so I just multiply the probabilities: the answer is 1/36.
Also known as $P(X=x,Y=y)$. In ML I create an r×c table, rows × columns, and write down all possible probabilities. The sum of all the entries in the table must be 1 (each row sums to the marginal of that value, not to 1).
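A sketch of the two-dice joint table in Python (a plain dict standing in for the r×c table; names are illustrative), including the check that the whole table, not each row, sums to 1:

```python
from fractions import Fraction
from itertools import product

# Joint distribution P(X = x, Y = y) for two independent fair dice.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

assert sum(joint.values()) == 1   # all entries together sum to 1
print(joint[(3, 3)])              # 1/36

# A row sums to the marginal P(X = 3) = 1/6, not to 1.
print(sum(p for (x, y), p in joint.items() if x == 3))
```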
Conditional probability
A measure of the probability of an event happening, given that another event has already occurred
The formula is:
$P(A∣B)=\frac{P(A∩B)}{P(B)}$
In the ML example: $P(Cavity=true∣Weather=sunny)$
The meaning of the formula is “what’s the probability of A KNOWING that B has happened?”
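Computed from a joint table, the formula is just a ratio of sums; a minimal sketch with the two-dice table (rebuilt here so the snippet runs on its own; names are illustrative):

```python
from fractions import Fraction
from itertools import product

joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def cond_prob(joint, A, B):
    """P(A | B) = P(A ∩ B) / P(B), with events given as predicates over outcomes."""
    p_b = sum(p for w, p in joint.items() if B(w))
    p_ab = sum(p for w, p in joint.items() if A(w) and B(w))
    return p_ab / p_b

# P(sum = 4 | first die = 1) = P({(1,3)}) / P(first = 1) = (1/36) / (1/6) = 1/6
print(cond_prob(joint, lambda w: w[0] + w[1] == 4, lambda w: w[0] == 1))
```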
Total probabilities
$P(X)=\sum_{y_{i}∈D(Y)} P(X∣Y=y_{i})×P(Y=y_{i})$
Chain rule
Example: Card Drawing Game
Suppose you have a standard deck of 52 playing cards. You draw three cards one by one without replacement. Let’s define the events:
 Event A: The first card drawn is a heart.
 Event B: The second card drawn is a face card (jack, queen, or king).
 Event C: The third card drawn is red.
Using the chain rule, the probability of these three events happening in sequence is calculated as follows:
$P(A∩B∩C)=P(A)×P(B∣A)×P(C∣A∩B)$

Probability of Event A (Drawing a Heart): There are 13 hearts in a deck of 52 cards, so $P(A)=\frac{13}{52}=\frac{1}{4}$

Probability of Event B Given A (Drawing a Face Card Given the First is a Heart): After drawing a heart, there are 12 face cards left in a deck of 51 cards, so $P(B∣A)=\frac{12}{51}=\frac{4}{17}$

Probability of Event C Given A and B (Drawing a Red Card Given the First Two Conditions are Met): After drawing a heart and a face card, there are 24 red cards left in a deck of 50 cards, so $P(C∣A∩B)=\frac{24}{50}=\frac{12}{25}$. (Strictly, the remaining counts of face and red cards depend on exactly which cards were drawn; fixing them at 12 and 24 is a simplifying approximation.)
Now, applying the chain rule:
$P(A∩B∩C)=\frac{1}{4}×\frac{4}{17}×\frac{12}{25}=\frac{48}{1700}=\frac{12}{425}$
So, the probability that the first card is a heart, the second card is a face card given that the first is a heart, and the third card is red given the first two conditions is $\frac{12}{425}$, or approximately 0.0282 (2.82%).
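As a hedged sanity check, exact enumeration over all ordered 3-card draws (a sketch; the helper names are illustrative). It yields 25/884 ≈ 0.02828, slightly different from 12/425 because of the fixed-count simplification noted above:

```python
from fractions import Fraction
from itertools import permutations

ranks = list(range(2, 11)) + ["J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(r, s) for r in ranks for s in suits]  # standard 52-card deck

def is_heart(c): return c[1] == "hearts"
def is_face(c): return c[0] in ("J", "Q", "K")
def is_red(c): return c[1] in ("hearts", "diamonds")

# Enumerate all 52 * 51 * 50 = 132,600 ordered draws and count the hits.
hits = sum(1 for c1, c2, c3 in permutations(deck, 3)
           if is_heart(c1) and is_face(c2) and is_red(c3))
p = Fraction(hits, 52 * 51 * 50)
print(p, float(p))  # 25/884 ≈ 0.02828
```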
Inference by enumeration
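The idea: compute a conditional from the full joint table by summing the entries consistent with the evidence, then normalizing with $α$. A minimal sketch using made-up numbers for the Cavity/Weather example above:

```python
# Toy joint distribution P(Cavity, Weather); the probabilities are illustrative only.
joint = {
    (True,  "sunny"): 0.07, (True,  "rain"): 0.03,
    (False, "sunny"): 0.63, (False, "rain"): 0.27,
}

def infer(query_var_index, evidence):
    """P(Query | evidence) by enumeration: sum matching entries, then normalize."""
    scores = {}
    for assignment, p in joint.items():
        if all(assignment[i] == v for i, v in evidence.items()):
            q = assignment[query_var_index]
            scores[q] = scores.get(q, 0.0) + p
    alpha = 1 / sum(scores.values())          # normalization constant α
    return {q: alpha * s for q, s in scores.items()}

print(infer(0, {1: "sunny"}))  # P(Cavity | Weather=sunny) ≈ {True: 0.1, False: 0.9}
```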
Conditional independence
 like the distributive law: $P(X,Y∣Z)=P(X∣Y,Z)P(Y∣Z)=P(X∣Z)P(Y∣Z)$ (conditional independence means $P(X∣Y,Z)=P(X∣Z)$)
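A quick numeric check of this factorization, building a joint that satisfies X ⊥ Y ∣ Z by construction (all numbers are illustrative):

```python
from itertools import product

# Illustrative conditionals; the joint P(x,y,z) = P(x|z) P(y|z) P(z) makes X ⊥ Y | Z.
P_z = {0: 0.4, 1: 0.6}
P_xz = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P_xz[z][x] = P(x | z)
P_yz = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.7, 1: 0.3}}  # P_yz[z][y] = P(y | z)
joint = {(x, y, z): P_xz[z][x] * P_yz[z][y] * P_z[z]
         for x, y, z in product([0, 1], repeat=3)}

def p(pred):  # probability of the event described by a predicate over (x, y, z)
    return sum(q for w, q in joint.items() if pred(*w))

z0 = p(lambda x, y, z: z == 0)
lhs = p(lambda x, y, z: (x, y, z) == (1, 1, 0)) / z0      # P(X=1, Y=1 | Z=0)
rhs = (p(lambda x, y, z: x == 1 and z == 0) / z0) * \
      (p(lambda x, y, z: y == 1 and z == 0) / z0)         # P(X=1|Z=0) P(Y=1|Z=0)
assert abs(lhs - rhs) < 1e-12
```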
Product rule and Bayes
$P(a∩b)=P(a∣b)P(b)=P(b∣a)P(a)$
⇒ from here we derive Bayes Rule
$P(a∣b)=\frac{P(b∣a)P(a)}{P(b)}$
in distribution form $P(Y∣X)=αP(X∣Y)P(Y)$
Useful for assessing diagnostic probability from causal probability.
$P(Cause∣Effect)=\frac{P(Effect∣Cause)P(Cause)}{P(Effect)}$
This is a typical ML problem!
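A classic diagnostic sketch of this pattern (the disease/test numbers below are made up for illustration): the causal quantities P(Effect∣Cause) and the prior P(Cause) are given, and Bayes' rule plus total probability yields the diagnostic P(Cause∣Effect).

```python
# Causal quantities (illustrative numbers): a disease and a test for it.
p_disease = 0.01             # prior P(Cause)
p_pos_given_disease = 0.95   # P(Effect | Cause)
p_pos_given_healthy = 0.05   # P(Effect | ¬Cause)

# Total probability gives the denominator P(Effect).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: diagnostic probability from causal probability.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # ≈ 0.161: a positive test alone is weak evidence
```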