04 - Probability in ML


  • Description:: revising general probability concepts


  • we want to learn a function $f: X \to Y$
  • we have a dataset D = {(…)}
  • we have a hypothesis space H in which we search for the best possible hypothesis


  • we model the probability of each action's outcome to determine the best choice
  • we do this because we want to avoid decisions that are unsuitable for our case
    • e.g. how much time before the flight should we leave for the airport?

Elements of probability

Probability space

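  • $\Omega$ is the sample space and $\omega \in \Omega$ is a sample point / possible world / outcome of a random process…

A probability space (or probability model) is a sample space $\Omega$ with a function $P$ that assigns a probability $P(\omega)$ to every $\omega \in \Omega$ such that

  • $0 \le P(\omega) \le 1$
  • $P(\Omega) = \sum_{\omega \in \Omega} P(\omega) = 1$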

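Example: rolling a fair die, $\Omega = \{1, 2, 3, 4, 5, 6\}$ and $P(\omega) = 1/6$ for every $\omega$

An event is any subset of $\Omega$

The probability of an event A is a function assigning to A a value in [0,1]: $P(A) = \sum_{\omega \in A} P(\omega)$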


  • A1 = “die roll < 4”: A1 = {1, 2, 3}
    • P(A1) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
    • (the outcomes are mutually exclusive: a single die roll yields exactly one of them, so the probabilities are added, not multiplied)
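
As a quick check of the arithmetic above, here is a minimal Python sketch, assuming a fair six-sided die. The names `P` and `prob_event` are illustrative and not part of the notes. The probability space is a dictionary and an event is a set of outcomes.

```python
from fractions import Fraction

# Probability space for a fair die (assumption): each outcome has probability 1/6.
P = {w: Fraction(1, 6) for w in range(1, 7)}

def prob_event(event):
    """P(A) = sum of P(w) over the sample points w in the event A."""
    return sum(P[w] for w in event)

A1 = {w for w in P if w < 4}   # "die roll < 4" -> {1, 2, 3}
print(prob_event(A1))          # 1/2
```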

Random variable

A random variable is a function that maps the sample space to some range (the reals, Booleans…)


X is a variable and a function!

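$X = x_i$ means that the random variable $X$ has the value $x_i$; writing $X = x_i$ is like saying $\{\omega \in \Omega \mid X(\omega) = x_i\}$

Example: Odd = true corresponds to the event {1, 3, 5}

We can compute the probability that a random variable takes a given value by summing the probabilities of all the sample points in the corresponding subset: $P(X = x_i) = \sum_{\omega : X(\omega) = x_i} P(\omega)$

In the Odd example, P(Odd = true) = 1/6 + 1/6 + 1/6 = 1/2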
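
The idea that a random variable is a function on the sample space can be sketched in a few lines of Python. This assumes a fair die, and `odd` and `prob_rv` are hypothetical helper names introduced only for illustration.

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}  # fair die (assumption)

def odd(w):
    """Random variable Odd: maps each outcome to True or False."""
    return w % 2 == 1

def prob_rv(X, value):
    """P(X = value): add P(w) for every outcome w with X(w) == value."""
    return sum(P[w] for w in P if X(w) == value)

print(prob_rv(odd, True))   # 1/2
```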


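We can use logical operations to create propositions, using ∧ (and), ∨ (or) and ¬ (not)

A variable that appears in a proposition is taken to be true unless it is negated with the NOT operator.

We have three operators:

  • ∧ is *: $P(A \land B) = P(A)\,P(B)$, but only when A and B are independent
  • ∨ is +: $P(A \lor B) = P(A) + P(B)$ when A and B are mutually exclusive; in general $P(A \lor B) = P(A) + P(B) - P(A \land B)$
  • ¬ is - (it means false): $P(\lnot A) = 1 - P(A)$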
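
To see how the operators act on events, the sketch below treats ∧ as set intersection, ∨ as union and ¬ as complement. It assumes a fair die, and `prob`, `A` and `B` are illustrative names. It also shows that the product shortcut only holds for independent events and the plain-sum shortcut only for mutually exclusive ones.

```python
from fractions import Fraction

omega = set(range(1, 7))                      # fair die (assumption)

def prob(event):
    return Fraction(len(event), len(omega))   # every outcome is equally likely

A = {w for w in omega if w % 2 == 1}          # "roll is odd"
B = {w for w in omega if w < 4}               # "roll < 4"

p_and = prob(A & B)        # A AND B: intersection {1, 3}
p_or = prob(A | B)         # A OR B: union {1, 2, 3, 5}
p_not = prob(omega - A)    # NOT A: complement {2, 4, 6}

print(p_and, p_or, p_not)                     # 1/3 2/3 1/2
print(p_or == prob(A) + prob(B) - p_and)      # True: general OR rule
print(p_and == prob(A) * prob(B))             # False: A and B are not independent
```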


A probability distribution is a function that assigns a probability value to every possible assignment of a random variable.


The sum of all values MUST BE 1

Joint probability distribution: given two fair dice, what is the probability of getting 3 on the first and 3 on the second? The first is 1/6 and the second is 1/6; the two rolls are independent events, so I just need to multiply the probabilities: 1/6 · 1/6 = 1/36

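It is commonly written $P(X, Y)$. In ML I create an r × c table (rows × columns) and write down the probability of every combination of values. All the entries together must sum to 1; summing a single row or column gives the marginal probability of that value.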
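
A minimal sketch of such a table for the two-dice example, assuming fair and independent dice. The dictionary `joint` is an illustrative stand-in for the r × c table: all entries together sum to 1, and a single row sums to a marginal probability.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint distribution of two fair, independent dice (assumption):
# one entry per (first, second) pair, i.e. a 6 x 6 table.
joint = {(a, b): Fraction(1, 6) * Fraction(1, 6) for a, b in product(faces, faces)}

print(joint[(3, 3)])                 # 1/36
print(sum(joint.values()))           # 1  (all entries together)

# Summing one row gives the marginal probability of the first die showing 3.
row_3 = sum(joint[(3, b)] for b in faces)
print(row_3)                         # 1/6
```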

Conditional probability

A measure of the probability of an event happening, given that another event has already occurred

The formula is:

$P(A \mid B) = \frac{P(A \land B)}{P(B)}$, defined when $P(B) > 0$

In the ML example

The meaning of the formula is “what’s the probability of A KNOWING that B has happened?”
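
For a concrete instance, the sketch below computes the probability that a die roll is below 4, knowing that it is odd. It assumes a fair die; `prob` and `cond_prob` are hypothetical helper names.

```python
from fractions import Fraction

omega = set(range(1, 7))   # fair die (assumption)

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(A, B):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if not B:
        raise ValueError("P(B) must be positive")
    return prob(A & B) / prob(B)

A = {1, 2, 3}      # "roll < 4"
B = {1, 3, 5}      # "roll is odd"
print(cond_prob(A, B))   # 2/3
```

Knowing that the roll is odd raises the probability of "roll < 4" from 1/2 to 2/3, because two of the three odd outcomes are below 4.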

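Total probabilities

$P(A) = \sum_i P(A \mid B_i)\,P(B_i)$, where the events $B_i$ are mutually exclusive and together cover the whole sample space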
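
A minimal numerical sketch of total probability, assuming a fair die and splitting the sample space into odd and even outcomes (all names are illustrative). Weighting each conditional probability by the probability of its part and adding them up recovers P(A).

```python
from fractions import Fraction

omega = set(range(1, 7))   # fair die (assumption)

def prob(event):
    return Fraction(len(event), len(omega))

A = {1, 2, 3}                       # "roll < 4"
partition = [{1, 3, 5}, {2, 4, 6}]  # odd / even: disjoint and covering omega

# P(A) = sum_i P(A | B_i) * P(B_i)
total = sum((prob(A & B) / prob(B)) * prob(B) for B in partition)
print(total, total == prob(A))      # 1/2 True
```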

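Chain rule

$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$

For three events: $P(A \land B \land C) = P(A)\,P(B \mid A)\,P(C \mid A, B)$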

Example: Card Drawing Game

Suppose you have a standard deck of 52 playing cards. You draw three cards one by one without replacement. Let’s define the events:

  • Event A: The first card drawn is a heart.
  • Event B: The second card drawn is a face card (jack, queen, or king).
  • Event C: The third card drawn is red.

Using the chain rule, the probability of these three events happening in sequence is calculated as follows:

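  1. Probability of Event A (Drawing a Heart): There are 13 hearts in a deck of 52 cards, so $P(A) = \frac{13}{52} = \frac{1}{4}$

  2. Probability of Event B Given A (Drawing a Face Card Given the First is a Heart): After drawing a heart, 51 cards remain. If that heart was itself a face card (probability 3/13), 11 face cards are left; otherwise 12 are left, so $P(B \mid A) = \frac{3}{13} \cdot \frac{11}{51} + \frac{10}{13} \cdot \frac{12}{51} = \frac{153}{663} = \frac{3}{13}$

  3. Probability of Event C Given A and B (Drawing a Red Card Given the First Two Conditions are Met): After drawing a heart and a face card, 50 cards remain. Given A and B, the face card is red with probability 25/51 (leaving 24 red cards) and black with probability 26/51 (leaving 25 red cards), so $P(C \mid A, B) = \frac{25}{51} \cdot \frac{24}{50} + \frac{26}{51} \cdot \frac{25}{50} = \frac{25}{51}$

Now, applying the chain rule:

$P(A \land B \land C) = P(A)\,P(B \mid A)\,P(C \mid A, B) = \frac{1}{4} \cdot \frac{3}{13} \cdot \frac{25}{51} = \frac{25}{884}$

So, the probability that the first card is a heart, the second card is a face card, and the third card is red is 25/884, or approximately 0.0283 (2.83%).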
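
One way to check a chain-rule computation like this is brute force: enumerate every ordered draw of three cards and count those satisfying A, B and C. The sketch below does that. The deck encoding and the helpers `is_heart`, `is_face` and `is_red` are assumptions made for illustration. Because it counts actual cards, it automatically handles overlaps such as the first heart being a face card or the face card being red.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(r, s) for r in ranks for s in suits]

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

def is_red(card):
    return card[1] in {"hearts", "diamonds"}

# Every ordered draw of three distinct cards is equally likely.
favourable = total = 0
for first, second, third in permutations(deck, 3):
    total += 1
    if is_heart(first) and is_face(second) and is_red(third):
        favourable += 1

p_abc = Fraction(favourable, total)
print(p_abc, float(p_abc))
```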

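Inference by enumeration

For any proposition $\phi$, sum the probabilities of the sample points where it is true: $P(\phi) = \sum_{\omega : \omega \models \phi} P(\omega)$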
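
A minimal sketch of inference by enumeration, assuming two fair dice as the sample space. The function `enumerate_prob` and the chosen propositions are illustrative. A query is answered by adding up the probabilities of the sample points where the proposition holds; a conditional query is the ratio of two such sums.

```python
from fractions import Fraction
from itertools import product

# Sample points: ordered pairs of two fair dice (assumption), each with P = 1/36.
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

def enumerate_prob(holds):
    """Sum P(w) over every sample point w where the proposition holds."""
    return sum(p for w, p in P.items() if holds(w))

p_high = enumerate_prob(lambda w: w[0] + w[1] >= 10)
p_high_and_six = enumerate_prob(lambda w: w[0] + w[1] >= 10 and w[0] == 6)
p_six = enumerate_prob(lambda w: w[0] == 6)

print(p_high)                    # 1/6
print(p_high_and_six / p_six)    # 1/2, i.e. P(sum >= 10 | first = 6)
```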

Conditional independence

X and Y are conditionally independent given Z when $P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$

  • like the distributive law

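Product rule and Bayes

$P(A \land B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$

from here we derive Bayes Rule

$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$

in distribution form

$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)} = \alpha\,P(X \mid Y)\,P(Y)$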

Useful for assessing diagnostic probability from causal probability.

This is a typical ML problem!
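
To make the diagnostic-from-causal direction concrete, here is a small sketch with hypothetical numbers for a disease (the cause) and a positive test (the effect). The figures are invented for illustration only and do not come from the notes.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
p_cause = Fraction(1, 100)              # P(disease)
p_effect_given_cause = Fraction(9, 10)  # P(positive | disease): causal direction
p_effect_given_not = Fraction(5, 100)   # P(positive | no disease)

# Denominator via total probability: P(positive).
p_effect = p_effect_given_cause * p_cause + p_effect_given_not * (1 - p_cause)

# Bayes' rule: diagnostic direction P(disease | positive).
p_cause_given_effect = p_effect_given_cause * p_cause / p_effect
print(p_cause_given_effect)             # 2/13
```

Even with a fairly reliable test, the posterior stays low (about 15%) because the prior probability of the cause is small.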