😎 Appunti di Dag7

Cerca

❯

❯

❯

Big Data Computing

❯

❯

16 - K-Means

13 lug 2025, 1 minuti

#course-lecture

16 - K-Means

K-means: Lloyd-Forgy Algorithm

Specify the number of output clusters K
Select K observations at random from the N data points as the initial cluster centroids
Assignment step: Assign each observation to the closest centroid based on the distance measure chosen
Update step: For each of the K clusters update the centroid by computing the new mean values of all the data points now in the cluster
Iteratively repeat steps 3-4 until a stopping criterion is met

K-means vs. K-means++

TBD

Evaluations

Clustering is evaluated based on the data that was clustered itself

A good clustering will produce high quality clusters with:
high intra-cluster similarity
low inter-cluster similarity
The measured quality of a clustering depends on
data representation
similarity measure

Each value of confusion matrix is given by rand.

Review…

K-means is an iterative (approximated) clustering method that converges to a local minimum
Tries to minimize the internal sum of squared Euclidean distances (Lloyd-Forgy Algorithm)
Many variants:
- K-means++, K-medoids (PAM Algorithm), BFR K-means, etc.
Internal vs. External measures of clustering quality

Vista grafico

16 - K-Means
K-means: Lloyd-Forgy Algorithm
K-means vs. K-means++
Evaluations
Review…

Link entranti

Big Data Computing

Creato con Quartz v4.2.2 © 2025

GitHub
Discord Community