15 - Clustering

Review

Clustering is an unsupervised learning technique to group “similar” data objects together
- Depends on:
  - object representation
  - similarity measure
Harder when data dimensionality gets large (curse of dimensionality)
Number of output clusters is part of the problem itself!

There are several ways to divide Cluster Algorithms, one of them is the Partitioning-based.

It is like doing binary strings: we have N elements and K ways to divide them, and a 0 or 1 that states if an element belongs or not to the string.

For example, if K=2 → 0000001, 101001011 or 1010100101 whatever as long as EVERY element appears AT LEAST once.
- technically if we have K=2, we have $2^{k}$ assignment. More in general $k^{n}$ , however, since we are interested to the fact which every element must appear AT LEAST once, this number is slightly minor
- cases like 000000000 are excluded

K-way non-empty partitions of N elements→ $S (K, N) \to K^{n} / K! = O (K^{N})$

Objective function
- exact solution must explore exponential search space → NP-Hard
- non-convex due to the discrete assignment matrix A → multiple local minima
LLoyd-Forgy Function
- NP-hardness doesn’t allow us to compute the exact solution (i.e., global optimum)
- Non-convexity doesn’t allow us to rely on nice property of convex optimization (unique global optimum)
- A convex objective can be (approximately) solved with numerical methods to find the global optimum

It is a 2-step iterative algorithm:

It Does not guarantee to find the global optimum as it may stuck to a local optimum or a saddle point