Cluster analysis

Grouping objects so that objects in the same group are more similar to each other than to objects of other groups.

Agglomerative Hierarchical Clustering

Clusters: 2
Input: 2, 8, 4, 11, 3

Agglomerative Hierarchical Clustering

Clusters: 2
Input: 2, 8, 4, 11, 3

Linkage:

single([2,3], [4]) = 1
complete([2,3], [4]) = 2
centroid([2,3], [4]) = 1.5
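
The three linkage values above can be reproduced with a small sketch. It assumes one-dimensional points and absolute difference as the distance; the function names are illustrative, not taken from any library.

```python
from statistics import mean

def single_linkage(a, b):
    """Smallest pairwise distance between two 1-D clusters."""
    return min(abs(x - y) for x in a for y in b)

def complete_linkage(a, b):
    """Largest pairwise distance between two 1-D clusters."""
    return max(abs(x - y) for x in a for y in b)

def centroid_linkage(a, b):
    """Distance between the two cluster means."""
    return abs(mean(a) - mean(b))

print(single_linkage([2, 3], [4]))    # 1
print(complete_linkage([2, 3], [4]))  # 2
print(centroid_linkage([2, 3], [4]))  # 1.5
```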

Agglomerative Hierarchical Clustering

Clusters: 2
Input: 2, 8, 4, 11, 3

Linkage:

single([2,3,4], [8]) = 4, single([8], [11]) = 3
complete([2,3,4], [8]) = 6, complete([8], [11]) = 3
centroid([2,3,4], [8]) = 5, centroid([8], [11]) = 3
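
Putting the merges together, a minimal sketch of the whole bottom-up procedure might look like this. It assumes 1-D points, starts with every point in its own cluster, and repeatedly merges the closest pair until the requested number of clusters remains. `agglomerate`, `single` and `complete` are hypothetical names chosen for this example.

```python
def single(a, b):
    """Single linkage: smallest pairwise distance."""
    return min(abs(x - y) for x in a for y in b)

def complete(a, b):
    """Complete linkage: largest pairwise distance."""
    return max(abs(x - y) for x in a for y in b)

def agglomerate(points, k, linkage):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

print(agglomerate([2, 8, 4, 11, 3], 2, single))  # [[2, 3, 4], [8, 11]]
```

On this input, single and complete linkage produce the same two clusters. On other data the choice of linkage can change the result.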

K-means

Clusters: 2
Input: 2, 8, 4, 11, 3

K-means

Clusters: 2
Input: 2, 8, 4, 11, 3

Initial centroids: A=2, B=4

K-means

Clusters: 2
Input: 2, 8, 4, 11, 3

New centroids: A=2.5, B≈7.67

K-means

Clusters: 2
Input: 2, 8, 4, 11, 3

Centroids and membership are stable, so the algorithm has converged.
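
The assign-then-update loop can be sketched as follows. This is a minimal 1-D version under stated assumptions: it reuses the five points 2, 8, 4, 11, 3 from the hierarchical example, starts from centroids A=2 and B=4, and sends a point that is equally close to two centroids to the one listed first. `kmeans_1d` is a hypothetical helper name.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data; returns (centroids, clusters)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (ties go to the centroid listed first).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # centroids and membership are stable
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans_1d([2, 8, 4, 11, 3], [2, 4])
print(centroids)  # [3.0, 9.5]
print(clusters)   # [[2, 4, 3], [8, 11]]
```

Under these assumptions the run settles on A=3 with {2, 3, 4} and B=9.5 with {8, 11}. Different initial centroids or a different tie-breaking rule may lead to a different result.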

Clustering in two dimensions

Euclidean distance

\(d_\epsilon(X,Y) = \sqrt{\big|X_0 - Y_0\big|^2 + \big|X_1 - Y_1\big|^2 + ... + \big|X_n - Y_n\big|^2}\)

p-norm

\(d_p(X, Y) = \bigg(\big|X_0 - Y_0\big|^p + \big|X_1 - Y_1\big|^p + ... + \big|X_n - Y_n\big|^p\bigg)^{1/p}\)

  • Manhattan (\(p=1\))
  • Euclidean (\(p=2\))
  • Chebyshev (\(p \to \infty\))
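
A short sketch of the p-norm distance, covering the three special cases listed above. The function name `minkowski` and the sample points are chosen for illustration. The Chebyshev case is handled separately as the maximum coordinate difference, which is the limit of the formula as p grows.

```python
import math

def minkowski(x, y, p):
    """p-norm distance between equal-length vectors; p may be math.inf."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)  # Chebyshev: largest coordinate difference
    return sum(d ** p for d in diffs) ** (1 / p)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))         # Manhattan: 7.0
print(minkowski(x, y, 2))         # Euclidean: 5.0
print(minkowski(x, y, math.inf))  # Chebyshev: 4
```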

Other: Damerau-Levenshtein (strings), graph distances