Spectral clustering

Spectral clustering [9] is a relatively recent and successful method rooted in graph theory [11], capable of clustering data based on point-to-point similarities. In most problems, ``similarity'' is derived from the distance between data points, so that clusters correspond to connected regions of points. This makes it possible to cluster non-convex data sets with ease, as illustrated by Fig. 2.

Figure 2: (a) Two sets of intertwined data points, difficult or impossible to cluster with conventional clustering algorithms. (b) Spectral clustering easily divides the points into two separate groups, based on the principle that two points should be in the same group if they are close to each other.

The similarity between two points $ \textbf{x}[i]$ and $ \textbf{x}[j]$ is measured through a kernel function $ \kappa(\cdot,\cdot)$ such as

$\displaystyle \kappa(\textbf{x}[i],\textbf{x}[j]) = \exp\left(-\frac{d^2(\textbf{x}[i],\textbf{x}[j])}{\sigma^2}\right)$ (3)

where $ d(\textbf{x}[i],\textbf{x}[j])$ is a distance measure between the points $ \textbf{x}[i]$ and $ \textbf{x}[j]$, and $ \sigma$ is the kernel size. The kernel function is close to $ 1$ for points that lie near each other and decays toward zero as the distance between them grows.
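
To make the construction concrete, the following is a minimal NumPy sketch that builds the affinity values of (3) for all pairs of points, assuming the Euclidean distance for $ d(\cdot,\cdot)$; the function name gaussian_affinity and the default kernel size are only illustrative choices.

    import numpy as np

    def gaussian_affinity(X, sigma=1.0):
        """Affinity values A_ij = exp(-d^2(x_i, x_j) / sigma^2) for the N
        points stored as the rows of X, using the Euclidean distance."""
        sq_norms = np.sum(X**2, axis=1)
        # Squared Euclidean distances between all pairs of rows of X
        d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
        d2 = np.maximum(d2, 0.0)  # guard against small negative round-off values
        return np.exp(-d2 / sigma**2)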

Given a set of $ N$ points $ \{\textbf{x}[1], \textbf{x}[2], \dots,
\textbf{x}[N]\}$ , a similarity matrix (also called ``affinity'' or kernel matrix) can be defined as $ A_{ij} =
\kappa(\textbf{x}[i],\textbf{x}[j])$. Clustering is then performed by analyzing the spectrum of this matrix. One of the most successful spectral clustering algorithms is the Ng-Jordan-Weiss (NJW) algorithm, introduced in [9]. It can be summarized as follows (a code sketch of the complete procedure is given after the list):

  1. Calculate the affinity matrix $ A$ using (3), and set $ A_{ii} = 0$ for $ i=1,\dots,N$ .
  2. Obtain $ L=D^{-1/2} A D^{-1/2}$ , where $ D$ is a diagonal matrix with $ D_{ii} = \sum_{j=1}^{N}A_{ij}$ . This normalization ensures that all clusters have roughly equal size.
  3. Form the matrix $ V = [v_1,v_2,\dots,v_k]$ , where $ v_1,v_2,\dots,v_k$ are the eigenvectors corresponding to the $ k$ largest eigenvalues of $ L$ and $ k$ is the number of clusters to retrieve.
  4. Treat the rows of $ V$ as points in $ \mathbb{R}^k$ and normalize them to unit length. Each row corresponds to an original point $ \textbf{x}[i]$ , but in this representation the points form compact clusters. Cluster them with a simple algorithm such as k-means.
  5. Assign the original point $ \textbf{x}[i]$ to cluster $ j$ if and only if row $ i$ of the matrix $ V$ was assigned to cluster $ j$ .
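
The steps above translate almost directly into code. The following is a minimal sketch using NumPy, SciPy and scikit-learn, assuming the Euclidean-distance kernel of (3); the function name njw_spectral_clustering, the default kernel size and the use of scikit-learn's KMeans for the final step are implementation choices made here for illustration.

    import numpy as np
    from scipy.linalg import eigh
    from sklearn.cluster import KMeans

    def njw_spectral_clustering(X, k, sigma=1.0):
        """Sketch of the NJW algorithm: affinity matrix, normalized matrix L,
        top-k eigenvectors, row normalization and a final k-means step."""
        # Step 1: affinity matrix A_ij = exp(-||x_i - x_j||^2 / sigma^2),
        # with a zero diagonal
        sq = np.sum(X**2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        A = np.exp(-d2 / sigma**2)
        np.fill_diagonal(A, 0.0)

        # Step 2: L = D^{-1/2} A D^{-1/2}, with D_ii the row sums of A
        d = A.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

        # Step 3: eigenvectors of the k largest eigenvalues
        # (L is symmetric; eigh returns eigenvalues in ascending order)
        _, eigvecs = eigh(L)
        V = eigvecs[:, -k:]

        # Step 4: normalize each row of V to unit length and cluster the rows
        V = V / np.linalg.norm(V, axis=1, keepdims=True)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(V)

        # Step 5: row i of V carries the cluster label of the original x[i]
        return labels

For the two intertwined sets of Fig. 2, njw_spectral_clustering(X, k=2) with a suitably tuned kernel size would separate the two groups; in practice $ \sigma$ has to be chosen to match the scale of the data.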
