
Spectral clustering

Spectral clustering [17] is a technique that clusters points based on a spectral analysis of the matrix of point-to-point similarities. This ``affinity'' matrix is formed as $ F_{ij} = \exp(-d^2(\textbf{x}_i,\textbf{x}_j)/\sigma^2)$, where $ d(\textbf{x}_i,\textbf{x}_j)$ is some distance measure between points $ \textbf{x}_i$ and $ \textbf{x}_j$ and $ \sigma$ is a scaling constant. Spectral clustering achieves good results in cases where classic methods such as k-means fail, for instance when one cluster is completely surrounded by another. Nevertheless, its performance depends strongly on the choice of $ \sigma$. Moreover, since it requires the eigenvectors of an $ N\times N$ matrix, care should be taken to limit the computational cost when the number of samples $ N$ is large.

In [18], Zelnik-Manor and Perona proposed to calculate the affinity matrix as $ F_{ij} =
\exp(-d^2(\textbf{x}_i,\textbf{x}_j) / (\sigma_i\sigma_j))$, where the ``local scale'' $ \sigma_i = d(\textbf{x}_i,\textbf{x}_L)$ is the distance between $ \textbf{x}_i$ and its $ L$-th nearest neighbor, with $ L$ a constant that depends on the data dimension. This has two advantages over the original algorithm. Firstly, the results no longer depend on the choice of a global scale $ \sigma$. Secondly, in the context of the underdetermined PNL BSS problem, points with a higher local scale will likely correspond to multiple active sources, since the local scale is inversely proportional to the local density of points (see Fig. 2(a) and (b)).
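The local scales are cheap to obtain from the sorted rows of the distance matrix. A minimal NumPy sketch follows; the default $ L = 7$ is only a placeholder value for this example, not a recommendation from this work:

```python
import numpy as np

def local_scaling_affinity(X, L=7):
    """Sketch of the locally scaled affinity of Zelnik-Manor and Perona:
    F_ij = exp(-d^2(x_i, x_j) / (sigma_i * sigma_j)), where sigma_i is
    the distance from x_i to its L-th nearest neighbor.
    L = 7 is a placeholder default; in practice it should be chosen
    according to the data dimension."""
    # Pairwise Euclidean distances d(x_i, x_j)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # sigma_i = distance to the L-th nearest neighbor
    # (column 0 of the sorted row is the point itself, at distance 0)
    sigma = np.sort(d, axis=1)[:, L]
    # Locally scaled affinity with zero diagonal
    F = np.exp(-d ** 2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(F, 0.0)
    return F, sigma
```

Points in dense regions get a small $ \sigma_i$ and points in sparse regions a large one, which is exactly the property exploited above to flag samples where multiple sources are active.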


Steven Van Vaerenbergh
Last modified: 2006-04-05