
Spectral clustering

Spectral clustering [17] is a technique that clusters points based on a spectral analysis of the matrix of point-to-point similarities. This ``affinity'' matrix is formed as $ F_{ij} = \exp(-d^2(\textbf{x}_i,\textbf{x}_j)/\sigma^2)$, where $ d(\textbf{x}_i,\textbf{x}_j)$ is some distance measure between points $ \textbf{x}_i$ and $ \textbf{x}_j$ and $ \sigma$ is a scaling constant. Spectral clustering achieves good results in cases where classic methods such as k-means fail, for instance when one cluster is completely surrounded by another. Nevertheless, its performance depends strongly on the choice of $ \sigma$. Moreover, since it requires the eigenvectors of an $ N\times N$ matrix, care should be taken to limit the computational cost when the number of samples $ N$ is large.

In [18], Zelnik-Manor and Perona proposed to calculate the affinity matrix as $ F_{ij} =
\exp(-d^2(\textbf{x}_i,\textbf{x}_j) / (\sigma_i\sigma_j))$, where the ``local scale'' $ \sigma_i = d(\textbf{x}_i,\textbf{x}_L)$ is the distance between $ \textbf{x}_i$ and its $ L$-th nearest neighbor, with $ L$ a constant that depends on the data dimension. This has two advantages over the original algorithm. Firstly, the results no longer depend on the choice of a global scale $ \sigma$. Secondly, in the context of the underdetermined PNL BSS problem, points with a higher local scale will likely correspond to multiple active sources, since the local scale is inversely proportional to the local density of points (see Fig. 2(a) and (b)).
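The local scales are cheap to obtain from the sorted rows of the distance matrix. A minimal NumPy sketch follows; the default $ L = 7$ is only a placeholder value for this example, not a recommendation from this work:

```python
import numpy as np

def local_scaling_affinity(X, L=7):
    """Sketch of the locally scaled affinity of Zelnik-Manor and Perona:
    F_ij = exp(-d^2(x_i, x_j) / (sigma_i * sigma_j)), where sigma_i is
    the distance from x_i to its L-th nearest neighbor.
    L = 7 is a placeholder default; in practice it should be chosen
    according to the data dimension."""
    # Pairwise Euclidean distances d(x_i, x_j)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # sigma_i = distance to the L-th nearest neighbor
    # (column 0 of the sorted row is the point itself, at distance 0)
    sigma = np.sort(d, axis=1)[:, L]
    # Locally scaled affinity with zero diagonal
    F = np.exp(-d ** 2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(F, 0.0)
    return F, sigma
```

Points in dense regions get a small $ \sigma_i$ and points in sparse regions a large one, which is exactly the property exploited above to flag samples where multiple sources are active.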


Steven Van Vaerenbergh
Last modified: 2006-04-05