Spectral clustering
Spectral clustering [9] is a recently proposed successful
method rooted in graph theory [11], capable of clustering
data based on point-to-point similarities. For most problems,
``similarity'' is measured as the distance between data points,
defining clusters as connected zones of points and making it
possible to easily cluster non-convex data sets, as illustrated by
Fig. 2.
Figure 2:
(a) Two sets of intertwined data points, difficult or
impossible to cluster with conventional clustering algorithms. (b)
Spectral clustering easily divides the points in two separate
groups, based on the principle that two points should be in the same
group if they are close to each other.
 |
 |
(a) |
(b) |
|
The similarity between two points
and
is measured through a kernel function
such as
![$\displaystyle \kappa(\textbf{x}[i],\textbf{x}[j]) =
\exp\left(-\frac{d^2(\textbf{x}[i],\textbf{x}[j])}{\sigma^2}\right)$](img30.png) |
(3) |
where
is some distance measure
between points
and
and
is
the kernel size. This kernel function is almost
for points that
are close to each other, and lowers as the distance rises.
Given a set of
points
, a similarity matrix (also called ``affinity'' or
kernel matrix) can be defined as
. Clustering is performed by
analyzing the spectrum of that matrix. One of the most successful
spectral clustering algorithms is the Ng-Jordan-Weiss (NJW)
algorithm, introduced in [9]. It can be summarized as
follows:
- Calculate the affinity matrix A using
(3), and set
for
.
- Obtain
, where
is a diagonal matrix with
. This normalization will assure that all clusters have more or less equal size.
- Form the matrix
where
are the
largest eigenvectors of
and
is the number of
subsets to retrieve.
- Treat the rows of
as points in
, and
normalize them to unit length. These points correspond to the
original points
but form compact clusters now. Cluster them
with an algorithm such as k-means.
- Assign the original point
to cluster
if and only
if row
of the matrix
was assigned to
cluster
.
Steven Van Vaerenbergh
Last modified: 2007-10-17