

Canonical Correlation Analysis

Given two full-rank data matrices $ \textbf{X}_1 \in \mathbb{R}^{N \times m_1}$ and $ \textbf{X}_2 \in \mathbb{R}^{N \times m_2}$, canonical correlation analysis (CCA) is defined as the problem of finding two canonical vectors $ \textbf{h}_1 \in \mathbb{R}^{m_1 \times 1}$ and $ \textbf{h}_2 \in \mathbb{R}^{m_2 \times 1}$ that maximize the correlation between the canonical variates $ \textbf{y}_1 = \textbf{X}_1 \textbf{h}_1$ and $ \textbf{y}_2 = \textbf{X}_2 \textbf{h}_2$, i.e.

$\displaystyle \mathop{\text{argmax}}_{\textbf{h}_1,\textbf{h}_2} \rho = \frac{\textbf{h}_1^T \textbf{R}_{12} \textbf{h}_2}{\sqrt{\textbf{h}_1^T \textbf{R}_{11} \textbf{h}_1 \, \textbf{h}_2^T \textbf{R}_{22} \textbf{h}_2}},$ (1)

or equivalently

$\displaystyle \mathop{\text{argmax}}_{\textbf{h}_1,\textbf{h}_2} \rho = \textbf{y}_1^T \textbf{y}_2 \quad \text{s.t.}\quad \Vert\textbf{y}_1\Vert=\Vert\textbf{y}_2\Vert=1,$

where $ \textbf{R}_{kl} = \textbf{X}_k^T \textbf{X}_l$ is an estimate of the corresponding (cross-)correlation matrix. An alternative formulation of CCA within the framework of least squares (LS) regression has been proposed in [16,17]. Specifically, it has been proved that CCA can be reformulated as the problem of minimizing

$\displaystyle J = \frac{1}{2} \Vert\textbf{X}_1 \textbf{h}_1 - \textbf{X}_2 \textbf{h}_2\Vert^2 \quad \text{s.t.} \quad \frac{1}{2} \sum_{k=1}^2 \Vert\textbf{X}_k \textbf{h}_k\Vert^2 = 1,$ (2)

Solving (1) or (2) by the method of Lagrange multipliers, CCA can be rewritten as the following generalized eigenvalue (GEV) problem

$\displaystyle \frac{1}{2} \begin{bmatrix}\textbf{X}_1^T \textbf{X}_1 & \textbf{X}_1^T \textbf{X}_2\\ \textbf{X}_2^T \textbf{X}_1 & \textbf{X}_2^T \textbf{X}_2 \end{bmatrix} \textbf{h} = \beta \begin{bmatrix}\textbf{X}_1^T \textbf{X}_1 & \textbf{0}\\ \textbf{0} & \textbf{X}_2^T \textbf{X}_2 \end{bmatrix} \textbf{h},$ (3)

where $ \textbf{h} = [\textbf{h}_1^T \textbf{h}_2^T]^T$, and $ \beta=(\rho+1)/2$ is a parameter related to a principal component analysis (PCA) interpretation of CCA [17].
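As an illustration (not part of the original text), the following minimal sketch solves the GEV problem (3) numerically for two randomly generated full-rank data matrices and recovers the canonical correlation through $ \beta=(\rho+1)/2$; the variable names and the data are chosen purely for the example.

# Minimal sketch of the GEV formulation (3) of CCA; data and names are illustrative.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
N, m1, m2 = 500, 4, 3
X1 = rng.standard_normal((N, m1))
X2 = rng.standard_normal((N, m2))

# Block matrices of the GEV problem (3): (1/2) A h = beta B h.
R11, R22, R12 = X1.T @ X1, X2.T @ X2, X1.T @ X2
A = 0.5 * np.block([[R11, R12], [R12.T, R22]])
B = np.block([[R11, np.zeros((m1, m2))], [np.zeros((m2, m1)), R22]])

# eigh solves the symmetric-definite generalized eigenvalue problem A h = beta B h.
betas, H = eigh(A, B)
beta, h = betas[-1], H[:, -1]          # largest generalized eigenvalue and its eigenvector
h1, h2 = h[:m1], h[m1:]

# Recover the canonical correlation from beta = (rho + 1) / 2 and verify it
# against the definition rho = y1^T y2 / (||y1|| ||y2||).
rho = 2 * beta - 1
y1, y2 = X1 @ h1, X2 @ h2
rho_check = (y1 @ y2) / (np.linalg.norm(y1) * np.linalg.norm(y2))
print(rho, rho_check)                  # the two values should agree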

The solution of (3) can be obtained directly by applying standard GEV algorithms. However, the special structure of the CCA problem has recently been exploited to obtain efficient CCA algorithms [16,17,18]. Specifically, denoting the pseudoinverse of $ {\mathbf X}_k$ as $ {\mathbf X}^+_k = ({\mathbf X}_k^T {\mathbf X}_k)^{-1}{\mathbf X}_k^T$, the GEV problem (3) can be viewed as two coupled LS regression problems

$\displaystyle \beta {\mathbf h}_k = {\mathbf X}_k^+ {\mathbf y}, \quad k=1,2,$    

where $ {\mathbf y}= ({\mathbf y}_1+{\mathbf y}_2)/2$. This idea has been used in [16,17] to develop an algorithm that solves these regression problems iteratively: at each iteration $ t$, two LS regression problems are solved using

$\displaystyle {\mathbf y}(t) = \frac{{\mathbf y}_1(t) + {\mathbf y}_2(t)}{2} = \frac{{\mathbf X}_1 {\mathbf h}_1(t-1) + {\mathbf X}_2 {\mathbf h}_2(t-1)}{2}$    

as the desired output. Furthermore, this LS regression framework has been exploited to develop adaptive CCA algorithms based on the recursive least-squares (RLS) algorithm [16,17].
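The sketch below illustrates this iterative procedure under some assumed details (random initialization, a fixed number of iterations, and a per-iteration rescaling that enforces the constraint of (2)); it alternates between forming the desired output $ {\mathbf y}(t)$ and solving the two LS regression problems through the pseudoinverses.

# Rough sketch of the iterative LS procedure described above; initialization,
# iteration count and normalization are assumptions made for the example.
import numpy as np

def iterative_cca(X1, X2, n_iter=50, seed=0):
    """Estimate the first pair of canonical vectors by alternating LS regressions."""
    rng = np.random.default_rng(seed)
    h1 = rng.standard_normal(X1.shape[1])
    h2 = rng.standard_normal(X2.shape[1])
    X1p, X2p = np.linalg.pinv(X1), np.linalg.pinv(X2)    # pseudoinverses X_k^+
    for _ in range(n_iter):
        # Desired output y(t) built from the previous canonical variates.
        y = 0.5 * (X1 @ h1 + X2 @ h2)
        # Two LS regression problems: beta * h_k = X_k^+ y.
        h1, h2 = X1p @ y, X2p @ y
        # Rescale to satisfy the constraint (1/2) * sum_k ||X_k h_k||^2 = 1
        # (an assumed normalization; it absorbs the beta scaling factor).
        scale = np.sqrt(0.5 * (np.sum((X1 @ h1) ** 2) + np.sum((X2 @ h2) ** 2)))
        h1, h2 = h1 / scale, h2 / scale
    y1, y2 = X1 @ h1, X2 @ h2
    rho = (y1 @ y2) / (np.linalg.norm(y1) * np.linalg.norm(y2))
    return h1, h2, rho

Under mild conditions (a non-degenerate largest canonical correlation and a generic initialization), this iteration recovers, up to sign and scaling, the same principal canonical pair as the direct GEV solution: each step is effectively a power iteration on $ ({\mathbf P}_1+{\mathbf P}_2)/2$, where $ {\mathbf P}_k = {\mathbf X}_k{\mathbf X}_k^+$ is the projector onto the column space of $ {\mathbf X}_k$, whose dominant eigenvalue is $ \beta=(\rho+1)/2$.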

