

Canonical Correlation Analysis

Given two full-rank data matrices $ \textbf{X}_1 \in \mathbb{R}^{N \times m_1}$ and $ \textbf{X}_2 \in \mathbb{R}^{N \times m_2}$, canonical correlation analysis (CCA) is defined as the problem of finding two canonical vectors $ \textbf{h}_1 \in \mathbb{R}^{m_1 \times 1}$ and $ \textbf{h}_2 \in \mathbb{R}^{m_2 \times 1}$ that maximize the correlation between the canonical variates $ \textbf{y}_1 = \textbf{X}_1 \textbf{h}_1$ and $ \textbf{y}_2 = \textbf{X}_2 \textbf{h}_2$, i.e.

$\displaystyle \mathop{\text{argmax}}_{\textbf{h}_1,\textbf{h}_2} \rho = \frac{\textbf{h}_1^T \textbf{R}_{12} \textbf{h}_2}{\sqrt{\textbf{h}_1^T \textbf{R}_{11} \textbf{h}_1 \, \textbf{h}_2^T \textbf{R}_{22} \textbf{h}_2}},$ (1)

or equivalently

$\displaystyle \mathop{\text{argmax}}_{\textbf{h}_1,\textbf{h}_2} \rho = \textbf{y}_1^T \textbf{y}_2 \quad \text{s.t.}\quad \Vert\textbf{y}_1\Vert=\Vert\textbf{y}_2\Vert=1,$

where $ \textbf{R}_{kl} = \textbf{X}_k^T \textbf{X}_l$ is an estimate of the corresponding (cross-)correlation matrix. An alternative formulation of CCA within the framework of least squares (LS) regression has been proposed in [16,17]. Specifically, it has been proved that CCA can be reformulated as the problem of minimizing

$\displaystyle J = \frac{1}{2} \Vert\textbf{X}_1 \textbf{h}_1 - \textbf{X}_2 \textbf{h}_2\Vert^2 \quad \text{s.t.} \quad \frac{1}{2} \sum_{k=1}^2 \Vert\textbf{X}_k \textbf{h}_k\Vert^2 = 1,$ (2)

Solving (1) or (2) by the method of Lagrange multipliers, CCA can be rewritten as the following generalized eigenvalue (GEV) problem

$\displaystyle \frac{1}{2} \begin{bmatrix}\textbf{X}_1^T \textbf{X}_1 & \textbf{X}_1^T \textbf{X}_2\\ \textbf{X}_2^T \textbf{X}_1 & \textbf{X}_2^T \textbf{X}_2 \end{bmatrix} \textbf{h} = \beta \begin{bmatrix}\textbf{X}_1^T \textbf{X}_1 & \textbf{0}\\ \textbf{0} & \textbf{X}_2^T \textbf{X}_2 \end{bmatrix} \textbf{h},$ (3)

where $ \textbf{h} = [\textbf{h}_1^T \textbf{h}_2^T]^T$, and $ \beta=(\rho+1)/2$ is a parameter related to a principal component analysis (PCA) interpretation of CCA [17].
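As an illustration (not part of the original text), the following minimal sketch solves the GEV problem (3) numerically for two randomly generated full-rank data matrices and recovers the canonical correlation through $ \beta=(\rho+1)/2$; the variable names and the data are chosen purely for the example.

# Minimal sketch of the GEV formulation (3) of CCA; data and names are illustrative.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
N, m1, m2 = 500, 4, 3
X1 = rng.standard_normal((N, m1))
X2 = rng.standard_normal((N, m2))

# Block matrices of the GEV problem (3): (1/2) A h = beta B h.
R11, R22, R12 = X1.T @ X1, X2.T @ X2, X1.T @ X2
A = 0.5 * np.block([[R11, R12], [R12.T, R22]])
B = np.block([[R11, np.zeros((m1, m2))], [np.zeros((m2, m1)), R22]])

# eigh solves the symmetric-definite generalized eigenvalue problem A h = beta B h.
betas, H = eigh(A, B)
beta, h = betas[-1], H[:, -1]          # largest generalized eigenvalue and its eigenvector
h1, h2 = h[:m1], h[m1:]

# Recover the canonical correlation from beta = (rho + 1) / 2 and verify it
# against the definition rho = y1^T y2 / (||y1|| ||y2||).
rho = 2 * beta - 1
y1, y2 = X1 @ h1, X2 @ h2
rho_check = (y1 @ y2) / (np.linalg.norm(y1) * np.linalg.norm(y2))
print(rho, rho_check)                  # the two values should agree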

The solution of (3) can be obtained directly by applying standard GEV algorithms. However, the special structure of the CCA problem has recently been exploited to obtain efficient CCA algorithms [16,17,18]. Specifically, denoting the pseudoinverse of $ {\mathbf X}_k$ as $ {\mathbf X}^+_k = ({\mathbf X}_k^T {\mathbf X}_k)^{-1}{\mathbf X}_k^T$, the GEV problem (3) can be viewed as two coupled LS regression problems

$\displaystyle \beta {\mathbf h}_k = {\mathbf X}_k^+ {\mathbf y}, \quad k=1,2,$    

where $ {\mathbf y}= ({\mathbf y}_1+{\mathbf y}_2)/2$. This idea has been used in [16,17] to develop an algorithm that solves these regression problems iteratively: at each iteration $ t$, two LS regression problems are solved using

$\displaystyle {\mathbf y}(t) = \frac{{\mathbf y}_1(t) + {\mathbf y}_2(t)}{2} = \frac{{\mathbf X}_1 {\mathbf h}_1(t-1) + {\mathbf X}_2 {\mathbf h}_2(t-1)}{2}$    

as the desired output. Furthermore, this LS regression framework has been exploited to develop adaptive CCA algorithms based on the recursive least-squares (RLS) algorithm [16,17].
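The sketch below illustrates this iterative procedure under some assumed details (random initialization, a fixed number of iterations, and a per-iteration rescaling that enforces the constraint of (2)); it alternates between forming the desired output $ {\mathbf y}(t)$ and solving the two LS regression problems through the pseudoinverses.

# Rough sketch of the iterative LS procedure described above; initialization,
# iteration count and normalization are assumptions made for the example.
import numpy as np

def iterative_cca(X1, X2, n_iter=50, seed=0):
    """Estimate the first pair of canonical vectors by alternating LS regressions."""
    rng = np.random.default_rng(seed)
    h1 = rng.standard_normal(X1.shape[1])
    h2 = rng.standard_normal(X2.shape[1])
    X1p, X2p = np.linalg.pinv(X1), np.linalg.pinv(X2)    # pseudoinverses X_k^+
    for _ in range(n_iter):
        # Desired output y(t) built from the previous canonical variates.
        y = 0.5 * (X1 @ h1 + X2 @ h2)
        # Two LS regression problems: beta * h_k = X_k^+ y.
        h1, h2 = X1p @ y, X2p @ y
        # Rescale to satisfy the constraint (1/2) * sum_k ||X_k h_k||^2 = 1
        # (an assumed normalization; it absorbs the beta scaling factor).
        scale = np.sqrt(0.5 * (np.sum((X1 @ h1) ** 2) + np.sum((X2 @ h2) ** 2)))
        h1, h2 = h1 / scale, h2 / scale
    y1, y2 = X1 @ h1, X2 @ h2
    rho = (y1 @ y2) / (np.linalg.norm(y1) * np.linalg.norm(y2))
    return h1, h2, rho

Under mild conditions (a non-degenerate largest canonical correlation and a generic initialization), this iteration recovers, up to sign and scaling, the same principal canonical pair as the direct GEV solution: each step is effectively a power iteration on $ ({\mathbf P}_1+{\mathbf P}_2)/2$, where $ {\mathbf P}_k = {\mathbf X}_k{\mathbf X}_k^+$ is the projector onto the column space of $ {\mathbf X}_k$, whose dominant eigenvalue is $ \beta=(\rho+1)/2$.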

