

Kernel CCA for Wiener Systems

Figure 1: A nonlinear Wiener system.

A Wiener system is a well-known and simple nonlinear system consisting of a cascade of a linear dynamic system and a memoryless nonlinearity (see Fig. 1). Such a nonlinear channel is encountered frequently, e.g. in digital satellite communications [22] or in digital magnetic recording [23]. Traditionally, the problem of nonlinear equalization or identification has been tackled with nonlinear structures such as multilayer perceptrons (MLPs) [24], recurrent neural networks [25] or piecewise linear networks [26]. Most of the proposed techniques treat the Wiener system as a black box, although its simple cascade structure can be exploited.
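To make the cascade structure concrete, the following minimal NumPy sketch simulates a toy Wiener system; the FIR coefficients and the saturating nonlinearity are illustrative assumptions, not values taken from the text.

import numpy as np

rng = np.random.default_rng(0)

# Linear dynamic part H(z): a short FIR filter (coefficients chosen arbitrarily).
h = np.array([1.0, 0.6, -0.3])

# Memoryless, invertible nonlinearity f(.): a tanh-type saturation (also an assumption).
def f(v):
    return np.tanh(0.8 * v)

# Input x1[n] and noiseless system output x2[n] = f((h * x1)[n]).
N = 1000
x1 = rng.standard_normal(N)
v = np.convolve(x1, h)[:N]   # output of the linear dynamic system
x2 = f(v)                    # memoryless nonlinearity applied sample by sample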

Recently, a supervised identification setup for Wiener systems was presented in [6] that exploits the cascade structure by jointly identifying the linear filter and the inverse nonlinearity, as in Fig. 2. The estimator models for the linear filter and the nonlinearity are adjusted by minimizing the error between their outputs $ y_1[n]$ and $ y_2[n]$. In this way $ \hat{H}(z)$ becomes an estimate of $ H(z)$, while in the noiseless case $ g(\cdot)$ represents $ f^{-1}(\cdot)$, assuming that $ f(\cdot)$ is invertible over the output data range.

Figure 2: The Wiener system identification diagram used.

To avoid the trivial all-zero solution or divergence of the estimators, a restriction must be imposed for this approach to work. The two most obvious options are to restrict

  1. the norm of the estimator coefficients, or
  2. the norm of the signals $ y_1[n]$ and $ y_2[n]$.
The first type of restriction was used in [6]. Interestingly, the second type is a direct application of (kernel) CCA, as can be seen from Eq. (2). K-CCA can be applied to this problem by maximizing the correlation between the linear projection $ y_1[n]$ (linear kernel, $ \kappa(\textbf{x},\textbf{y}) = \textbf{x}^T\textbf{y}$) of the system input $ x_1[n]$ and the nonlinear projection $ y_2[n]$ (Gaussian or other nonlinear kernel) of the system output $ x_2[n]$. To prevent overfitting, the nonlinear kernel matrix is regularized as mentioned in Section II-C. However, regularization is not needed for the linear kernel, since the feature space of a linear kernel is the original data space, with dimension $ m_k < N$. Using this insight, the linear kernel matrix can be replaced by an estimate of the correlation matrix, as shown below. Starting from the general form of the GEV problem (7)

$\displaystyle \frac{1}{2} \begin{bmatrix} \textbf{K}_1 & \textbf{K}_2 \\ \textbf{K}_1 & \textbf{K}_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha}_1 \\ \boldsymbol{\alpha}_2 \end{bmatrix} = \beta \begin{bmatrix} \textbf{K}_1 & \textbf{0} \\ \textbf{0} & \textbf{K}_2 + c\textbf{I} \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha}_1 \\ \boldsymbol{\alpha}_2 \end{bmatrix},$ (8)

and substituting $ \textbf{K}_1 = \textbf{X}_1 \textbf{X}_1^T$ and $ \textbf{w}_1 = \textbf{X}_1^T \boldsymbol{\alpha}_1$, (8) is equivalent to solving

$\displaystyle \frac{1}{2} \begin{bmatrix} \textbf{X}_1^T \textbf{X}_1 & \textbf{X}_1^T \textbf{K}_2 \\ \textbf{X}_1 & \textbf{K}_2 \end{bmatrix} \begin{bmatrix} \textbf{w}_1 \\ \boldsymbol{\alpha}_2 \end{bmatrix} = \beta \begin{bmatrix} \textbf{X}_1^T \textbf{X}_1 & \textbf{0} \\ \textbf{0} & \textbf{K}_2 + c\textbf{I} \end{bmatrix} \begin{bmatrix} \textbf{w}_1 \\ \boldsymbol{\alpha}_2 \end{bmatrix}$

where $ \textbf{w}_1$ represents the solution to the primal problem, for the linear part, and $ \boldsymbol{\alpha}_2$ represents the solution to the dual problem, for the nonlinear part. Substituting $ \textbf{y} = \frac{1}{2} (\textbf{y}_1 + \textbf{y}_2) = \frac{1}{2} (\textbf{X}_1 \textbf{w}_1 + \textbf{K}_2 \boldsymbol{\alpha}_2)$, this GEV constitutes two coupled LS regression problems:

$\displaystyle \left\{ \begin{array}{l} \beta \textbf{w}_1 = (\textbf{X}_1^T \textbf{X}_1)^{-1} \textbf{X}_1^T \textbf{y} \\ \beta \boldsymbol{\alpha}_2 = (\textbf{K}_2 + c\textbf{I})^{-1} \textbf{y} \end{array} \right.$ (9)
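As a minimal batch sketch of this procedure (the online version is the subject of the next section), the two equations in (9) can be alternated in a fixed-point fashion, normalizing the common target at each step so that the scale factor $ \beta$ is absorbed. The toy system, the Gaussian kernel width, the regularization constant $ c$ and the number of iterations below are illustrative assumptions, not values from the text.

import numpy as np

rng = np.random.default_rng(0)

# Toy Wiener system (structure of Fig. 1; coefficients and nonlinearity are illustrative).
N, L = 400, 3
h_true = np.array([1.0, 0.6, -0.3])
x_in = rng.standard_normal(N)
x_out = np.tanh(0.8 * np.convolve(x_in, h_true)[:N])

# Branch 1: time-embedded input matrix X1 (one row per sample, one column per filter tap;
# circular edge effect from np.roll is acceptable for a sketch).
X1 = np.column_stack([np.roll(x_in, k) for k in range(L)])

# Branch 2: Gaussian kernel matrix K2 on the output samples (kernel width is an assumption).
sigma, c = 0.5, 1e-3
K2 = np.exp(-(x_out[:, None] - x_out[None, :]) ** 2 / (2 * sigma ** 2))

# Alternate between the two coupled LS problems of Eq. (9).
w1 = rng.standard_normal(L)
alpha2 = rng.standard_normal(N)
for _ in range(50):
    y = 0.5 * (X1 @ w1 + K2 @ alpha2)                  # y = (y1 + y2) / 2
    y /= np.linalg.norm(y)                             # fix the scale absorbed by beta
    w1 = np.linalg.solve(X1.T @ X1, X1.T @ y)          # primal LS for the linear branch
    alpha2 = np.linalg.solve(K2 + c * np.eye(N), y)    # regularized dual LS for the nonlinear branch

y1, y2 = X1 @ w1, K2 @ alpha2                          # branch outputs after convergence
print("canonical correlation estimate:", np.corrcoef(y1, y2)[0, 1])

After convergence, $ \textbf{w}_1$ gives the estimate of $ H(z)$ up to a scale factor, and $ y_2[n]$ approximates $ f^{-1}(x_2[n])$, in line with the identification setup of Fig. 2.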

