

Introduction

In recent years a number of kernel methods, including support vector machines [1], kernel principal component analysis [2], kernel Fisher discriminant analysis [3] and kernel canonical correlation analysis [4,5], have been proposed and successfully applied to classification and nonlinear regression problems. In their original forms, most of these algorithms cannot be applied online, since kernel methods introduce a number of difficulties, such as growing time and memory complexities (the kernel matrix grows with every new sample) and the need to avoid overfitting.

Recently, a kernel RLS algorithm was proposed that deals with both difficulties [6]: a sparsification procedure limits the size of the kernel matrix and reduces the order of the problem. In this paper we present a different approach, which combines a sliding window with conventional regularization. In this way the size of the kernel matrix is fixed rather than merely bounded, allowing the algorithm to operate online in time-varying environments.

The basic idea of kernel methods is to transform the data $\textbf{x}_i$ from the input space to a high-dimensional feature space of vectors $\Phi(\textbf{x}_i)$, where the inner products can be calculated using a positive definite kernel function satisfying Mercer's condition [1]: $\kappa(\textbf{x}_i,\textbf{x}_j) = \langle \Phi(\textbf{x}_i),\Phi(\textbf{x}_j) \rangle$. This simple and elegant idea, also known as the ``kernel trick'', allows inner products in the feature space to be computed without making direct reference to the feature vectors.

A common nonlinear kernel is the Gaussian kernel

$\displaystyle \kappa(\textbf{x},\textbf{y}) = \exp \left( -\frac{\Vert\textbf{x}-\textbf{y}\Vert^2}{2\sigma^2} \right).$ (1)
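As a concrete illustration, the sketch below evaluates the Gaussian kernel of Eq. (1) for two sets of input vectors. The kernel width $\sigma$ is a free parameter here, and the use of NumPy and the function name are our own choices, not part of the paper.

import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||x_i - y_j||^2 / (2 sigma^2)).

    X: (n, d) array of input vectors, Y: (m, d) array of input vectors.
    Returns an (n, m) array of kernel evaluations.
    """
    # Pairwise squared Euclidean distances ||x_i - y_j||^2.
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))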

This kernel will be used to calculate the elements of a kernel matrix. In the sliding-window approach, updating this kernel matrix means first removing its first row and first column and then adding a new row and column at the end, based on new observations. The inverse of this matrix is needed to update the algorithm's solution. To this end, two matrix inversion formulas are derived in Appendix A. One of them has already been used in kernel methods [7], but in order to keep the dimensions of the kernel matrix fixed we also introduce a complementary formula.
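The window update of the kernel matrix itself can be sketched as follows. This is only an illustration under our own naming: it recomputes the new row and column directly with the Gaussian kernel above, whereas the algorithm updates the inverse of this matrix recursively with the formulas of Appendix A rather than recomputing anything from scratch.

def slide_kernel_matrix(K, window, x_new, sigma=1.0):
    """Shift the kernel matrix of a sliding window by one observation.

    K: (N, N) kernel matrix of the current window of inputs.
    window: (N, d) array holding the current window, oldest sample first.
    x_new: (d,) new observation entering the window.
    Returns the updated window and its (N, N) kernel matrix.
    """
    # Discard the oldest sample: drop the first row and column of K.
    K_old = K[1:, 1:]
    window = np.vstack([window[1:], x_new[None, :]])
    # New last row/column: kernel of every windowed sample with x_new.
    k_new = gaussian_kernel(window, x_new[None, :], sigma)[:, 0]
    N = window.shape[0]
    K_out = np.empty((N, N))
    K_out[:-1, :-1] = K_old
    K_out[:, -1] = k_new
    K_out[-1, :] = k_new  # the Gaussian kernel is symmetric
    return window, K_out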

The rest of this paper is organized as follows: in Section 2 a kernel transformation is introduced into linear least-squares regression theory. A detailed description of the proposed algorithm is given in Section 3, and in Section 4 it is applied to a nonlinear channel identification problem. Finally, Section 5 summarizes the main conclusions of this work.

