Introduction

Kernel methods offer an attractive framework for nonlinear signal processing problems [1]. By relying on a transformation of the data into a high-dimensional reproducing kernel Hilbert space, they are able to solve learning problems that are nonlinear in the input space as linear problems in the transformed space. Using the ``kernel trick'', efficient algorithms with algebraically simple expressions are obtained, such as support vector machines or kernel principal component analysis [1]. However, the functional representation of classical kernel-based algorithms grows linearly with the number of processed data, and therefore these algorithms are not directly suitable for online applications, where the complexity of each update must be limited [2].
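
As a toy illustration of this growth, consider kernel ridge regression with a Gaussian kernel (a minimal NumPy sketch of the general idea, not code from [1]; all variable names are ours): the learned predictor is a kernel expansion with one coefficient per training point, so its size grows with the data.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between row-wise data sets X and Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

# Kernel ridge regression: solve (K + c*I) alpha = y.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(X)), y)

# The predictor is an expansion over *all* training points,
# f(x) = sum_i alpha_i k(x_i, x): its cost grows with the data.
X_test = np.linspace(-3, 3, 5)[:, None]
y_hat = gaussian_kernel(X_test, X) @ alpha
```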

The recursive least-squares (RLS) filter [3] is a popular algorithm that is used extensively in many engineering applications thanks to its good convergence and reasonably low complexity. In [4], a kernel recursive least-squares (KRLS) algorithm was proposed. To avoid an ``ever-growing'' functional representation of its solution, it features an online sparsification technique that avoids redundancy in the solution's support. In particular, the support is reduced to a sparse ``dictionary'' of bases, and a new basis is added only if it cannot be approximated by a linear combination of the bases already present in the dictionary; a minimal sketch of such an admission test is given below. A number of different criteria for online sparsification have since been explored in the literature [5], including an online pruning criterion [6].
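
The following sketch conveys the flavor of such a dictionary-admission test (the variable names and the threshold `nu` are ours, chosen for illustration, not taken from [4]): a new input is admitted only when the residual of its best linear representation in the current dictionary exceeds a threshold.

```python
import numpy as np

def admission_test(K_dict_inv, k_vec, k_tt, nu=1e-3):
    """Approximate-linear-dependency-style test in the spirit of [4].

    K_dict_inv : inverse kernel matrix of the current dictionary
    k_vec      : kernel evaluations between dictionary bases and the new input
    k_tt       : kernel of the new input with itself
    nu         : sparsification threshold (a design parameter)
    """
    a = K_dict_inv @ k_vec    # best representation in the dictionary span
    delta = k_tt - k_vec @ a  # squared residual of that representation
    return delta > nu, a      # add the basis only if the residual is large
```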

An important difference between linear adaptive filters and their kernel-based counterparts concerns their ability to handle non-stationary data. While most linear adaptive filters can track non-stationary data directly or through some straightforward extension of their standard form [3], such an extension is more complicated in the case of kernel-based algorithms. Most kernel adaptive filters are therefore designed to operate on stationary data only, and they converge approximately to the batch filtering solution [4,5]. As a result, tracking is an aspect of kernel adaptive filtering that has not yet been satisfactorily addressed in the literature.

In order to perform tracking, more weight should be given to more recent data. A radical approach to this idea is found in sliding-window algorithms, whose solution depends only on the latest $M$ observed data [7,8], while all older data are discarded. This procedure is suboptimal for tracking time-varying data, since the quality of the solution depends on the bases retained in the support, and all retained samples are given equal importance. A few attempts have also been made to enable tracking by extending the standard KRLS setting with a forgetting factor [5] or a simplified state-space model [9]. Nevertheless, while these algorithms theoretically allow for tracking, they are not capable of limiting dictionary growth at the same time. Furthermore, they present numerical difficulties that preclude their practical implementation on finite-precision machines.
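
For reference, linear RLS achieves tracking by minimizing the exponentially weighted least-squares cost [3]

$$J(n) = \sum_{i=1}^{n} \lambda^{n-i} \left| d(i) - \mathbf{w}^T\mathbf{x}(i) \right|^2, \qquad 0 < \lambda \le 1,$$

where $\lambda$ is the forgetting factor; it is precisely this downweighting of past data that is difficult to reproduce in the kernel setting while simultaneously keeping the dictionary bounded.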

In this paper we explore a more principled approach to tracking. We explicitly handle the uncertainty about the inferred input-output relationship, which we treat as a latent function, and we study the problem of how older data should be forgotten. First, we define a probabilistic framework based on Gaussian processes (GPs) that offers an intuitive view of KRLS and makes it possible to deal with uncertainty and regularization in a natural manner. Then, we propose a reasonable model for forgetting, which shifts the probability distribution over the input-output relationship towards the prior belief. In this manner, the solution converges to the prior belief once all data are forgotten, which is consistent with the Bayesian framework. The presented method is closely related to the work of Csató and Opper in [10], in which a GP perspective is adopted. We aim to bring their principled approach to the attention of the signal processing community, provide an (arguably) more intuitive sparse formulation, and extend it by including forgetting. The resulting KRLS algorithm is capable of tracking non-stationary data that exhibit nonlinear relationships. In order to guarantee online operation, we limit its memory and computational complexity in each step to $\mathcal{O}(M^2)$, where $M$ is the number of bases allowed in memory at any given time.
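
To convey the flavor of the proposed forgetting step, the sketch below (our illustrative notation; the exact recursion is derived in the body of the paper) blends the posterior GP description over the dictionary back towards the prior, so that with repeated application the solution relaxes to the prior belief.

```python
import numpy as np

def forget_towards_prior(mu, Sigma, K_prior, lam):
    """Illustrative forgetting step: drift the posterior towards the prior.

    mu, Sigma : posterior mean and covariance over the dictionary outputs
    K_prior   : prior covariance (kernel matrix of the dictionary bases)
    lam       : forgetting factor in (0, 1]; lam = 1 disables forgetting

    This convex blend is a sketch of the idea only, not the paper's exact
    update: the prior here is zero-mean with covariance K_prior.
    """
    mu_new = np.sqrt(lam) * mu                     # mean decays towards zero
    Sigma_new = lam * Sigma + (1 - lam) * K_prior  # covariance relaxes to prior
    return mu_new, Sigma_new
```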
