An anisotropic Gaussian kernel was used, whose hyperparameters were determined off-line by standard GP regression. In particular, the noise-to-signal ratio (regularization) was
. The first algorithm was SW-KRLS with a window of
data points. The second algorithm was ALD-KRLS with sensitivity
. For this parameter value, the final dictionary contained exactly
bases. We also ran ALD-KRLS with
and stopped its dictionary expansion once it contained
bases. Although the dictionary is left unchanged after this point, ALD-KRLS continues to adapt by performing reduced updates of its remaining parameters. The last algorithm was the proposed KRLS-T algorithm, with a dictionary size of
bases and
. To prune the dictionary, we used the slower criterion that minimizes KL-divergence (``fullKL'') in one test and the simpler MSE-based criterion in another test.
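As an illustration of this setup, the sketch below implements an anisotropic (ARD) Gaussian kernel and a plain sliding-window kernel ridge regressor in the spirit of SW-KRLS. It is a minimal, non-recursive stand-in rather than the actual SW-KRLS update, and the window size, lengthscales, and noise-to-signal ratio shown are placeholder values, not the ones used in this experiment.

```python
import numpy as np

def ard_gaussian_kernel(X1, X2, lengthscales, signal_var=1.0):
    """Anisotropic (ARD) Gaussian kernel: one lengthscale per input dimension."""
    A = X1 / lengthscales                      # scale each input dimension independently
    B = X2 / lengthscales
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * sq_dists)


class SlidingWindowKRR:
    """Kernel ridge regression on the most recent `window` samples.

    A plain refit-from-scratch stand-in for SW-KRLS, which instead updates the
    inverse kernel matrix recursively as samples enter and leave the window.
    """

    def __init__(self, window, lengthscales, noise_to_signal=0.1):
        self.window = window
        self.lengthscales = np.asarray(lengthscales, dtype=float)
        self.reg = noise_to_signal             # noise-to-signal ratio as ridge regularizer
        self.X = np.empty((0, len(self.lengthscales)))
        self.y = np.empty(0)
        self.alpha = None

    def update(self, x_new, y_new):
        # append the new sample and drop the oldest one once the window is full
        self.X = np.vstack([self.X, np.atleast_2d(x_new)])[-self.window:]
        self.y = np.append(self.y, y_new)[-self.window:]
        K = ard_gaussian_kernel(self.X, self.X, self.lengthscales)
        K += self.reg * np.eye(len(self.y))
        self.alpha = np.linalg.solve(K, self.y)

    def predict(self, X_test):
        K_star = ard_gaussian_kernel(np.atleast_2d(X_test), self.X, self.lengthscales)
        return K_star @ self.alpha
```

In this formulation the noise-to-signal ratio acts as the constant added to the diagonal of the kernel matrix, which is the regularization role it plays in the GP view mentioned above.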
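The precise MSE-based and KL-divergence-based pruning criteria are those defined for KRLS-T. Purely as an illustration of the idea behind MSE-based budget maintenance, the following sketch removes the dictionary element whose deletion distorts the current kernel expansion the least on the dictionary points; it is a generic stand-in, not the criterion evaluated in this experiment.

```python
import numpy as np

def mse_prune_index(K, alpha):
    """Generic MSE-style pruning heuristic (illustration only): given the kernel
    matrix K of the dictionary and the expansion coefficients alpha, return the
    index of the basis whose removal distorts the current predictor the least,
    measured by squared error on the dictionary points themselves."""
    m = len(alpha)
    f = K @ alpha                              # current predictions at the dictionary inputs
    best_j, best_err = 0, np.inf
    for j in range(m):
        keep = [i for i in range(m) if i != j]
        # best re-expression of f using only the remaining bases (least squares)
        alpha_red, *_ = np.linalg.lstsq(K[:, keep], f, rcond=None)
        err = np.mean((K[:, keep] @ alpha_red - f) ** 2)
        if err < best_err:
            best_j, best_err = j, err
    return best_j
```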
Each algorithm performed a single run over the data. The performance was measured by calculating the normalized mean-square error (NMSE) on the test set, at different points throughout the training run. The results are displayed in Fig. 1. During the first iterations, four of the algorithms show identical performance, since they accept every observed data point into their dictionaries. The fifth algorithm, ALD-KRLS with
, has a slower dictionary growth and an initially larger error. From iteration
onwards, SW-KRLS maintains its performance, since it performs regression only on the
most recent samples. ALD-KRLS selects the most relevant bases (as a growing set), and therefore it converges to a better NMSE. While its sensitivity level
results in slower convergence, it achieves results similar to those obtained with
. The KRLS-T algorithm outperforms all others, since it is able to properly weight all samples and to trade weaker bases in the dictionary for more relevant ones throughout the entire training process. Interestingly, it obtains similar results with both the simpler MSE-based pruning criterion and the computationally more expensive KL-based criterion.
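For reference, an NMSE learning curve like the one in Fig. 1 can be produced along the following lines. This is a minimal sketch assuming one common NMSE definition (test MSE normalized by the variance of the test targets) and an online model exposing the update/predict interface of the sliding-window example above; it is not the evaluation code used here.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean-square error: test MSE divided by the variance of the test targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)


def single_run_learning_curve(model, X_train, y_train, X_test, y_test, checkpoints):
    """One online pass over the training data, recording the test NMSE at the
    requested iteration numbers (as in the learning curves of Fig. 1)."""
    checkpoints = set(checkpoints)
    curve = []
    for t, (x, y) in enumerate(zip(X_train, y_train), start=1):
        model.update(x, y)                     # one online update per training sample
        if t in checkpoints:
            curve.append((t, nmse(y_test, model.predict(X_test))))
    return curve
```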