Several techniques have been proposed to curb this growth, including the ALD criterion [3], the surprise information measure [4] and sliding-window
techniques [11,12,13]. These methods assemble a limited dictionary of input-output patterns
which are used to construct the
nonlinear mapping (2). To obtain a confident estimate, these patterns should represent the complete input-output data distribution sufficiently well. In parallel with this selection process, KRLS performs kernel least-squares regression on these patterns to obtain the optimal nonlinear mapping.
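For concreteness, if the mapping in (2) takes the usual kernel-expansion form over the dictionary points, $\hat{f}(\mathbf{x}) = \sum_j \alpha_j \, \kappa(\mathbf{x}_j, \mathbf{x})$, the prediction step can be sketched as below. The Gaussian kernel and the names `dictionary` and `alpha` are illustrative assumptions, not notation taken from this paper.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an illustrative assumption."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2))

def predict(x, dictionary, alpha, sigma=1.0):
    """Evaluate the kernel expansion over the stored dictionary patterns."""
    return sum(a * gaussian_kernel(xj, x, sigma) for xj, a in zip(dictionary, alpha))

# Toy usage: two stored patterns and their expansion coefficients.
dictionary = [np.array([0.0, 1.0]), np.array([1.0, -0.5])]
alpha = [0.7, -0.2]
print(predict(np.array([0.5, 0.5]), dictionary, alpha))
```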
The proposed method builds upon the ideas presented in [12,13], in which the memory size is fixed to a predetermined number of patterns. However, it takes a more active role in building the dictionary: in every iteration it first adds the new point to the memory, and then determines the least relevant data point present in the memory, which is subsequently pruned.
The result of this active learning strategy is that, at any time instant, the memory contains only the most relevant patterns observed up to that moment.
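As a rough illustration of this add-then-prune cycle, the following sketch keeps the memory at a fixed budget. The `relevance` scoring function is a placeholder for the criterion developed in the paper, which is not specified in this excerpt.

```python
def update_memory(memory, new_point, budget, relevance):
    """One fixed-budget iteration: admit the new pattern, then prune the least
    relevant stored pattern so that at most `budget` patterns remain.

    `relevance(point, memory)` is a hypothetical scoring function standing in
    for the pruning criterion of the proposed method.
    """
    memory.append(new_point)                  # step 1: always add the new sample
    if len(memory) > budget:                  # step 2: prune when over budget
        least_relevant = min(memory, key=lambda p: relevance(p, memory))
        memory.remove(least_relevant)
    return memory

# Toy usage with a naive recency-based score (older points score lower).
mem = []
for t, point in enumerate([(0.1, 1.0), (0.4, 0.2), (0.9, -0.3), (0.5, 0.8)]):
    mem = update_memory(mem, (t, point), budget=3,
                        relevance=lambda p, m: p[0])  # score = arrival time
print(mem)
```

With this naive recency score the update degenerates to a sliding window; the point of the proposed method is to rank the stored patterns by a data-dependent relevance measure instead.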