After several inclusion-deletion steps, all information available up to time $t$ has (approximately) been stored in the posterior density over the dictionary bases, $p(\mathbf{f}\mid\mathcal{D}_t)=\mathcal{N}(\mathbf{f};\boldsymbol{\mu}_t,\boldsymbol{\Sigma}_t)$. Inserting this in Eq. (2) and solving the integral, we can obtain the implied posterior GP over the whole input space,
\begin{align}
\mathbb{E}[f(\mathbf{x})\mid\mathcal{D}_t] &= \mathbf{k}_{\mathbf{x}}^\top \mathbf{K}^{-1}\boldsymbol{\mu}_t \\
\mathrm{cov}[f(\mathbf{x}),f(\mathbf{x}')\mid\mathcal{D}_t] &= k(\mathbf{x},\mathbf{x}') + \mathbf{k}_{\mathbf{x}}^\top \mathbf{K}^{-1}\left(\boldsymbol{\Sigma}_t-\mathbf{K}\right)\mathbf{K}^{-1}\mathbf{k}_{\mathbf{x}'},
\end{align}
where $\mathbf{K}$ is the kernel matrix of the dictionary bases and $\mathbf{k}_{\mathbf{x}}$ collects the kernel evaluations between $\mathbf{x}$ and the dictionary.
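As a minimal numerical sketch of these predictive equations (an RBF kernel and all names below are illustrative assumptions, not part of the derivation):
\begin{verbatim}
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gp_predict(Xs, D, mu, Sigma, ell=1.0, jitter=1e-10):
    """Posterior GP mean and covariance at test inputs Xs, implied by
    the Gaussian posterior N(mu, Sigma) over the dictionary bases D."""
    K = rbf(D, D, ell) + jitter * np.eye(len(D))  # dictionary kernel matrix K
    A = np.linalg.solve(K, rbf(D, Xs, ell)).T     # rows of A are k_x^T K^{-1}
    mean = A @ mu                                 # k_x^T K^{-1} mu_t
    # k(x, x') + k_x^T K^{-1} (Sigma - K) K^{-1} k_x'
    cov = rbf(Xs, Xs, ell) + A @ (Sigma - K) @ A.T
    return mean, cov
\end{verbatim}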
In order to make KRLS able to adapt to non-stationary environments, we should make it able to ``forget'' past samples, i.e., to intentionally force the posterior $p(f(\mathbf{x})\mid\mathcal{D}_t)$ to lose some information. A very general approach is to linearly combine $f(\mathbf{x})\mid\mathcal{D}_t$ with another, independent GP $f_{\text{noise}}(\mathbf{x})$ that holds no information about the data. Since this new posterior after forgetting is a linear combination of two GPs, it is also a GP, and we will denote it as $\hat{f}(\mathbf{x})\mid\mathcal{D}_t$.
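To make the combination idea concrete, one possible instance (purely illustrative; the weight $\lambda$ and the specific rule below are assumptions, not necessarily the choice adopted later) blends the dictionary posterior with a data-free GP whose marginal at the dictionary is the prior:
\begin{verbatim}
import numpy as np

def forget(mu, Sigma, K, lam=0.98):
    """Illustrative forgetting rule (an assumption, not the paper's
    definitive choice): combine f ~ N(mu, Sigma) with an independent,
    data-free f_noise ~ N(0, K) as
        f_hat = sqrt(lam) * f + sqrt(1 - lam) * f_noise,
    so lam = 1 keeps all information and lam = 0 reverts to the prior."""
    mu_hat = np.sqrt(lam) * mu
    Sigma_hat = lam * Sigma + (1.0 - lam) * K
    return mu_hat, Sigma_hat
\end{verbatim}
Because the two GPs are independent and Gaussian, the combined moments follow directly from standard Gaussian algebra.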
The posterior GP after forgetting, $\hat{f}(\mathbf{x})\mid\mathcal{D}_t$, should be expressible in terms of a distribution over the latent points in the dictionary (to avoid a budget increase). We will refer to this distribution as $p(\hat{\mathbf{f}}\mid\mathcal{D}_t)=\mathcal{N}(\hat{\mathbf{f}};\hat{\boldsymbol{\mu}}_t,\hat{\boldsymbol{\Sigma}}_t)$. Using Eq. (2) again, the posterior after forgetting in terms of $\hat{\boldsymbol{\mu}}_t$ and $\hat{\boldsymbol{\Sigma}}_t$ is
\begin{align}
\mathbb{E}[\hat{f}(\mathbf{x})\mid\mathcal{D}_t] &= \mathbf{k}_{\mathbf{x}}^\top \mathbf{K}^{-1}\hat{\boldsymbol{\mu}}_t \\
\mathrm{cov}[\hat{f}(\mathbf{x}),\hat{f}(\mathbf{x}')\mid\mathcal{D}_t] &= k(\mathbf{x},\mathbf{x}') + \mathbf{k}_{\mathbf{x}}^\top \mathbf{K}^{-1}\left(\hat{\boldsymbol{\Sigma}}_t-\mathbf{K}\right)\mathbf{K}^{-1}\mathbf{k}_{\mathbf{x}'}.
\end{align}
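Continuing the sketches above (reusing \texttt{rbf}, \texttt{gp\_predict} and \texttt{forget}; the data below is synthetic and purely illustrative), forgetting amounts to swapping the moments fed to the unchanged predictive code:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(5, 2))   # five dictionary bases in 2-D
Xs = rng.normal(size=(3, 2))  # three test inputs
K = rbf(D, D)                 # rbf() from the first sketch
mu = rng.normal(size=5)       # illustrative posterior moments
Sigma = 0.5 * K

# Forgetting only replaces (mu, Sigma) by (mu_hat, Sigma_hat);
# the predictive equations are reused unchanged.
mu_hat, Sigma_hat = forget(mu, Sigma, K, lam=0.9)
mean_hat, cov_hat = gp_predict(Xs, D, mu_hat, Sigma_hat)
\end{verbatim}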
Different definitions for $f_{\text{noise}}(\mathbf{x})$, $\hat{\boldsymbol{\mu}}_t$ and $\hat{\boldsymbol{\Sigma}}_t$ will result in different types of forgetting. One reasonable approach is discussed next.