The adopted setup corresponds to standard GP regression, which is thoroughly described in [11] for the batch setting. Here we consider the online setting, in which new observations are incorporated sequentially. Therefore, a new posterior $p(\mathbf{f}_{t+1}|\mathbf{y}_{t+1})$ including the most recent observation $y_{t+1}$ must be computed at each time instant. We will compute the joint posterior only at the locations at which data is observed, instead of the full GP posterior. This is because $p(\mathbf{f}_t|\mathbf{y}_t)$, together with the GP prior, implicitly defines the posterior full GP distribution. Instead of recomputing $p(\mathbf{f}_{t+1}|\mathbf{y}_{t+1})$ from scratch at every time instant, we can obtain a simple recursive update as follows:

$$p(\mathbf{f}_{t+1}|\mathbf{y}_{t+1}) = p(\mathbf{f}_{t+1}|y_{t+1}, \mathbf{y}_t) \quad (3a)$$

$$= \frac{p(y_{t+1}|\mathbf{f}_{t+1})\, p(\mathbf{f}_{t+1}|\mathbf{y}_t)}{p(y_{t+1}|\mathbf{y}_t)} \quad (3b)$$

$$= \frac{p(y_{t+1}|f_{t+1})\, p(f_{t+1}|\mathbf{f}_t)\, p(\mathbf{f}_t|\mathbf{y}_t)}{p(y_{t+1}|\mathbf{y}_t)}, \quad (3c)$$

where $f_{t+1} = f(\mathbf{x}_{t+1})$ denotes the latent function value at the new input, so that $\mathbf{f}_{t+1} = [\mathbf{f}_t^\top, f_{t+1}]^\top$. Eq. (3b) follows from (3a) by direct application of Bayes' rule. Eq. (3c) includes an additional expansion due to $y_{t+1}$ being conditionally independent of $\mathbf{f}_t$ given $f_{t+1}$.
If the posterior at time $t$ is a known Gaussian $p(\mathbf{f}_t|\mathbf{y}_t) = \mathcal{N}(\mathbf{f}_t;\, \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t)$, then we can use (3) to update it to include a new observation $y_{t+1}$. All the expressions in (3) can be easily inferred from the stated assumptions as follows.

First, introducing the new quantities $\mathbf{q}_{t+1} = \mathbf{K}_t^{-1}\mathbf{k}_{t+1}$, $\gamma_{t+1}^2 = k_{t+1} - \mathbf{k}_{t+1}^\top\mathbf{q}_{t+1}$ and $\mathbf{h}_{t+1} = \boldsymbol{\Sigma}_t\mathbf{q}_{t+1}$, where $\mathbf{k}_{t+1} = [k(\mathbf{x}_1,\mathbf{x}_{t+1}), \ldots, k(\mathbf{x}_t,\mathbf{x}_{t+1})]^\top$ and $k_{t+1} = k(\mathbf{x}_{t+1},\mathbf{x}_{t+1})$, we can express the density of the latent function at the new input given its value at previous inputs as

$$p(f_{t+1}|\mathbf{f}_t) = \mathcal{N}(f_{t+1};\, \mathbf{q}_{t+1}^\top\mathbf{f}_t,\, \gamma_{t+1}^2). \quad (4)$$
Then, the denominator of Eq. (3), which corresponds to the marginal likelihood (also known as evidence), provides the predictive distribution of a new observation given past data, and can be computed by integration of the numerator:

$$p(y_{t+1}|\mathbf{y}_t) = \int p(y_{t+1}|f_{t+1})\, p(f_{t+1}|\mathbf{f}_t)\, p(\mathbf{f}_t|\mathbf{y}_t)\, \mathrm{d}\mathbf{f}_{t+1} = \mathcal{N}(y_{t+1};\, \hat{y}_{t+1},\, \hat{\sigma}_{y,t+1}^2), \quad (5)$$

with predictive mean $\hat{y}_{t+1} = \mathbf{q}_{t+1}^\top\boldsymbol{\mu}_t$ and variance $\hat{\sigma}_{y,t+1}^2 = \sigma_n^2 + \hat{\sigma}_{f,t+1}^2$, where $\hat{\sigma}_{f,t+1}^2 = \gamma_{t+1}^2 + \mathbf{q}_{t+1}^\top\mathbf{h}_{t+1}$ is the predictive variance of the latent function itself.
Finally, the likelihood follows from the model definition (1):

$$p(y_{t+1}|f_{t+1}) = \mathcal{N}(y_{t+1};\, f_{t+1},\, \sigma_n^2). \quad (6)$$
Thus, all involved distributions (4), (5) and (6) are univariate normal with simple, known expressions. Using those, (3) can be evaluated and the posterior updates emerge:

$$\boldsymbol{\mu}_{t+1} = \begin{bmatrix}\boldsymbol{\mu}_t\\ \hat{y}_{t+1}\end{bmatrix} + \frac{y_{t+1}-\hat{y}_{t+1}}{\hat{\sigma}_{y,t+1}^2}\begin{bmatrix}\mathbf{h}_{t+1}\\ \hat{\sigma}_{f,t+1}^2\end{bmatrix}, \qquad \boldsymbol{\Sigma}_{t+1} = \begin{bmatrix}\boldsymbol{\Sigma}_t & \mathbf{h}_{t+1}\\ \mathbf{h}_{t+1}^\top & \hat{\sigma}_{f,t+1}^2\end{bmatrix} - \frac{1}{\hat{\sigma}_{y,t+1}^2}\begin{bmatrix}\mathbf{h}_{t+1}\\ \hat{\sigma}_{f,t+1}^2\end{bmatrix}\begin{bmatrix}\mathbf{h}_{t+1}\\ \hat{\sigma}_{f,t+1}^2\end{bmatrix}^\top. \quad (7)$$
The inverse of the kernel matrix including the new input can also be easily computed using the corresponding low-rank update:

$$\mathbf{K}_{t+1}^{-1} = \frac{1}{\gamma_{t+1}^2}\begin{bmatrix}\gamma_{t+1}^2\mathbf{K}_t^{-1} + \mathbf{q}_{t+1}\mathbf{q}_{t+1}^\top & -\mathbf{q}_{t+1}\\ -\mathbf{q}_{t+1}^\top & 1\end{bmatrix}. \quad (8)$$
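To make the recursion concrete, here is a minimal NumPy sketch of a single update step implementing Eqs. (4)-(8). The function name `krls_update`, the kernel callback `kern` and the argument layout are illustrative choices, not part of the original formulation:

```python
import numpy as np

def krls_update(mu, Sigma, Kinv, X, x_new, y_new, kern, sn2):
    """One online GP update step, Eqs. (4)-(8); names are illustrative.

    mu    : posterior mean over latent values at the t observed inputs, shape (t,)
    Sigma : posterior covariance, shape (t, t)
    Kinv  : inverse prior kernel matrix K_t^{-1}, shape (t, t)
    X     : inputs observed so far, shape (t, d)
    sn2   : noise variance sigma_n^2
    """
    # Kernel vector k_{t+1} and scalar k(x_{t+1}, x_{t+1}).
    k_vec = np.array([kern(x, x_new) for x in X])
    k_ss = kern(x_new, x_new)

    # Auxiliary quantities q_{t+1}, gamma_{t+1}^2 and h_{t+1}.
    q = Kinv @ k_vec
    gamma2 = k_ss - k_vec @ q
    h = Sigma @ q

    # Predictive distribution of y_{t+1} given past data, Eq. (5).
    y_hat = q @ mu            # predictive mean
    sf2 = gamma2 + q @ h      # predictive variance of the latent function
    sy2 = sn2 + sf2           # predictive variance of the observation

    # Posterior update, Eq. (7): a Kalman-style correction.
    p = np.append(h, sf2)
    mu_new = np.append(mu, y_hat) + (y_new - y_hat) / sy2 * p
    Sigma_ext = np.block([[Sigma, h[:, None]], [h[None, :], np.array([[sf2]])]])
    Sigma_new = Sigma_ext - np.outer(p, p) / sy2

    # Low-rank update of the inverse kernel matrix, Eq. (8).
    t = len(mu)
    Kinv_new = np.empty((t + 1, t + 1))
    Kinv_new[:t, :t] = Kinv + np.outer(q, q) / gamma2
    Kinv_new[:t, t] = Kinv_new[t, :t] = -q / gamma2
    Kinv_new[t, t] = 1.0 / gamma2

    return mu_new, Sigma_new, Kinv_new
```

Note that the step involves only matrix-vector products with $\mathbf{K}_t^{-1}$ and $\boldsymbol{\Sigma}_t$, which is what yields the quadratic per-update cost discussed next.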
This illustrates both how probabilistic predictions for new observations can be made (using Eq. (5), which does not require knowledge of $y_{t+1}$), and how these new observations can be included in the posterior once they are available. All computations for a given update can be made in $\mathcal{O}(t^2)$ time, as is obvious from the update formulas. Only $\boldsymbol{\mu}_{t+1}$, $\boldsymbol{\Sigma}_{t+1}$ and $\mathbf{K}_{t+1}^{-1}$ will be reused in the next iteration, and the remaining quantities will be computed on demand.
The recursion updates can be initialized by setting

$$\boldsymbol{\mu}_1 = \frac{k_1 y_1}{k_1 + \sigma_n^2}, \quad (9)$$

$$\boldsymbol{\Sigma}_1 = k_1 - \frac{k_1^2}{k_1 + \sigma_n^2}, \quad (10)$$

$$\mathbf{K}_1^{-1} = 1/k_1, \quad (11)$$

where $k_1 = k(\mathbf{x}_1, \mathbf{x}_1)$, i.e., the GP posterior after observing the single sample $(\mathbf{x}_1, y_1)$.
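Under the same illustrative conventions as the sketch above, the initialization (9)-(11) is simply this one-sample posterior:

```python
def krls_init(x1, y1, kern, sn2):
    """Initialize the recursion from the first sample, Eqs. (9)-(11)."""
    k11 = kern(x1, x1)                                   # k(x_1, x_1)
    mu = np.array([k11 * y1 / (k11 + sn2)])              # Eq. (9)
    Sigma = np.array([[k11 - k11 ** 2 / (k11 + sn2)]])   # Eq. (10)
    Kinv = np.array([[1.0 / k11]])                       # Eq. (11)
    return mu, Sigma, Kinv
```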
Since the model is exactly that of GP regression and all provided formulas are exact, probabilistic predictions made at time $t$ for observation $y_{t+1}$ are exactly the same as those obtained using a standard GP in the batch setting. Using the batch formulation from [11], we can equivalently express the predictive mean and variance from (5) as

$$\hat{y}_{t+1} = \mathbf{k}_{t+1}^\top(\mathbf{K}_t + \sigma_n^2\mathbf{I})^{-1}\mathbf{y}_t, \qquad \hat{\sigma}_{y,t+1}^2 = \sigma_n^2 + k_{t+1} - \mathbf{k}_{t+1}^\top(\mathbf{K}_t + \sigma_n^2\mathbf{I})^{-1}\mathbf{k}_{t+1}. \quad (12)$$
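As a quick numerical check of this equivalence, the recursive sketch can be run over a toy dataset and its prediction compared against the batch formula (12); the squared-exponential kernel, the seed and the data below are arbitrary choices:

```python
rng = np.random.default_rng(0)
kern = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))  # squared-exponential kernel
sn2 = 0.1
X, y = rng.normal(size=(20, 2)), rng.normal(size=20)

# Recursive formulation: absorb the first 19 samples one at a time.
mu, Sigma, Kinv = krls_init(X[0], y[0], kern, sn2)
for t in range(1, 19):
    mu, Sigma, Kinv = krls_update(mu, Sigma, Kinv, X[:t], X[t], y[t], kern, sn2)

# Batch predictive mean for x_20 given the first 19 samples, Eq. (12).
K = np.array([[kern(a, b) for b in X[:19]] for a in X[:19]])
k_vec = np.array([kern(a, X[19]) for a in X[:19]])
y_hat_batch = k_vec @ np.linalg.solve(K + sn2 * np.eye(19), y[:19])

# Recursive predictive mean for the same point, Eq. (5).
y_hat_rec = (Kinv @ k_vec) @ mu
assert np.isclose(y_hat_batch, y_hat_rec)
```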
In the standard KRLS setting, the predictive mean is often expressed as $\hat{y}_{t+1} = \mathbf{k}_{t+1}^\top\boldsymbol{\alpha}_t$, where $\boldsymbol{\alpha}_t$ weights each kernel. When the batch formulation is used, these weights can be obtained as $\boldsymbol{\alpha}_t = (\mathbf{K}_t + \sigma_n^2\mathbf{I})^{-1}\mathbf{y}_t$, whereas in our recursive formulation the same result can be obtained at each step $t$ by computing $\boldsymbol{\alpha}_t = \mathbf{K}_t^{-1}\boldsymbol{\mu}_t$. Observe the resemblance between the batch and recursive formulations: in the batch formulation we are using the noisy observations $\mathbf{y}_t$, so the kernel matrix includes a regularization term $\sigma_n^2\mathbf{I}$. In the recursive formulation we use $\boldsymbol{\mu}_t$, the posterior mean of the noiseless latent function at the inputs, so no noise term is added. Obviously, the same value for $\hat{y}_{t+1}$ is obtained with both formulations.
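Continuing the toy example above, the two weight vectors indeed coincide up to numerical precision:

```python
alpha_batch = np.linalg.solve(K + sn2 * np.eye(19), y[:19])  # (K_t + sigma_n^2 I)^{-1} y_t
alpha_rec = Kinv @ mu                                        # K_t^{-1} mu_t
assert np.allclose(alpha_batch, alpha_rec)
```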