Selecting which basis should be removed

We will now address the selection of a basis for removal. Probably the most sensible option would be to remove the basis $i$ that minimizes the KL divergence between the exact and approximate posteriors, KL$(p({\boldsymbol{\mathbf{f}}}_{t+1}\vert{\cal D}_{t+1}) \vert\vert p([{\boldsymbol{\mathbf{f}}}_{t+1}]_{i}\vert[{\boldsymbol{\mathbf{f}}}_{t+1}]_{-i})\, p([{\boldsymbol{\mathbf{f}}}_{t+1}]_{-i}\vert{\cal D}_{t+1}))$, i.e., a measure of the information loss due to basis removal. This is exactly the same cost function used in the previous section. Unfortunately, it is computationally too expensive. A more practical option is to minimize the squared error (SE) induced by the approximation in the mean of the approximate posterior. In our experiments, this simpler cost function delivered performance almost identical to that of the KL-based criterion.
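
In the notation of this section, and writing $\tilde{{\boldsymbol{\mathbf{\mu}}}}_{t+1}^{(-i)}$ for the mean of the approximate posterior after removing the $i$-th basis (a symbol introduced here only to state the criterion explicitly), the SE rule selects

$ i^{*} = \arg\min_i \Vert {\boldsymbol{\mathbf{\mu}}}_{t+1} - \tilde{{\boldsymbol{\mathbf{\mu}}}}_{t+1}^{(-i)} \Vert^2, $

i.e., the basis whose removal perturbs the posterior mean the least.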

The SE can be computed by subtracting the mean of the approximate distribution (after removing the $i$-th basis) from the mean of the exact distribution, which yields the vector $[0, \ldots, [{\boldsymbol{\mathbf{Q}}}_{t+1}{\boldsymbol{\mathbf{\mu}}}_{t+1}]_i / [{\boldsymbol{\mathbf{Q}}}_{t+1}]_{i,i},\ldots,0]^\top$, and then taking the Euclidean norm of this vector. Observe that only the posterior mean entry corresponding to the removed basis is distorted. With this result, we can easily check the SE of all bases in $\mathcal{O}(M^2)$ time and flag for removal the basis that incurs the least error. This is a well-known pruning criterion, used for instance in [10,12].
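
The following sketch illustrates this selection step; it assumes Q and mu hold ${\boldsymbol{\mathbf{Q}}}_{t+1}$ and ${\boldsymbol{\mathbf{\mu}}}_{t+1}$ as NumPy arrays, and the function name and surrounding bookkeeping are illustrative rather than part of the original algorithm description.

    import numpy as np

    def basis_to_remove(Q, mu):
        """Return the index of the basis whose removal distorts the posterior mean least.

        Q  : (M, M) matrix Q_{t+1} of the current posterior.
        mu : (M,) posterior mean mu_{t+1}.
        """
        d = Q @ mu                    # [Q mu]_i for every basis, O(M^2)
        se = (d / np.diag(Q)) ** 2    # squared distortion of the i-th mean entry
        return int(np.argmin(se))     # flag the basis incurring the least error

Since the product ${\boldsymbol{\mathbf{Q}}}_{t+1}{\boldsymbol{\mathbf{\mu}}}_{t+1}$ is computed only once, evaluating the criterion for all $M$ bases costs $\mathcal{O}(M^2)$, in line with the complexity stated below.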

Thus, flagging a basis for removal and then actually removing it from the posterior have the same cost as including new bases, so each complete KRLS iteration remains $\mathcal{O}(M^2)$.
