
Measures Against Overfitting

For most useful kernel functions, the dimension $ M'$ of the feature space will be much higher than the number of available data points $ N$ (for instance, when the Gaussian kernel is used the feature space has dimension $ M' = \infty$). In these cases, Eq. (5) can have an infinite number of solutions, which constitutes an overfitting problem.
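To see why the problem becomes underdetermined, a brief sketch (assuming Eq. (5) refers to the least-squares problem in feature space, with data matrix $ \tilde{\mathbf{X}} \in \mathbb{R}^{N \times M'}$):

$\displaystyle \operatorname{rank}(\tilde{\mathbf{X}}^T \tilde{\mathbf{X}}) \le N < M',$

so the $ M' \times M'$ normal equations are singular, and any vector in the null space of $ \tilde{\mathbf{X}}$ can be added to a solution without changing the fit to the data.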

Various techniques to handle this overfitting have been proposed. One possible method is to reduce the dimension of the feature space [6,4,5]. A second method, used here, is to regularize the solution. More specifically, the norm of the solution $ \tilde{\mathbf{h}}$ is penalized, yielding the following problem:

$\displaystyle J'' = \min_{\tilde{\mathbf{h}}} \Vert \mathbf{y} - \tilde{\mathbf{X}} \tilde{\mathbf{h}} \Vert^2 + c \Vert \tilde{\mathbf{h}} \Vert^2$ (7)

$\displaystyle \phantom{J''} = \min_{\boldsymbol{\alpha}} \Vert \mathbf{y} - \mathbf{K} \boldsymbol{\alpha} \Vert^2 + c\, \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha}$ (8)
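The equivalence of Eqs. (7) and (8) follows from expressing the solution as a linear combination of the mapped data points. A brief sketch (assuming, as in the previous section, that the kernel matrix satisfies $ \mathbf{K} = \tilde{\mathbf{X}} \tilde{\mathbf{X}}^T$): substituting $ \tilde{\mathbf{h}} = \tilde{\mathbf{X}}^T \boldsymbol{\alpha}$ gives

$\displaystyle \tilde{\mathbf{X}} \tilde{\mathbf{h}} = \tilde{\mathbf{X}} \tilde{\mathbf{X}}^T \boldsymbol{\alpha} = \mathbf{K} \boldsymbol{\alpha}, \qquad \Vert \tilde{\mathbf{h}} \Vert^2 = \boldsymbol{\alpha}^T \tilde{\mathbf{X}} \tilde{\mathbf{X}}^T \boldsymbol{\alpha} = \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha},$

which turns Eq. (7) into Eq. (8).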

The solution to this regularized problem is given by

$\displaystyle \boldsymbol{\alpha} = \mathbf{K}_{reg}^{-1} \mathbf{y}$ (9)

with $ \mathbf{K}_{reg} = \left(\mathbf{K} + c\mathbf{I}\right)$, where $ c$ is a regularization constant and $ \mathbf{I}$ is the $ N \times N$ identity matrix.
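As an illustration, the following minimal NumPy sketch implements Eqs. (7)-(9) with a Gaussian kernel. All function names and the values of the kernel width sigma and the regularization constant c are illustrative choices, not taken from the paper:

import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Pairwise squared Euclidean distances between the rows of X1 and X2,
    # mapped through the Gaussian kernel exp(-d^2 / (2 sigma^2)).
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def fit_alpha(X, y, c=0.1, sigma=1.0):
    # Eq. (9): alpha = (K + c I)^{-1} y, computed by solving the
    # linear system rather than inverting K_reg explicitly.
    K = gaussian_kernel(X, X, sigma)
    K_reg = K + c * np.eye(len(y))
    return np.linalg.solve(K_reg, y)

def predict(X_train, X_test, alpha, sigma=1.0):
    # Predictions are kernel expansions over the training points.
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

# Toy usage: noisy samples of a sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = fit_alpha(X, y, c=0.1, sigma=0.5)
y_hat = predict(X, X, alpha, sigma=0.5)

Note that, besides preventing overfitting, the term $ c\mathbf{I}$ also improves the conditioning of the linear system, since it shifts all eigenvalues of $ \mathbf{K}$ away from zero.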


