
Cost function and parameter training

In order to train the MLP weights and biases, a cost function is designed that allows blind training on the available nonlinear cluster data. The cost should be minimal when these data are mapped onto linear clusters by the entire set of MLPs; ideally, therefore, there should be a linear relationship between any two components $ j$ and $ k$ within the same cluster $ i$:

$\displaystyle K_{j,k}^i = \frac{d_j^i(t)}{d_k^i(t)},\quad \forall t$ (5)

where $ K_{j,k}^i$ is the slope formed by components $ j$ and $ k$ of cluster $ i$. Hence, a cost function to train the weights of MLP $ j$ can be derived as

$\displaystyle J_j = \sum_{i=1}^n \sum_{k=1}^m \sum_t \left[ d_j^i(t) - K_{j,k}^i d_k^i(t) \right]^2$ (6)

Notice that the elements of $ K^i$ must also be estimated and updated in each iteration, for instance by using the histogram-based estimator described in [8]. Fig. 1 shows the training diagram corresponding to the case $ m=2$.
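As an illustration, the following minimal NumPy sketch evaluates the slopes of (5) and the cost (6) for one output component $ j$. The data layout (one array of shape $ (m, T_i)$ per cluster, holding the MLP outputs) is an assumption made for this example, and the least-squares slope fit through the origin is only a simple stand-in for the histogram-based estimator of [8], which is not reproduced here.

    import numpy as np

    def slopes_and_cost(d_clusters, j):
        """Slopes K[i][k] of Eq. (5) and cost J_j of Eq. (6) for component j.

        d_clusters: list of n arrays, one per cluster, each of shape (m, T_i),
                    holding the MLP outputs d^i(t).
        """
        J = 0.0
        K = []
        for d in d_clusters:                # clusters i = 1, ..., n
            Ki = np.empty(d.shape[0])
            for k in range(d.shape[0]):     # components k = 1, ..., m
                # Least-squares slope of d_j versus d_k through the origin,
                # a simple stand-in for the histogram-based estimator of [8].
                Ki[k] = (d[j] @ d[k]) / (d[k] @ d[k])
                # Accumulate sum_t [ d_j(t) - K_{j,k} d_k(t) ]^2.
                J += np.sum((d[j] - Ki[k] * d[k]) ** 2)
            K.append(Ki)
        return K, J

In the training loop this function would be re-evaluated at every iteration, since both the MLP outputs and the slopes $ K_{j,k}^i$ change as the weights are updated.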

The MLPs are initialized to implement a linear input-output transformation: $ \textbf{d}^i(t) = \textbf{g}\left(\textbf{c}^i(t)\right) = \textbf{c}^i(t)$. This initialization is relatively "near" the optimal solution and prevents the weights from converging to the trivial (all-zeros) solution. The parameters of the $ m$ MLPs are adapted in each iteration using a batch gradient descent approach to minimize (6). As suggested in [13], we also assume that the nonlinearities pass through the origin, i.e., $ g_j(0) = 0$; the bias of the output layer is therefore fixed as $ b_{j,2}=-\textbf{w}_{j,2}^T \phi\left(\textbf{b}_{j,1}\right)$. After this training, the mixing matrix can be estimated in a straightforward way from the estimated slopes $ K_{j,k}^i$.

Figure 2: Example of underdetermined BSS mixtures: scatter plots of three linear mixtures in (a) and three PNL mixtures with additive noise in (b), with $ p=0.9$ and $ 20$ dB SNR. The preprocessing of Section III removes some samples of (b) to obtain (c), which is then used for spectral clustering. (d) shows the output of the MLPs after training with the clustered data.

