Title:  Variational algorithms for Bayesian inference in latent Gaussian models 
Author(s):  Cseke, B. 
Publication year:  2011 
Publisher:  [S.l. : s.n.] 
ISBN:  9789090257501 
Number of Pages:  114 p. 
Annotation:  Promotor: Hoop, H. de; Copromotor: Zwarts, J.; Copromotor: Heskes, T.M.
Publication type:  Dissertation 
Please use this identifier to cite or link to this item : https://hdl.handle.net/2066/83218 

Subject:  Data Science 
Organization:  Data Science 
Abstract: 
The results in this thesis are based on applications of the expectation propagation algorithm to approximate marginals in models with Gaussian prior densities. A short introduction of variational methods in Bayesian inference is given in Chapter 1.
In Chapter 2, we start with a model in which both the prior and the likelihood are Gaussian and study the properties of the message passing algorithm and the corresponding Bethe free energy. It turns out that, although in terms of the functional parameters q_k and q_ij the free energy has the same properties as in the discrete case, its behavior is quite surprising when it is expressed in a parametric form that incorporates the marginal consistency constraints (as in (1.3)). While in the discrete case the free energy is a bounded function, in the Gaussian case it can be unbounded when expressed in terms of the moment parameters of the approximate marginals. The typical relaxations applied in the discrete case (e.g. Wainwright et al., 2003; Wiegerinck and Heskes, 2003) seem to achieve the opposite effect by creating a convex objective with an unbounded global minimum. We show that the stable fixed points of the Gaussian message passing algorithm are local minima of the Gaussian free energy, and that both the convergence of the message passing algorithm and the existence of local minima are more likely for relaxation parameters that move the free energy closer to the mean field free energy. We also give necessary and sufficient conditions for the boundedness of the Gaussian Bethe free energy.
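As an illustration of the kind of Gaussian message passing discussed above, the following is a minimal sketch of standard Gaussian belief propagation for a density proportional to exp(-x'Jx/2 + h'x). It is not taken from the thesis and does not include the relaxed (fractional) variants analysed there; the function name `gaussian_bp`, the parallel update schedule and the fixed iteration count are assumptions made for this example.

```python
import numpy as np

def gaussian_bp(J, h, n_iter=50):
    """Gaussian belief propagation for p(x) ~ exp(-x'Jx/2 + h'x).

    J is a symmetric precision matrix, h the linear term.
    Returns approximate marginal means and variances.
    (Hypothetical helper written for illustration only.)
    """
    n = len(h)
    dJ = np.zeros((n, n))  # dJ[i, j]: precision part of message i -> j
    dh = np.zeros((n, n))  # dh[i, j]: linear part of message i -> j
    nbr = (J != 0) & ~np.eye(n, dtype=bool)  # edges of the graph
    for _ in range(n_iter):
        new_dJ = np.zeros_like(dJ)
        new_dh = np.zeros_like(dh)
        for i in range(n):
            for j in range(n):
                if not nbr[i, j]:
                    continue
                # Cavity parameters: everything node i knows except
                # what it was told by node j.
                J_c = J[i, i] + dJ[:, i].sum() - dJ[j, i]
                h_c = h[i] + dh[:, i].sum() - dh[j, i]
                new_dJ[i, j] = -J[i, j] ** 2 / J_c
                new_dh[i, j] = -J[i, j] * h_c / J_c
        dJ, dh = new_dJ, new_dh  # parallel (synchronous) update
    prec = np.diag(J) + dJ.sum(axis=0)
    mean = (h + dh.sum(axis=0)) / prec
    return mean, 1.0 / prec
```

On a tree-structured model such as a three-variable chain, the returned means and variances coincide with the exact ones obtained from `np.linalg.solve(J, h)` and `np.linalg.inv(J)`. On graphs with cycles the iteration may fail to converge, which is the regime the chapter studies.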
In Chapter 3, we address the problem of approximating posterior marginals in models where p(xM) is a Gaussian prior and the non-Gaussian likelihood factorizes into a product of terms, each depending on one variable only. The methods we propose are not restricted to these models, but they are particularly well suited for them. The approximate posterior marginals in these models are typically computed by approximating the non-Gaussian posterior density with a multivariate Gaussian density, using either the variational objective or expectation propagation. The Gaussian marginals are then used as approximations of the posterior marginals. We go beyond these Gaussian marginal approximations and derive a framework to improve on them. The improved marginals seem to perform well in the comfort zone of EP, that is, in models where the posterior density is log-concave. Although we do not provide an estimate of or an upper bound on the error, the approximations have the nice property that they can be gradually improved whenever better accuracy is needed.
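The Gaussian baseline that this chapter starts from can be sketched as follows: a minimal parallel EP loop for a zero-mean Gaussian prior N(0, K) with a factorizing probit likelihood. This is an illustrative assumption, not the thesis's code or its marginal correction scheme; the probit likelihood, the function name `probit_ep`, the undamped parallel schedule and the fixed iteration count are all choices made for the example.

```python
import math
import numpy as np

def probit_ep(K, y, n_iter=50):
    """Parallel EP for prior N(0, K) and likelihood prod_i Phi(y_i * x_i).

    Returns the mean and covariance of the Gaussian approximation.
    (Hypothetical helper written for illustration only.)
    """
    n = len(y)
    tau = np.zeros(n)  # site precisions
    nu = np.zeros(n)   # site precision-times-mean parameters
    K_inv = np.linalg.inv(K)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(K_inv + np.diag(tau))
        mu = Sigma @ nu
        for i in range(n):
            # Cavity: remove site i from the current marginal of x_i.
            tau_c = 1.0 / Sigma[i, i] - tau[i]
            nu_c = mu[i] / Sigma[i, i] - nu[i]
            m_c, v_c = nu_c / tau_c, 1.0 / tau_c
            # Moments of cavity * Phi(y_i x_i), available in closed form.
            s = math.sqrt(1.0 + v_c)
            z = y[i] * m_c / s
            cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            r = pdf / cdf
            m_hat = m_c + y[i] * v_c * r / s
            v_hat = v_c - v_c ** 2 * r * (z + r) / (1.0 + v_c)
            # Moment matching: new site = matched Gaussian / cavity.
            tau[i] = 1.0 / v_hat - tau_c
            nu[i] = m_hat / v_hat - nu_c
    Sigma = np.linalg.inv(K_inv + np.diag(tau))
    return Sigma @ nu, Sigma
```

For a single variable with unit prior variance and y = +1, the sketch recovers the exact posterior moments, mean 1/sqrt(pi) and variance 1 - 1/pi. With several correlated variables the Gaussian marginals are only approximations, and those are the quantities the chapter sets out to improve.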
In Chapter 4, we define a multivariate scale mixture distribution that can be used as a sparsifying prior in the context of linear regression and logistic regression. We derive an efficient expectation propagation algorithm to perform approximate inference in these models. We use these models for approximate Bayesian inference when assessing the activation of brain areas in task-related MEG and fMRI experiments. The multivariate prior we introduce is based on the scale mixture representation of the univariate double exponential prior, and it is defined with the aim of introducing prior correlations between the magnitudes of the regression coefficients. This was motivated by the observation that in many MEG and fMRI applications the activations have smooth spatial and temporal patterns, that is, neighboring brain areas (in space, in time, or both) are likely to have similar activation levels. The prior keeps the regression coefficients a priori uncorrelated, but it correlates their magnitudes. The symmetry properties of the prior lead to posterior densities that imply block diagonal correlation structures. The approximating multivariate Gaussians inherit this property. The block diagonal covariance structure and the typically underdetermined regression models make the computational complexity of EP scale linearly with the number of regression coefficients. We show that the importance maps created from the approximate posterior moments of the scale parameters are meaningful and neurobiologically reasonable.
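The univariate building block mentioned above, the scale mixture representation of the double exponential (Laplace) density, can be illustrated with a short sampling sketch. It assumes the standard construction in which the variance is exponentially distributed with mean 2/lam^2 and the coefficient is Gaussian given that variance; the name `sample_laplace_via_scale_mixture` and the rate parameter `lam` are hypothetical, and the multivariate prior that correlates the scales is not reproduced here.

```python
import numpy as np

def sample_laplace_via_scale_mixture(lam, size, rng):
    """Draw from the density (lam/2) * exp(-lam * |x|) as a Gaussian scale mixture.

    v ~ Exponential with mean 2 / lam**2, then x | v ~ N(0, v).
    (Hypothetical helper written for illustration only.)
    """
    v = rng.exponential(scale=2.0 / lam ** 2, size=size)  # latent variances
    return rng.normal(0.0, np.sqrt(v))                    # Gaussian given v

rng = np.random.default_rng(0)
x = sample_laplace_via_scale_mixture(lam=2.0, size=200_000, rng=rng)
# For this density E|x| = 1/lam and Var(x) = 2/lam**2.
print(np.mean(np.abs(x)), np.var(x))
```

Placing a joint distribution on the latent variances `v`, instead of drawing them independently as here, is one way to couple the magnitudes of the coefficients while leaving the coefficients themselves uncorrelated.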
