Convolution of Gaussians and the Probit Integral
Gaussian distributions are very useful in Bayesian inference due to their (many!) convenient properties. In this post we take a look at two of them: the convolution of two Gaussian pdfs and the integral of the probit function w.r.t. a Gaussian measure.
Convolution and the Predictive Distribution of Gaussian Regression
Let’s start with the convolution, i.e., for two pdfs $p$ and $q$ on $\mathbb{R}$, the function $(p * q)(y) := \int p(y - x)\,q(x)\,dx$.

Proposition 1 (Convolution of Gaussians). Let $\mathcal{N}(\cdot \mid \mu_1, \sigma_1^2)$ and $\mathcal{N}(\cdot \mid \mu_2, \sigma_2^2)$ be two Gaussian pdfs on $\mathbb{R}$. Their convolution is again a Gaussian pdf:
$$\left(\mathcal{N}(\cdot \mid \mu_1, \sigma_1^2) * \mathcal{N}(\cdot \mid \mu_2, \sigma_2^2)\right)(y) = \mathcal{N}\left(y \mid \mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2\right).$$
Proof.
By the convolution theorem, the convolution of two functions is equivalent to the product of the functions’ Fourier transforms.
The Fourier transform of a density function is given by its characteristic function.
For a Gaussian $\mathcal{N}(x \mid \mu, \sigma^2)$, the characteristic function is
$$\varphi(t) = \exp\left(i\mu t - \tfrac{1}{2}\sigma^2 t^2\right).$$
Multiplying the characteristic functions of the two Gaussians therefore yields
$$\varphi_1(t)\,\varphi_2(t) = \exp\left(i(\mu_1 + \mu_2)\,t - \tfrac{1}{2}\left(\sigma_1^2 + \sigma_2^2\right)t^2\right),$$
which we can immediately identify as the characteristic function of a Gaussian with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. Taking the inverse Fourier transform gives the claimed density. $\square$
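As a quick numerical sanity check (not part of the original argument), note that the density of a sum of two independent Gaussians is exactly this convolution. Here is a minimal sketch, with arbitrary parameter values of my choosing, comparing samples against the closed form:

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
mu1, sig1 = 1.0, 0.5    # parameters of the first Gaussian (arbitrary)
mu2, sig2 = -2.0, 1.5   # parameters of the second Gaussian (arbitrary)

# The density of a sum of independent Gaussians is the convolution of their pdfs.
samples = rng.normal(mu1, sig1, 100_000) + rng.normal(mu2, sig2, 100_000)

# Compare a kernel density estimate against N(mu1 + mu2, sig1^2 + sig2^2).
xs = np.linspace(-4.0, 2.0, 4)
kde = gaussian_kde(samples)
closed_form = norm.pdf(xs, loc=mu1 + mu2, scale=np.sqrt(sig1**2 + sig2**2))
print(np.round(kde(xs), 3))      # empirical density
print(np.round(closed_form, 3))  # closed-form density; should roughly match
```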
This result is very useful in Bayesian machine learning, especially to obtain the predictive distribution of a Bayesian regression model.
For instance, suppose one knows that the distribution over the regressor’s output is a Gaussian $p(f \mid x) = \mathcal{N}(f \mid \mu(x), \varsigma^2(x))$, as is the case for Gaussian processes and Laplace-approximated Bayesian neural networks, and assume the usual Gaussian likelihood $p(y \mid f) = \mathcal{N}(y \mid f, \sigma^2)$. The predictive distribution is then available in closed form.

Corollary 2 (Gaussian Regression). Let $p(f \mid x) = \mathcal{N}(f \mid \mu(x), \varsigma^2(x))$ and $p(y \mid f) = \mathcal{N}(y \mid f, \sigma^2)$. Then
$$p(y \mid x) = \int \mathcal{N}(y \mid f, \sigma^2)\,\mathcal{N}\left(f \mid \mu(x), \varsigma^2(x)\right)\,df = \mathcal{N}\left(y \mid \mu(x),\ \sigma^2 + \varsigma^2(x)\right).$$

Proof. First, notice that the Gaussian pdf depends on its argument and its mean only through their difference, and is symmetric in them:
$$\mathcal{N}(y \mid f, \sigma^2) = \mathcal{N}(y - f \mid 0, \sigma^2)$$
for any $y, f \in \mathbb{R}$. The integral above is therefore exactly the convolution $\left(\mathcal{N}(\cdot \mid 0, \sigma^2) * \mathcal{N}(\cdot \mid \mu(x), \varsigma^2(x))\right)(y)$. Thus, by Proposition 1, we have
$$p(y \mid x) = \mathcal{N}\left(y \mid \mu(x),\ \sigma^2 + \varsigma^2(x)\right). \qquad \square$$
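To see Corollary 2 in action, here is a small sketch of my own (the values for $\mu(x)$, $\varsigma^2(x)$, and $\sigma^2$ are made up) that checks the closed-form predictive moments against ancestral sampling $f \sim p(f \mid x)$, then $y \sim p(y \mid f)$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_x, var_f = 0.3, 0.8   # hypothetical posterior mean and variance of f at some x
var_noise = 0.2          # hypothetical observation-noise variance sigma^2

# Ancestral sampling: f ~ N(mu(x), varsigma^2(x)), then y ~ N(f, sigma^2).
f = rng.normal(mu_x, np.sqrt(var_f), 500_000)
y = rng.normal(f, np.sqrt(var_noise))

# Corollary 2 predicts y | x ~ N(mu(x), sigma^2 + varsigma^2(x)).
print(y.mean(), mu_x)              # both ≈ 0.3
print(y.var(), var_noise + var_f)  # both ≈ 1.0
```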
The Probit Integral and the Probit Approximation
The probit function $\Phi: \mathbb{R} \to (0, 1)$ is the cumulative distribution function of the standard normal distribution, defined by
$$\Phi(z) := \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right),$$
where $\operatorname{erf}$ is the error function.
Proposition 3 (The Probit Integral). If $\mathcal{N}(x \mid \mu, \sigma^2)$ is a Gaussian pdf on $\mathbb{R}$, then
$$\int \Phi(x)\,\mathcal{N}(x \mid \mu, \sigma^2)\,dx = \Phi\left(\frac{\mu}{\sqrt{1 + \sigma^2}}\right).$$
Proof. The standard property of the error function [2] says that
$$\int_{-\infty}^{\infty} \operatorname{erf}(ax + b)\,\mathcal{N}(x \mid \mu, \sigma^2)\,dx = \operatorname{erf}\left(\frac{a\mu + b}{\sqrt{1 + 2a^2\sigma^2}}\right).$$
So, applying this with $a = 1/\sqrt{2}$ and $b = 0$,
$$\int \Phi(x)\,\mathcal{N}(x \mid \mu, \sigma^2)\,dx = \frac{1}{2}\left(1 + \int \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\mathcal{N}(x \mid \mu, \sigma^2)\,dx\right) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{\mu}{\sqrt{2}\sqrt{1 + \sigma^2}}\right)\right) = \Phi\left(\frac{\mu}{\sqrt{1 + \sigma^2}}\right). \qquad \square$$
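Proposition 3 is easy to verify numerically. A minimal check with arbitrary $\mu$ and $\sigma$ of my choosing (scipy’s `norm.cdf` is exactly the probit function $\Phi$):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0.7, 1.3  # arbitrary test values

# Left-hand side: ∫ Φ(x) N(x | mu, sigma^2) dx via numerical quadrature.
lhs, _ = quad(lambda x: norm.cdf(x) * norm.pdf(x, mu, sigma), -np.inf, np.inf)

# Right-hand side: Φ(mu / sqrt(1 + sigma^2)).
rhs = norm.cdf(mu / np.sqrt(1 + sigma**2))
print(lhs, rhs)  # agree up to quadrature error
```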
This integral is very useful for Bayesian inference since it enables us to approximate the following integral, ubiquitous in Bayesian binary classification:
$$\int \sigma(f)\,\mathcal{N}(f \mid \mu, \varsigma^2)\,df,$$
where $\sigma(z) := 1/(1 + \exp(-z))$ is the logistic function. This integral has no analytic solution.
The key idea is to notice that the probit and logistic function are both sigmoid functions.
That is, their graphs have a similar “S-shape”.
Moreover, their images are both the interval $(0, 1)$.
So, the strategy to approximate the integral above is as follows: (i) horizontally “stretch” the probit function so that it matches the logistic function, and then (ii) use Proposition 3 to get an analytic approximation to the integral.
The first step can be done by a simple change of coordinates: rescale the domain of the probit function with a constant $\lambda > 0$, i.e., consider $\Phi(\lambda z)$.
Corollary 4. If $\mathcal{N}(x \mid \mu, \sigma^2)$ is a Gaussian pdf on $\mathbb{R}$ and $\lambda > 0$, then
$$\int \Phi(\lambda x)\,\mathcal{N}(x \mid \mu, \sigma^2)\,dx = \Phi\left(\frac{\mu}{\sqrt{\lambda^{-2} + \sigma^2}}\right).$$

Proof. Substituting $u = \lambda x$, so that $\Phi(\lambda x)\,\mathcal{N}(x \mid \mu, \sigma^2)\,dx = \Phi(u)\,\mathcal{N}(u \mid \lambda\mu, \lambda^2\sigma^2)\,du$, by Proposition 3 we have
$$\int \Phi(u)\,\mathcal{N}\left(u \mid \lambda\mu, \lambda^2\sigma^2\right)\,du = \Phi\left(\frac{\lambda\mu}{\sqrt{1 + \lambda^2\sigma^2}}\right) = \Phi\left(\frac{\mu}{\sqrt{\lambda^{-2} + \sigma^2}}\right). \qquad \square$$
Now we are ready to obtain the final approximation, often called the probit approximation.
Proposition 5 (Probit Approximation). If $\sigma$ is the logistic function and $\mathcal{N}(f \mid \mu, \varsigma^2)$ is a Gaussian pdf on $\mathbb{R}$, then
$$\int \sigma(f)\,\mathcal{N}(f \mid \mu, \varsigma^2)\,df \approx \sigma\left(\frac{\mu}{\sqrt{1 + \pi\varsigma^2/8}}\right).$$

Proof.
Let $\lambda = \sqrt{\pi/8}$. This choice matches the derivatives of $\Phi(\lambda z)$ and $\sigma(z)$ at $z = 0$, since $\sigma'(0) = 1/4$ and $\frac{d}{dz}\Phi(\lambda z)\big|_{z=0} = \lambda/\sqrt{2\pi}$; hence $\sigma(z) \approx \Phi(\lambda z)$.
Substituting this approximation into the integral and applying Corollary 4, we obtain
$$\int \sigma(f)\,\mathcal{N}(f \mid \mu, \varsigma^2)\,df \approx \int \Phi(\lambda f)\,\mathcal{N}(f \mid \mu, \varsigma^2)\,df = \Phi\left(\frac{\mu}{\sqrt{\lambda^{-2} + \varsigma^2}}\right) = \Phi\left(\lambda\,\frac{\mu}{\sqrt{1 + \lambda^2\varsigma^2}}\right) \approx \sigma\left(\frac{\mu}{\sqrt{1 + \pi\varsigma^2/8}}\right). \qquad \square$$
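Unlike Propositions 1 and 3, this is only an approximation, so it is worth eyeballing the error. A small sketch (test values mine) comparing the probit approximation against quadrature:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad
from scipy.special import expit  # the logistic function sigma

mu, var = 1.0, 4.0  # arbitrary test values; larger var makes the approximation looser

# "Exact" value of ∫ sigma(f) N(f | mu, varsigma^2) df via quadrature.
exact, _ = quad(lambda f: expit(f) * norm.pdf(f, mu, np.sqrt(var)), -np.inf, np.inf)

# Probit approximation (Proposition 5).
approx = expit(mu / np.sqrt(1 + np.pi * var / 8))
print(exact, approx)  # close, but not identical
```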
The probit approximation can also be used to obtain an approximation to the following integral, ubiquitous in multi-class classification:
$$\int \operatorname{softmax}(f)\,\mathcal{N}(f \mid \mu, \Sigma)\,df,$$
where the Gaussian is defined on $\mathbb{R}^C$, with $C$ the number of classes.
Proposition 6 (Multiclass Probit Approximation; Gibbs, 1998). If $\mathcal{N}(f \mid \mu, \Sigma)$ is a Gaussian pdf on $\mathbb{R}^C$, then
$$\int \operatorname{softmax}(f)\,\mathcal{N}(f \mid \mu, \Sigma)\,df \approx \operatorname{softmax}\left(\frac{\mu}{\sqrt{1 + \pi\operatorname{diag}(\Sigma)/8}}\right),$$
where the division in the r.h.s. is component-wise.
Proof.
The proof is based on [3].
Notice that we can write the $i$-th component of the softmax function in terms of the logistic function:
$$\operatorname{softmax}_i(f) = \frac{\exp f_i}{\sum_j \exp f_j} = \frac{1}{1 + \sum_{j \neq i} \exp\left(-(f_i - f_j)\right)}, \qquad \exp\left(-(f_i - f_j)\right) = \frac{1 - \sigma(f_i - f_j)}{\sigma(f_i - f_j)}.$$
Then, we use the following approximations (which admittedly might be quite loose):

- the mean-field approximation, which pushes the expectation inside the sum:
$$\int \operatorname{softmax}_i(f)\,\mathcal{N}(f \mid \mu, \Sigma)\,df \approx \frac{1}{1 + \sum_{j \neq i} \frac{1 - \mathbb{E}[\sigma(f_i - f_j)]}{\mathbb{E}[\sigma(f_i - f_j)]}},$$
and thus we have reduced the problem to the binary integrals $\mathbb{E}[\sigma(f_i - f_j)]$, where $f_i - f_j \sim \mathcal{N}\left(\mu_i - \mu_j,\ \Sigma_{ii} + \Sigma_{jj} - 2\Sigma_{ij}\right)$, and
- using the probit approximation (Proposition 5), with a further approximation that drops the cross-covariances and decouples the classes,
$$\mathbb{E}[\sigma(f_i - f_j)] \approx \sigma\left(\frac{\mu_i - \mu_j}{\sqrt{1 + \pi\left(\Sigma_{ii} + \Sigma_{jj} - 2\Sigma_{ij}\right)/8}}\right) \approx \sigma\left(\tilde\mu_i - \tilde\mu_j\right), \qquad \tilde\mu_k := \frac{\mu_k}{\sqrt{1 + \pi\Sigma_{kk}/8}},$$

we obtain
$$\int \operatorname{softmax}_i(f)\,\mathcal{N}(f \mid \mu, \Sigma)\,df \approx \frac{1}{1 + \sum_{j \neq i} \exp\left(-(\tilde\mu_i - \tilde\mu_j)\right)} = \frac{\exp \tilde\mu_i}{\sum_j \exp \tilde\mu_j}.$$
We identify the last equation above as the $i$-th component of $\operatorname{softmax}\left(\mu \big/ \sqrt{1 + \pi\operatorname{diag}(\Sigma)/8}\right)$, as desired. $\square$
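Given how many approximations are stacked in this proof, a Monte Carlo check is reassuring. A sketch of my own, with a randomly generated positive-definite covariance (all values arbitrary):

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)
C = 4
mu = rng.normal(size=C)
A = rng.normal(size=(C, C))
Sigma = A @ A.T + np.eye(C)  # a random symmetric positive-definite covariance

# Monte Carlo estimate of ∫ softmax(f) N(f | mu, Sigma) df.
fs = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = softmax(fs, axis=1).mean(axis=0)

# Multiclass probit approximation (Proposition 6).
approx = softmax(mu / np.sqrt(1 + np.pi * np.diag(Sigma) / 8))
print(np.round(mc, 3))      # Monte Carlo "ground truth"
print(np.round(approx, 3))  # same ballpark, though it can be loose
```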
References
1. Gibbs, Mark N. “Bayesian Gaussian Processes for Regression and Classification.” PhD thesis, University of Cambridge, 1998.
2. Ng, Edward W., and Murray Geller. “A Table of Integrals of the Error Functions.” Journal of Research of the National Bureau of Standards B 73, no. 1 (1969): 1–20.
3. Lu, Zhiyun, Eugene Ie, and Fei Sha. “Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation.” arXiv preprint arXiv:2006.07584 (2020).