
Convolution of Gaussians and the Probit Integral

Gaussian distributions are very useful in Bayesian inference due to their (many!) convenient properties. In this post we take a look at two of them: the convolution of two Gaussian pdfs and the integral of the probit function w.r.t. a Gaussian measure.

Convolution and the Predictive Distribution of Gaussian Regression

Let’s start with the convolution of two Gaussian densities $\mathcal{N}(x \mid \mu_1, \sigma_1^2)$ and $\mathcal{N}(x \mid \mu_2, \sigma_2^2)$ on $\mathbb{R}$:

Proposition 1 (Convolution of Gaussians). Let $p(x) = \mathcal{N}(x \mid \mu_1, \sigma_1^2)$ and $q(x) = \mathcal{N}(x \mid \mu_2, \sigma_2^2)$ be two Gaussians on $\mathbb{R}$. Then

$$(p * q)(y) := \int_{-\infty}^{\infty} p(y - x)\, q(x)\, dx = \mathcal{N}(y \mid \mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2).$$

Proof. By the convolution theorem, the convolution of two functions is equivalent to the product of the functions’ Fourier transforms. The Fourier transform of a density function is given by its characteristic function; for a Gaussian $\mathcal{N}(\mu, \sigma^2)$, it is $\varphi(t) = \exp\left(i t \mu - \frac{1}{2} \sigma^2 t^2\right)$. Therefore, if $\varphi_p$ and $\varphi_q$ are the characteristic functions of $p$ and $q$, respectively, then

$$\varphi_p(t)\, \varphi_q(t) = \exp\left(i t (\mu_1 + \mu_2) - \frac{1}{2}\left(\sigma_1^2 + \sigma_2^2\right) t^2\right),$$

which we can immediately identify as the characteristic function of a Gaussian with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.
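
To make Proposition 1 concrete, here is a small numerical sanity check (a sketch using NumPy/SciPy; the parameter values are arbitrary illustrations): it convolves two discretized Gaussian pdfs and compares the result against the closed form.

```python
import numpy as np
from scipy.stats import norm

# Two arbitrary Gaussians (illustrative values)
mu1, sig1 = 1.0, 0.8
mu2, sig2 = -0.5, 1.2

# Discretize both densities on a common grid
dx = 0.01
x = np.arange(-15.0, 15.0, dx)
p = norm.pdf(x, mu1, sig1)
q = norm.pdf(x, mu2, sig2)

# Riemann-sum approximation of the convolution (p * q)(y)
conv = np.convolve(p, q, mode="same") * dx

# Closed form predicted by Proposition 1
closed = norm.pdf(x, mu1 + mu2, np.sqrt(sig1**2 + sig2**2))

# The two curves coincide up to discretization error
print(np.max(np.abs(conv - closed)))
```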

This result is very useful in Bayesian machine learning, especially to obtain the predictive distribution of a Bayesian regression model. For instance, suppose we know that the distribution over the regressor’s output $f$ is a Gaussian $p(f) = \mathcal{N}(f \mid \mu, \nu^2)$, and we assume that the observed output $y$ is noisy: $p(y \mid f) = \mathcal{N}(y \mid f, \sigma^2)$.

Corollary 2 (Gaussian Regression). Let $p(y \mid f) = \mathcal{N}(y \mid f, \sigma^2)$ and $p(f) = \mathcal{N}(f \mid \mu, \nu^2)$ be Gaussians on $\mathbb{R}$. Then

$$p(y) = \int_{-\infty}^{\infty} \mathcal{N}(y \mid f, \sigma^2)\, \mathcal{N}(f \mid \mu, \nu^2)\, df = \mathcal{N}(y \mid \mu, \sigma^2 + \nu^2).$$

Proof. First, notice that the Gaussian pdf is symmetric in its argument and its mean:

$$\mathcal{N}(y \mid f, \sigma^2) = \frac{1}{Z} \exp\left(-\frac{(y - f)^2}{2\sigma^2}\right) = \mathcal{N}(y - f \mid 0, \sigma^2)$$

for $y, f \in \mathbb{R}$, where $Z = \sqrt{2\pi\sigma^2}$ is the normalizing constant. Using this, we can write the integral above as a convolution:

$$\int_{-\infty}^{\infty} \mathcal{N}(y \mid f, \sigma^2)\, \mathcal{N}(f \mid \mu, \nu^2)\, df = \int_{-\infty}^{\infty} \mathcal{N}(y - f \mid 0, \sigma^2)\, \mathcal{N}(f \mid \mu, \nu^2)\, df.$$

Thus, by Proposition 1, we have $p(y) = \mathcal{N}(y \mid 0 + \mu, \sigma^2 + \nu^2) = \mathcal{N}(y \mid \mu, \sigma^2 + \nu^2)$.
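
A quick Monte Carlo check of Corollary 2 (a sketch; the values of $\mu$, $\nu$, and $\sigma$ are illustrative): sample $f$ from the prior, then $y$ given $f$, and compare the empirical moments of $y$ with the closed-form predictive.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, nu, sigma = 2.0, 0.5, 0.3  # prior mean/std and noise std (illustrative)

# Generative process: f ~ N(mu, nu^2), then y | f ~ N(f, sigma^2)
f = rng.normal(mu, nu, size=1_000_000)
y = rng.normal(f, sigma)

# Empirical moments vs. Corollary 2's N(mu, sigma^2 + nu^2)
print(y.mean(), y.var())
print(mu, sigma**2 + nu**2)
```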

The Probit Integral and the Probit Approximation

The probit function $\Phi$ is the cumulative distribution function of the standard normal distribution on $\mathbb{R}$, i.e., $\Phi(z) = \int_{-\infty}^{z} \mathcal{N}(x \mid 0, 1)\, dx$. It can conveniently be written in terms of the error function

$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z \exp\left(-t^2\right) dt$$

by

$$\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right).$$
Proposition 3 (The Probit Integral). If $p(x) = \mathcal{N}(x \mid \mu, \sigma^2)$ is a Gaussian on $\mathbb{R}$, then

$$\int_{-\infty}^{\infty} \Phi(x)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \Phi\left(\frac{\mu}{\sqrt{1 + \sigma^2}}\right).$$

Proof. A standard property of the error function [1] says that

$$\int_{-\infty}^{\infty} \operatorname{erf}(a x + b)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \operatorname{erf}\left(\frac{a \mu + b}{\sqrt{1 + 2 a^2 \sigma^2}}\right).$$

So, writing $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(x/\sqrt{2}\right)\right)$ and applying the identity with $a = 1/\sqrt{2}$, $b = 0$:

$$\int_{-\infty}^{\infty} \Phi(x)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{\mu/\sqrt{2}}{\sqrt{1 + \sigma^2}}\right)\right) = \Phi\left(\frac{\mu}{\sqrt{1 + \sigma^2}}\right).$$
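
Proposition 3 is easy to verify with numerical quadrature (a sketch with SciPy; the values of $\mu$ and $\sigma$ are illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 0.7, 1.5  # illustrative values

# Left-hand side: integrate probit times Gaussian pdf numerically
lhs, _ = quad(lambda x: norm.cdf(x) * norm.pdf(x, mu, sigma), -np.inf, np.inf)

# Right-hand side: closed form from Proposition 3
rhs = norm.cdf(mu / np.sqrt(1 + sigma**2))

print(lhs, rhs)  # both approximately 0.651
```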

This integral is very useful for Bayesian inference since it enables us to approximate the following integral, which is ubiquitous in Bayesian binary classification:

$$\int_{-\infty}^{\infty} \sigma(x)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx,$$

where $\sigma(x) = \frac{1}{1 + \exp(-x)}$ is the logistic function.

The key idea is to notice that the probit and logistic functions are both sigmoid functions. That is, their graphs have a similar “S-shape”. Moreover, their images are both $(0, 1)$. However, they are a bit different: the logistic function is more “horizontally stretched” than the probit function, whose slope at zero is steeper.

So, the strategy to approximate the integral above is as follows: (i) horizontally “stretch” the probit function so that it matches the logistic function, and then (ii) use Proposition 3 to get an analytic approximation to the integral.

For the first step, this can be done by a simple change of coordinates: rescale the argument of the probit function by a constant $\lambda$, i.e., consider $x \mapsto \Phi(\lambda x)$. There are several “good” values for $\lambda$, but commonly it is chosen to be $\lambda = \sqrt{\pi/8}$, which makes the probit function have the same derivative as the logistic function at zero. That is, we have the approximation $\sigma(x) \approx \Phi\left(\sqrt{\pi/8}\, x\right)$.
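
Since $\frac{d}{dx}\Phi(\lambda x)\big|_{x=0} = \lambda\, \mathcal{N}(0 \mid 0, 1)$, this derivative-matching choice can be confirmed in one line (a sketch using SciPy’s standard normal pdf):

```python
import numpy as np
from scipy.stats import norm

lam = np.sqrt(np.pi / 8)

# Slope of Phi(lam * x) at zero: lam / sqrt(2 * pi) = 1/4,
# which equals the logistic function's derivative at zero
print(lam * norm.pdf(0.0))  # 0.25
```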

Corollary 4. If $p(x) = \mathcal{N}(x \mid \mu, \sigma^2)$ is a Gaussian on $\mathbb{R}$ and $\lambda > 0$, then

$$\int_{-\infty}^{\infty} \Phi(\lambda x)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \Phi\left(\frac{\lambda \mu}{\sqrt{1 + \lambda^2 \sigma^2}}\right).$$

Proof. Substituting $\tilde{x} := \lambda x$, so that $\tilde{x} \sim \mathcal{N}(\lambda \mu, \lambda^2 \sigma^2)$, by Proposition 3 we have

$$\int_{-\infty}^{\infty} \Phi(\lambda x)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \int_{-\infty}^{\infty} \Phi(\tilde{x})\, \mathcal{N}(\tilde{x} \mid \lambda \mu, \lambda^2 \sigma^2)\, d\tilde{x} = \Phi\left(\frac{\lambda \mu}{\sqrt{1 + \lambda^2 \sigma^2}}\right).$$

Now we are ready to obtain the final approximation, often called the probit approximation.

Proposition 5 (Probit Approximation). If $p(x) = \mathcal{N}(x \mid \mu, \sigma^2)$ is a Gaussian on $\mathbb{R}$ and $\sigma(x) = \frac{1}{1 + \exp(-x)}$ is the logistic function, then

$$\int_{-\infty}^{\infty} \sigma(x)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx \approx \sigma\left(\frac{\mu}{\sqrt{1 + \pi \sigma^2 / 8}}\right).$$

Proof. Let $\lambda = \sqrt{\pi/8}$. Using Corollary 4 and substituting $\sigma(x) \approx \Phi(\lambda x)$:

$$\int_{-\infty}^{\infty} \sigma(x)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx \approx \int_{-\infty}^{\infty} \Phi(\lambda x)\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \Phi\left(\frac{\lambda \mu}{\sqrt{1 + \lambda^2 \sigma^2}}\right) \approx \sigma\left(\frac{\mu}{\sqrt{1 + \lambda^2 \sigma^2}}\right).$$

Substituting $\lambda^2 = \pi/8$ into the last expression yields the desired result.
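
How good is the probit approximation in practice? A quick Monte Carlo comparison (a sketch; the values of $\mu$ and $\sigma$ are illustrative):

```python
import numpy as np
from scipy.special import expit  # the logistic function

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0  # illustrative values

# Monte Carlo estimate of the logistic-Gaussian integral
mc = expit(rng.normal(mu, sigma, size=1_000_000)).mean()

# Probit approximation (Proposition 5)
approx = expit(mu / np.sqrt(1 + np.pi * sigma**2 / 8))

print(mc, approx)  # close, though not exact
```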

The probit approximation can also be used to obtain an approximation to the following integral, ubiquitous in multi-class classification:

$$\int_{\mathbb{R}^K} \operatorname{softmax}(\boldsymbol{x})\, \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\, d\boldsymbol{x},$$

where the Gaussian is defined on $\mathbb{R}^K$ and the softmax function is identified by its components $\operatorname{softmax}_i(\boldsymbol{x}) = \exp(x_i) / \sum_{j=1}^{K} \exp(x_j)$ for $i = 1, \dots, K$.

Proposition 6 (Multiclass Probit Approximation; Gibbs, 1998). If $p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$ is a Gaussian on $\mathbb{R}^K$ and $\boldsymbol{\sigma}^2 := \operatorname{diag}(\boldsymbol{\Sigma})$, then

$$\int_{\mathbb{R}^K} \operatorname{softmax}(\boldsymbol{x})\, \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\, d\boldsymbol{x} \approx \operatorname{softmax}\left(\frac{\boldsymbol{\mu}}{\sqrt{1 + \pi \boldsymbol{\sigma}^2 / 8}}\right),$$

where the division on the r.h.s. is component-wise.

Proof. The proof is based on [3]. Notice that we can write the $i$-th component of $\operatorname{softmax}(\boldsymbol{x})$ as

$$\operatorname{softmax}_i(\boldsymbol{x}) = \frac{\exp(x_i)}{\sum_{j=1}^{K} \exp(x_j)} = \frac{1}{1 + \sum_{j \neq i} \exp(-(x_i - x_j))}.$$

So, for each $i = 1, \dots, K$, using $z_{ij} := x_i - x_j$, we can write

$$\int_{\mathbb{R}^K} \operatorname{softmax}_i(\boldsymbol{x})\, \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\, d\boldsymbol{x} = \mathbb{E}\left[\frac{1}{1 + \sum_{j \neq i} \exp(-z_{ij})}\right].$$

Then, we use the following approximations (which admittedly might be quite loose):

  1. $\mathbb{E}\left[\frac{1}{1 + \sum_{j \neq i} \exp(-z_{ij})}\right] \approx \frac{1}{1 + \sum_{j \neq i} \left(\mathbb{E}\left[\sigma(z_{ij})\right]^{-1} - 1\right)}$,
  2. the mean-field approximation $\mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) \approx \prod_{i=1}^{K} \mathcal{N}(x_i \mid \mu_i, \sigma_i^2)$, and thus we have $z_{ij} \sim \mathcal{N}(\mu_i - \mu_j, \sigma_i^2 + \sigma_j^2)$, and
  3. using the probit approximation (Proposition 5), with the further approximation $\frac{\mu_i - \mu_j}{\sqrt{1 + \pi (\sigma_i^2 + \sigma_j^2)/8}} \approx \frac{\mu_i}{\sqrt{1 + \pi \sigma_i^2/8}} - \frac{\mu_j}{\sqrt{1 + \pi \sigma_j^2/8}} =: \bar{z}_{ij}$, so that $\mathbb{E}\left[\sigma(z_{ij})\right] \approx \sigma(\bar{z}_{ij})$,

we obtain, using the identity $\sigma(z)^{-1} - 1 = \exp(-z)$,

$$\mathbb{E}\left[\operatorname{softmax}_i(\boldsymbol{x})\right] \approx \frac{1}{1 + \sum_{j \neq i} \exp(-\bar{z}_{ij})} = \frac{\exp\left(\mu_i / \sqrt{1 + \pi \sigma_i^2/8}\right)}{\sum_{j=1}^{K} \exp\left(\mu_j / \sqrt{1 + \pi \sigma_j^2/8}\right)}.$$

We identify the last expression as the $i$-th component of $\operatorname{softmax}\left(\boldsymbol{\mu} / \sqrt{1 + \pi \boldsymbol{\sigma}^2 / 8}\right)$.
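
Finally, a Monte Carlo check of the multiclass probit approximation (a sketch; the mean vector and the diagonal covariance are illustrative choices):

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)
mu = np.array([1.0, 0.0, -0.5, 2.0])
Sigma = np.diag([0.5, 1.0, 2.0, 0.3])  # diagonal for simplicity

# Monte Carlo estimate of E[softmax(x)] with x ~ N(mu, Sigma)
x = rng.multivariate_normal(mu, Sigma, size=500_000)
mc = softmax(x, axis=-1).mean(axis=0)

# Multiclass probit approximation (Proposition 6)
approx = softmax(mu / np.sqrt(1 + np.pi * np.diag(Sigma) / 8))

print(mc)
print(approx)  # componentwise close to the Monte Carlo estimate
```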

References

  1. Ng, Edward W., and Murray Geller. “A table of integrals of the error functions.” Journal of Research of the National Bureau of Standards B 73, no. 1 (1969): 1-20.
  2. Gibbs, Mark N. Bayesian Gaussian processes for regression and classification. Dissertation, University of Cambridge, 1998.
  3. Lu, Zhiyun, Eugene Ie, and Fei Sha. “Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation.” arXiv preprint arXiv:2006.07584 (2020).