Suppose we have this distribution:
where
Now, how do we compute an expectation w.r.t. to
It is impossible for us to do this as we don’t know
Annealed Importance Sampling
The construction of AIS is as follows:
- Let
be our target distribution. - Let
be our proposal distribution which only requirement is that we can sample independent point from it. It doesn’t have to be close to thus the requirement is more relaxed than importance sampling. - Define a sequence of intermediate distributions starting from
to call it . The requirement is that whenever . That is, has to cover the support of so that we can take the ratio. - Define local transition probabilities
.
Then to sample from
- Sample an independent point from
. - Sample
from by doing MCMC w.r.t. . - Sample
from by doing MCMC w.r.t. . - Sample
from by doing MCMC w.r.t. .
Intuitively given two distributions, which might be disjoint in their support, we create intermediate distributions that are “bridging” from one to another. Then we do MCMC to move around these distributions and hope that we end up in our target distribution.
At this point, we have sequence of points
Notice that
With this importance weight, then we can compute the expectation as in importance sampling:
where
Practicalities
We now have the full algorithm. However several things are missing, namely, the choice of
For the intermediate distributions, we can set it as an annealing between to our target and proposal functions, i.e:
where
For the transition functions, we can use Metropolis-Hastings with acceptance probability:
assuming we have symmetric proposal, e.g.
Implementation
To make it more concrete, we can look at the simple implementation of AIS. We first define our target function:
Next we define our proposal function and distribution, as we assume we can easily sample independent points from it:
Lastly, we define our transition function:
Then, we are ready to do the sampling:
Notice, in the code above we do log-sum-exp trick to avoid underflow when computing
After the iteration finished, we have with ourselves our samples and their corresponding weights, from which we can compute the expectation as in importance sampling:
In this example, the result should be very close to the mean of our target Gaussian i.e.
Discussion
AIS is a very interesting and useful way to do importance sampling without having to sweat about the choice of proposal
Nevertheless, AIS is powerful and has been used even nowadays. It is a popular choice of algorithms for approximating partition function in RBM [2]. Moreover it has been used for Deep Generative Models (GAN, VAE) [3].
References
- Neal, Radford M. “Annealed importance sampling.” Statistics and computing 11.2 (2001): 125-139.
- Salakhutdinov, Ruslan. “Learning and evaluating Boltzmann machines.” Tech. Rep., Technical Report UTML TR 2008-002, Department of Computer Science, University of Toronto (2008). APA
- Wu, Yuhuai, et al. “On the quantitative analysis of decoder-based generative models.” arXiv preprint arXiv:1611.04273 (2016).