Importance Sampling: A clever substitution of the sampling region
MCI as a computational method was first introduced to solve integration problems arising in the estimation of expectations, and was later applied to the simulation of Bayesian posterior (BP) distributions. Its algorithm is transparent: generate random samples from a distribution, referred to as the "target", and then approximate the integral numerically by averaging the function values over those samples. The expectation obtained by MCI is referred to as the empirical average (Supplementary Formula-1).
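Concretely, if \(x_{1},\dots,x_{n}\) are independent draws from the target density \(f\), the empirical average takes the familiar Monte Carlo form below (a sketch of the usual estimator; the notation of Supplementary Formula-1 may differ):
\[\widehat{E}\left[h\left(X\right)\right]\approx\frac{1}{n}\sum_{i=1}^{n}h\left(x_{i}\right),\qquad x_{i}\sim f.\]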
For example, imagine we want to estimate the expectation of the function \(h\left(x\right)=\sin\left(x\right)\sqrt{\left|\cos\left(x\right)\right|}\), where the random variable X follows a Normal distribution with mean = 0 and SD = 5 (the target). First, generate 10,000 random samples from the target distribution, then evaluate h(x) at each generated sample, and finally compute the mean and variance of these values numerically (Supplementary R Code-1). This plain random sampling method is not cost-effective when the target distribution is diffuse, because a very large sample size is required to achieve acceptable precision.
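A minimal R sketch of this plain Monte Carlo step (an illustration in the spirit of Supplementary R Code-1, not the supplementary code itself; the seed is an arbitrary choice):

set.seed(1)                       # arbitrary seed for reproducibility
n <- 10000                        # number of random draws
x <- rnorm(n, mean = 0, sd = 5)   # samples from the target N(0, 5)
h <- sin(x) * sqrt(abs(cos(x)))   # evaluate h at each sample
mean(h)                           # empirical average (MCI estimate of the expectation)
var(h)                            # sample variance of the h values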
MCI was improved by IS, a variance reduction technique first presented in statistical physics (1; 2). IS relaxes the requirement of treating all parts of the distribution equally and instead concentrates on the regions where estimation is critical. To this end, an alternative distribution, the "proposal", chosen to be close to the target, is suggested by making an educated guess. In contrast to MCI, where all samples are treated evenly, each generated sample is allocated a "weight" reflecting its importance, computed through an importance function: for each sample, one calculates the likelihood of drawing that sample from the target distribution relative to the likelihood of drawing it from the proposal distribution. Once sampling is finished, these relative likelihoods are normalized so that they sum to one. In this way, each point has its own probability of occurrence, forming a discrete probability distribution. The expectation obtained by IS is called the weighted average (Supplementary Formula-2).
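In its usual self-normalized form (a sketch of the standard estimator; the notation of Supplementary Formula-2 may differ), with samples \(x_{i}\) drawn from the proposal density \(g\) and raw weights \(w_{i}=f\left(x_{i}\right)/g\left(x_{i}\right)\), the weighted average is
\[\widehat{E}\left[h\left(X\right)\right]\approx\sum_{i=1}^{n}\tilde{w}_{i}\,h\left(x_{i}\right),\qquad\tilde{w}_{i}=\frac{w_{i}}{\sum_{j=1}^{n}w_{j}},\qquad x_{i}\sim g.\]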
Consider a Normal(0, 0.05) and a Student's t (df = 1) as two different proposal distributions for the Normal(0, 5) target, and estimate the importance weights through the importance functions \(\frac{target:\ N(0,5)}{proposal:\ N\left(0,0.05\right)}\) and \(\frac{N(0,5)}{t(1)}\) for each generated point. After that, normalize the weights as \(\frac{\frac{N(0,5)}{N\left(0,0.05\right)}}{\sum\frac{N(0,5)}{N\left(0,0.05\right)}}\) and \(\frac{\frac{N(0,5)}{t(1)}}{\sum\frac{N(0,5)}{t(1)}}\).
We therefore obtain a discrete distribution function whose properties, such as the mean and variance, are easily estimable. The estimated means for our generated samples were approximately 0, matching the MCI result, but their variances were considerably lower than those from the MCI approach (Supplementary R Code-2).
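A minimal R sketch of this self-normalized weighting step for the two proposals (an illustration only, not Supplementary R Code-2; the seed and sample size are arbitrary choices):

set.seed(1)
n <- 10000
h <- function(x) sin(x) * sqrt(abs(cos(x)))
# Proposal 1: N(0, 0.05)
x1 <- rnorm(n, mean = 0, sd = 0.05)
w1 <- dnorm(x1, 0, 5) / dnorm(x1, 0, 0.05)   # raw weights: target / proposal
w1 <- w1 / sum(w1)                           # normalize so the weights sum to one
# Proposal 2: Student's t with df = 1
x2 <- rt(n, df = 1)
w2 <- dnorm(x2, 0, 5) / dt(x2, df = 1)
w2 <- w2 / sum(w2)
# Weighted-average estimates of the expectation, and the weighted
# (discrete-distribution) variances of h under each proposal
est1 <- sum(w1 * h(x1)); est2 <- sum(w2 * h(x2))
c(est1, est2)
c(sum(w1 * (h(x1) - est1)^2), sum(w2 * (h(x2) - est2)^2))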
Using alternative distributions can thus reduce the variance of the samples, although a wide proposal distribution leads to worse estimates in terms of variance and is inefficient because it requires a large number of samples (Figure 1). Choosing an appropriate proposal distribution that resembles the target would be ideal, though such a distribution is sometimes difficult to find. IS yields unbiased estimates of parameters for large samples, and it works well when the importance function is not highly variable. Indeed, an appropriate proposal distribution leads to lower variances and a more accurate approximation. Robert and Casella provided an example illustrating that using a Normal(0, 1) proposal to resemble sampling from a Cauchy C(0, 1) target distribution causes the variance of the importance weights to be infinite (3). This attaches very high importance to a few points and yields inefficient estimates in terms of variance. Substituting a heavy-tailed distribution such as the Student's t for the Normal ensures a reasonable fit.
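A rough R sketch of this pitfall, contrasting the raw importance weights under the two proposals for a Cauchy C(0, 1) target (an illustration under an arbitrary seed and sample size, not the example from reference (3)):

set.seed(1)
n <- 10000
# Normal(0, 1) proposal: the weight dcauchy(x) / dnorm(x) is unbounded in the
# tails, so the importance weights have infinite variance
xn <- rnorm(n)
wn <- dcauchy(xn) / dnorm(xn)
# Student's t proposal with df = 1 (which coincides with the standard Cauchy,
# so here the weights are essentially constant)
xt <- rt(n, df = 1)
wt <- dcauchy(xt) / dt(xt, df = 1)
c(max(wn), max(wt))   # a few very large weights vs. uniformly moderate ones
c(var(wn), var(wt))   # highly unstable across runs vs. essentially zero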