Importance sampling
This post is about importance sampling. Although its name suggests a sampling method, importance sampling is actually a method for estimating the expectation $E(f)$ of some function $f(z)$ with respect to a probability distribution $p(z)$:
$$E(f) = \int p(z) f(z)\,dz$$
One simple strategy is to take discretised values $z^{(l)}$ uniformly over $z$-space, so that the expectation above is approximated by
$$E(f) \approx \sum_{l=1}^{L} p(z^{(l)}) f(z^{(l)})$$
However, this is inefficient: terms where $p(z^{(l)}) f(z^{(l)})$ is small contribute little to $E(f)$, so only a small region of $z$-space makes a real contribution to the sum. As the dimensionality of $z$ grows, the situation gets worse.
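As a quick illustration (my own example, not from the post), the grid-based approximation for a one-dimensional standard normal target with $f(z) = z^2$ might look like the sketch below. Note that the grid spacing $\Delta z$ is included so that the sum approximates the integral:

```python
import numpy as np

def normal_pdf(z, mu, sigma):
    # Density of N(mu, sigma^2).
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def f(z):
    return z ** 2

# Uniform grid over a region covering essentially all of p's mass.
grid = np.linspace(-10.0, 10.0, 2001)
dz = grid[1] - grid[0]
# Riemann-sum approximation of E(f) under p(z) = N(0, 1);
# the spacing dz turns the sum into an approximation of the integral.
estimate = np.sum(normal_pdf(grid, 0.0, 1.0) * f(grid)) * dz
print(estimate)  # ≈ 1.0, since E[z^2] = 1 under N(0, 1)
```

In one dimension a fine grid works well; the point of the post is that the number of grid points needed grows exponentially with the dimensionality of $z$.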
As in rejection sampling, in importance sampling we use a proposal distribution $q(z)$ that is easy to draw samples from. The expectation $E(f)$ is then reformulated:
$$E(f) = \int f(z) p(z)\,dz = \int f(z) \frac{p(z)}{q(z)} q(z)\,dz \approx \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})} f(z^{(l)})$$
In this approximation, the samples $z^{(l)}$ used to compute $E(f)$ are drawn from the proposal distribution $q(z)$. Under the assumption that we can only easily evaluate the unnormalised densities $\tilde{p}(z) = Z_p\, p(z)$ and $\tilde{q}(z) = Z_q\, q(z)$ (and not the normalising constants themselves), we reformulate the approximation of $E(f)$ again:
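A minimal sketch of this estimator with fully normalised densities (my own illustrative choices: target $p = \mathcal{N}(0, 1)$, proposal $q = \mathcal{N}(0, 2^2)$, and $f(z) = z^2$):

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(z, mu, sigma):
    # Density of N(mu, sigma^2).
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def f(z):
    return z ** 2

L = 100_000
# Draw samples from the proposal q(z) = N(0, 2^2).
z = rng.normal(0.0, 2.0, size=L)
# Importance ratios p(z)/q(z) for the target p(z) = N(0, 1).
ratios = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, 0.0, 2.0)
estimate = np.mean(ratios * f(z))
print(estimate)  # close to E[z^2] = 1 under N(0, 1)
```

Here the proposal is deliberately broader than the target, which keeps the ratios $p(z)/q(z)$ bounded and the estimator well behaved.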
$$\frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})} f(z^{(l)}) = \frac{1}{L} \sum_{l=1}^{L} \frac{Z_q\, \tilde{p}(z^{(l)})}{Z_p\, \tilde{q}(z^{(l)})} f(z^{(l)}) = \frac{Z_q}{Z_p} \cdot \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l f(z^{(l)}) = \frac{\frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l f(z^{(l)})}{Z_p / Z_q}$$
where $\tilde{r}_l = \tilde{p}(z^{(l)}) / \tilde{q}(z^{(l)})$.
We call the $\tilde{r}_l$ importance weights. For the ratio $Z_p / Z_q$, we have
$$\frac{Z_p}{Z_q} = \int \frac{\tilde{p}(z)}{Z_q\, q(z)} q(z)\,dz = \int \frac{\tilde{p}(z)}{\tilde{q}(z)} q(z)\,dz \approx \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l$$
Based on the approximations of $Z_p / Z_q$ and $E(f)$, we have
$$E(f) \approx \frac{\sum_{l=1}^{L} \tilde{r}_l f(z^{(l)})}{\sum_{l=1}^{L} \tilde{r}_l} = \sum_{l=1}^{L} w_l f(z^{(l)})$$
where $w_l = \tilde{r}_l / \sum_{m=1}^{L} \tilde{r}_m$.
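A sketch of this self-normalised estimator using only unnormalised densities (the target, proposal, $f$, and the scaling constants 3 and 5 are all my own illustrative choices; the underlying target is $\mathcal{N}(0, 1)$ and the proposal is $\mathcal{N}(0, 2^2)$):

```python
import numpy as np

rng = np.random.default_rng(1)

def p_tilde(z):
    # Unnormalised target: 3 * exp(-z^2 / 2), so Z_p = 3 * sqrt(2 * pi).
    return 3.0 * np.exp(-0.5 * z ** 2)

def q_tilde(z):
    # Unnormalised proposal: 5 * exp(-z^2 / 8), so Z_q = 5 * 2 * sqrt(2 * pi).
    return 5.0 * np.exp(-0.5 * (z / 2.0) ** 2)

def f(z):
    return z ** 2

L = 100_000
z = rng.normal(0.0, 2.0, size=L)   # samples from the proposal q(z) = N(0, 2^2)
r = p_tilde(z) / q_tilde(z)        # unnormalised ratios r~_l
w = r / r.sum()                    # self-normalised weights w_l
estimate = np.sum(w * f(z))        # approximates E(f) under the target N(0, 1)
print(estimate)                    # close to E[z^2] = 1
print(r.mean())                    # approximates Z_p / Z_q = 3 / 10 = 0.3
```

Note that the mean of the raw ratios $\tilde{r}_l$ recovers $Z_p / Z_q$, exactly as in the derivation above, even though neither normalising constant was used in the code.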
One remark: when using importance sampling, the choice of the proposal distribution $q(z)$ matters. If the region where $p(z) f(z)$ is large does not match the region where samples from $q(z)$ are concentrated, the approximation can be badly wrong. For example, take $f(z) = e^{|z|}$, $p(z) \sim \mathcal{N}(0, 3^2)$, $q(z) \sim \mathcal{N}(10, 1^2)$, and $Z_p = 4$, $Z_q = 2$. The figure below shows the general overview.
The importance-sampling approximation is the $y$-axis value of the red point in the figure (it also appears in the figure title). The orange points are the samples $z^{(l)}$ drawn from $q(z)$, plotted against their values $f(z^{(l)})$.
If we instead choose $q(z) \sim \mathcal{N}(0, 1^2)$, the approximation is much better.
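The effect of proposal mismatch can be checked numerically. Below is a sketch under the setup above ($f(z) = e^{|z|}$, $\tilde{p}(z) = 4\, p(z)$, $\tilde{q}(z) = 2\, q(z)$), using the self-normalised estimator; the exact reference value $E(f) = 2 e^{9/2} \Phi(3) \approx 179.8$ follows from the Gaussian integral $E[e^{z} \mathbf{1}_{z>0}] = e^{\sigma^2/2} \Phi(\sigma)$:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(z, mu, sigma):
    # Density of N(mu, sigma^2).
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def estimate(mu_q, sigma_q, L=100_000):
    # Self-normalised importance sampling for E[e^|z|] under p = N(0, 3^2),
    # with proposal q = N(mu_q, sigma_q^2) and r~_l built from Z_p = 4, Z_q = 2.
    z = rng.normal(mu_q, sigma_q, size=L)
    r = 4.0 * normal_pdf(z, 0.0, 3.0) / (2.0 * normal_pdf(z, mu_q, sigma_q))
    w = r / r.sum()
    return np.sum(w * np.exp(np.abs(z)))

# Exact value: E[e^|z|] = 2 * exp(sigma^2 / 2) * Phi(sigma) for z ~ N(0, sigma^2).
truth = 2.0 * math.exp(4.5) * 0.5 * (1.0 + math.erf(3.0 / math.sqrt(2.0)))
est_bad = estimate(10.0, 1.0)   # mismatched proposal N(10, 1^2)
est_good = estimate(0.0, 1.0)   # better-matched proposal N(0, 1^2)
print(truth, est_bad, est_good)
```

One caveat worth noting: even $\mathcal{N}(0, 1^2)$ is narrower than the target $\mathcal{N}(0, 3^2)$, so the weights $\tilde{r}_l$ still have heavy tails and the estimate remains noisy; a proposal broader than $p$ would be safer still.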
More programming details are in this notebook.
Based on the importance weights $\tilde{r}_l$, there exists a method for sampling from a desired distribution $p(z)$, called sampling-importance-resampling (SIR). It has a very important real-world application: the particle filter. Since this is not a small topic, I am going to write another post about sampling-importance-resampling, with some interesting applications using particle filters.