
Importance sampling

This post is about importance sampling. Despite what its name suggests, importance sampling is not really a sampling method; it is a method for estimating the expectation $E(f)$ of some function $f(z)$ with respect to a probability distribution $p(z)$.

$$E(f)=\int p(z)f(z)\,dz$$

One simple strategy is to discretise $z$-space into a uniform grid of values $z^{(l)}$ and approximate the expectation above by

$$E(f)\simeq\sum_{l=1}^{L}p(z^{(l)})f(z^{(l)})$$

However, this is inefficient: wherever $p(z^{(l)})f(z^{(l)})$ is small, the term contributes little to $E(f)$, so only a small region of $z$-space actually matters for the sum. As the dimensionality of $z$ grows, the situation gets worse.
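For a one-dimensional case, the grid-based estimate can be sketched as follows (a minimal numpy sketch; the standard-normal $p(z)$ and $f(z)=z^2$ are assumptions for illustration, and the grid spacing $dz$ is included so the sum is a proper Riemann approximation):

```python
import numpy as np

# Uniform-grid estimate of E[f] = \int p(z) f(z) dz in one dimension.
# Assumed example (not from the post): p(z) = N(0, 1), f(z) = z^2,
# so the true expectation is 1.
def p(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def f(z):
    return z**2

z = np.linspace(-10.0, 10.0, 2001)   # discretised z-space
dz = z[1] - z[0]
estimate = np.sum(p(z) * f(z)) * dz  # Riemann-sum approximation

print(estimate)  # close to 1.0
```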

As in rejection sampling, in importance sampling we also have a proposal distribution $q(z)$ that is easy to draw samples from. The expectation $E(f)$ is reformulated as

$$E(f)=\int f(z)p(z)\,dz=\int f(z)\frac{p(z)}{q(z)}q(z)\,dz\simeq\frac{1}{L}\sum_{l=1}^{L}\frac{p(z^{(l)})}{q(z^{(l)})}f(z^{(l)})$$
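When $p$ and $q$ are both normalised and can be evaluated directly, this estimator is straightforward to implement. A sketch with assumed densities (not the post's example): $p=\mathcal N(0,1)$, $q=\mathcal N(0,2^2)$, $f(z)=z^2$, so the true expectation is 1.

```python
import numpy as np

# Plain importance sampling with normalised densities:
# E[f] ~ (1/L) sum_l (p(z_l)/q(z_l)) f(z_l), samples drawn from q.
rng = np.random.default_rng(0)

def normal_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

L = 100_000
z = rng.normal(0.0, 2.0, size=L)                       # samples from proposal q
weights = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, 0.0, 2.0)
estimate = np.mean(weights * z**2)

print(estimate)  # close to 1.0, the true E[z^2] under N(0,1)
```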

In this approximation, the samples $z^{(l)}$ used to calculate $E(f)$ are drawn from the proposal distribution $q(z)$. Under the assumption that we can easily evaluate the unnormalised densities $\tilde p(z)=Z_p\,p(z)$ and $\tilde q(z)=Z_q\,q(z)$, we rewrite the approximation of $E(f)$ again.

$$\frac{1}{L}\sum_{l=1}^{L}\frac{p(z^{(l)})}{q(z^{(l)})}f(z^{(l)})=\frac{1}{L}\sum_{l=1}^{L}\frac{\tilde p(z^{(l)})/Z_p}{\tilde q(z^{(l)})/Z_q}f(z^{(l)})=\frac{Z_q}{Z_p}\frac{1}{L}\sum_{l=1}^{L}\tilde r_l f(z^{(l)})=\frac{\frac{1}{L}\sum_{l=1}^{L}\tilde r_l f(z^{(l)})}{Z_p/Z_q}$$

where $\tilde r_l=\dfrac{\tilde p(z^{(l)})}{\tilde q(z^{(l)})}$.

We call $\tilde r_l$ the importance weight. For $Z_p/Z_q$, we have

$$\frac{Z_p}{Z_q}=\frac{1}{Z_q}\int\tilde p(z)\,dz=\int\frac{\tilde p(z)}{Z_q\,q(z)}q(z)\,dz=\int\frac{\tilde p(z)}{\tilde q(z)}q(z)\,dz\simeq\frac{1}{L}\sum_{l=1}^{L}\tilde r_l$$

Combining the approximations of $Z_p/Z_q$ and $E(f)$, we have

$$E(f)\simeq\frac{\sum_{l=1}^{L}\tilde r_l f(z^{(l)})}{\sum_{l=1}^{L}\tilde r_l}=\sum_{l=1}^{L}w_l f(z^{(l)})$$

where $w_l=\dfrac{\tilde r_l}{\sum_{m=1}^{L}\tilde r_m}$.
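This self-normalised estimator only needs the unnormalised densities. A sketch, with assumed unnormalised densities (not the post's example): $\tilde p$ is a $\mathcal N(0,1)$ density scaled by 4, $\tilde q$ a $\mathcal N(0,2^2)$ density scaled by 2, so $q$ is still easy to sample from and $Z_p/Z_q=1$.

```python
import numpy as np

# Self-normalised importance sampling: only the unnormalised densities
# p~ and q~ are evaluated; w_l = r~_l / sum_m r~_m.
rng = np.random.default_rng(1)

def p_tilde(z):
    # unnormalised target: 4 * N(0, 1) density up to its normaliser
    return 4.0 * np.exp(-0.5 * z**2)

def q_tilde(z):
    # unnormalised proposal: 2 * N(0, 2^2) density up to its normaliser
    return 2.0 * np.exp(-0.5 * (z / 2.0) ** 2)

L = 100_000
z = rng.normal(0.0, 2.0, size=L)   # samples from q (which we can sample)
r = p_tilde(z) / q_tilde(z)        # unnormalised importance ratios r~_l
w = r / np.sum(r)                  # normalised weights w_l
estimate = np.sum(w * z**2)        # self-normalised estimate of E[z^2] = 1

ratio = np.mean(r)                 # the same ratios estimate Z_p / Z_q (= 1 here)
print(estimate, ratio)
```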

One remark: when using importance sampling, the choice of the proposal distribution $q(z)$ matters. If the region where $p(z)f(z)$ is large does not match the region where the samples from $q(z)$ are concentrated, the approximation can be quite wrong. For example, take $f(z)=e^{|z|}$, $p(z)\sim\mathcal N(0,3^2)$, $q(z)\sim\mathcal N(10,1^2)$ and $Z_p=4$, $Z_q=2$. The figure below shows the general overview.

overview

The importance-sampling approximation is the y-axis value of the red point in the figure; it is also shown in the figure's title. The orange points are the points $z$ sampled from $q(z)$ together with their corresponding $f(z)$ values.

wrong_result

If we instead choose $q(z)\sim\mathcal N(0,1^2)$, the approximation is much better.

better_result
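A simplified numerical stand-in for these figures (the densities and $f$ here are assumptions chosen for a compact demo, not the post's exact example): with target $p=\mathcal N(0,1)$ and $f(z)=z^2$, a proposal centred far from where $p(z)f(z)$ is large gives a badly biased estimate, while a well-matched proposal recovers the true value 1.

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def is_estimate(mu_q, sigma_q, L=50_000):
    """Self-normalised importance-sampling estimate of E[z^2] under p = N(0,1)."""
    z = rng.normal(mu_q, sigma_q, size=L)                       # samples from q
    w = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, mu_q, sigma_q)  # ratios p/q
    return np.sum(w * z**2) / np.sum(w)

bad = is_estimate(10.0, 1.0)   # q concentrated far from p: few effective samples
good = is_estimate(0.0, 2.0)   # q covers p well
print(bad, good)               # bad is far from the true value 1; good is close
```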

More programming details are in this notebook.

Based on the importance weights $\tilde r_l$, there is a method for sampling from the desired distribution $p(z)$, called sampling-importance-resampling. It has a very important real-world application, the particle filter. Since this is not a small topic, I am going to write another post about sampling-importance-resampling, with some interesting applications using particle filters.

Written on November 17, 2018