Reinforcement learning can effectively learn amortised policies for designing
sequences of experiments. However, current methods rely on contrastive
estimators of expected information gain, which require an exponential number
of contrastive samples to yield an unbiased estimate. We
propose an alternative lower bound estimator, based on the cross-entropy of the
joint model distribution and a flexible proposal distribution. This proposal
distribution approximates the true posterior of the model parameters given the
experimental history and the design policy. Our estimator requires no
contrastive samples, can achieve more accurate estimates of high information
gains, allows learning of superior design policies, and is compatible with
implicit probabilistic models. We assess our algorithm's performance on a range
of tasks spanning continuous and discrete design spaces and both explicit and
implicit likelihoods.
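
To make the cross-entropy bound concrete, the following is a sketch of the standard variational argument behind it, not necessarily the exact form used in the paper. The notation is assumed for illustration: \theta denotes the model parameters, h_T the experimental history after T experiments, \pi the design policy, and q(\theta \mid h_T) the proposal distribution.

```latex
\begin{align}
I(\theta; h_T \mid \pi)
  &= \mathbb{E}_{p(\theta, h_T \mid \pi)}
     \left[ \log p(\theta \mid h_T, \pi) - \log p(\theta) \right] \\
  &= \mathbb{E}_{p(\theta, h_T \mid \pi)}
     \left[ \log q(\theta \mid h_T) - \log p(\theta) \right]
   + \mathbb{E}_{p(h_T \mid \pi)}
     \left[ \mathrm{KL}\!\left( p(\theta \mid h_T, \pi) \,\|\, q(\theta \mid h_T) \right) \right] \\
  &\geq \mathbb{E}_{p(\theta, h_T \mid \pi)}
     \left[ \log q(\theta \mid h_T) \right] + H\!\left[ p(\theta) \right].
\end{align}
```

The first term on the last line is the negative cross-entropy between the joint model distribution and the proposal. The bound is tight when q matches the true posterior, and estimating it by Monte Carlo needs only samples (\theta, h_T) from the joint, with no contrastive samples and no likelihood evaluations, which is why it remains usable with implicit models.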

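This work proposes an alternative lower bound estimator based on cross-entropy. It uses a flexible proposal distribution to approximate the true posterior of the model parameters, requires no contrastive samples, and achieves more accurate estimation and learning across a variety of tasks.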