Nicholas Hay, Stuart Russell, David Tolpin, Solomon Eyal Shimony
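TL;DR: This paper formulates metalevel decisions in a probabilistic framework based on Bayesian selection problems, derives finite sampling bounds for optimal policies in Monte Carlo search, and demonstrates heuristic approximations that outperform both Bayesian-based and bandit-based heuristics in one-shot decision problems and in Go.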
Abstract
Sequential decision problems are often approximately solvable by simulating
possible future action sequences. Metalevel decision procedures have been
developed for selecting which action sequences to simulate, based on
estimating the expected improvement in decision quality