Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

February 2012

John Asmuth, Michael L. Littman

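TL;DR: Forward Search Sparse Sampling (FSSS), a Monte-Carlo tree search algorithm, can be used to act near Bayes-optimally, so that Markov decision processes (MDPs) with extremely large or infinite state spaces can be handled efficiently.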

Abstract

Bayes-optimal behavior, while well-defined, is often difficult to achieve. Recent advances in the use of Monte-Carlo tree search (MCTS) have shown that it is possible to act near-optimally in …
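As a rough illustration of treating Bayes-optimal behavior as a planning problem, the sketch below plans in the belief space of a two-armed Bernoulli bandit with Beta priors. It uses plain sparse sampling (the algorithm of Kearns, Mansour and Ng, which FSSS extends by maintaining value bounds and expanding only promising parts of the tree); it is not FSSS itself and not the authors' code. The belief is the tuple of Beta counts, and each simulated pull draws a reward from the posterior predictive and updates those counts, so the planner accounts for how new information changes future value. The bandit setting, the Beta-Bernoulli model, and all names and parameters (`plan`, `sample_step`, `depth`, `width`, `gamma`) are hypothetical choices made for illustration.

```python
import random
from typing import Tuple

# Hypothetical belief representation: one Beta(alpha, beta) posterior per arm.
Belief = Tuple[Tuple[int, int], ...]


def sample_step(belief: Belief, arm: int, rng: random.Random) -> Tuple[float, Belief]:
    """Sample a reward for `arm` from the posterior predictive and update the belief."""
    alpha, beta = belief[arm]
    p_success = alpha / (alpha + beta)  # Beta-Bernoulli posterior predictive
    reward = 1.0 if rng.random() < p_success else 0.0
    updated = list(belief)
    # A success increments alpha, a failure increments beta.
    updated[arm] = (alpha + 1, beta) if reward == 1.0 else (alpha, beta + 1)
    return reward, tuple(updated)


def q_value(belief: Belief, arm: int, depth: int, width: int,
            gamma: float, rng: random.Random) -> float:
    """Sparse-sampling estimate of Q(belief, arm): average over `width` sampled successors."""
    total = 0.0
    for _ in range(width):
        reward, next_belief = sample_step(belief, arm, rng)
        total += reward + gamma * value(next_belief, depth - 1, width, gamma, rng)
    return total / width


def value(belief: Belief, depth: int, width: int,
          gamma: float, rng: random.Random) -> float:
    """V(belief) = max over arms of Q(belief, arm); the horizon ends at depth 0."""
    if depth == 0:
        return 0.0
    return max(q_value(belief, a, depth, width, gamma, rng) for a in range(len(belief)))


def plan(belief: Belief, depth: int = 2, width: int = 3,
         gamma: float = 0.9, seed: int = 0) -> int:
    """Return the arm with the highest estimated Q-value in the belief-space MDP."""
    rng = random.Random(seed)
    qs = [q_value(belief, a, depth, width, gamma, rng) for a in range(len(belief))]
    return max(range(len(qs)), key=qs.__getitem__)


if __name__ == "__main__":
    prior = ((1, 1), (1, 1))  # uniform Beta(1, 1) prior on both arms
    print(plan(prior))
```

The sampled tree has (arms × width)^depth leaves no matter how many belief states exist, which is the property that lets sparse-sampling methods cope with very large or infinite state spaces; the price is that this cost grows quickly with depth, which is the kind of cost FSSS's pruning aims to reduce.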