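TL;DR: By combining two different families of MCTS-based algorithms, this work quickly identifies a reasonably good action while preserving BRUE's strong convergence properties and its guarantee of exponential-rate performance improvement.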
Abstract
Popular Monte-Carlo tree search (MCTS) algorithms for online planning, such
as epsilon-greedy tree search and UCT, aim at rapidly identifying a reasonably
good action, but provide rather poor worst-case guarantees on performance
improvement over time. In contrast, a recently introduced