BriefGPT.xyz
Jun, 2012
Simple Regret Optimization for Online Planning in Markov Decision Processes
Online Planning in MDPs: Rationality and Optimization
Zohar Feldman, Carmel Domshlak
TL;DR
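This paper addresses online planning in MDPs and proposes BRUE, a new Monte-Carlo tree search algorithm based on the MCTS2e scheme. BRUE reduces both simple regret and the probability of error at an exponential rate, and it is further generalized with a form of learning by forgetting. The results show that BRUE not only offers superior performance guarantees but is also highly effective in practice.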
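For reference, the standard definitions behind the two quantities named above are given below, under assumed notation that is not taken from the paper: s_0 is the current state, Q(s_0, a) is the value of taking action a and then acting optimally, and a_n is the action recommended after n samples. "Exponential rate" is read here as a bound of the schematic form shown last, with problem-dependent constants C, c > 0; the paper's actual bounds, constants, and conditions are not reproduced.

```latex
r_n = \max_{a} Q(s_0, a) - Q(s_0, a_n)
\qquad \text{(simple regret of the recommendation)}

P_{\mathrm{err}}(n) = \Pr\!\left[\, Q(s_0, a_n) < \max_{a} Q(s_0, a) \,\right]
\qquad \text{(probability of recommending a suboptimal action)}

\mathbb{E}[r_n] \le C\, e^{-c\, n}
\qquad \text{(schematic exponential-rate bound)}
```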
Abstract
We consider online planning in Markov decision processes. An algorithm for this problem should explore the set of possible policies from the current state, and, when interrupted, recommend an action to follow bas…
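The explore-then-recommend loop described in the abstract, and the simple regret and error probability mentioned in the TL;DR, can be made concrete with a minimal Python sketch. It is not BRUE or MCTS2e: it assumes a one-step toy problem (a multi-armed bandit over root actions with hypothetical unit-variance Gaussian reward noise), explores uniformly in round-robin order, and, when the sampling budget runs out (the "interruption"), recommends the empirically best action. The function names `run_and_recommend` and `estimate` and the action values are illustrative only.

```python
import random


def run_and_recommend(true_means, budget, seed=0):
    """Explore a toy one-step problem, then recommend when the budget is exhausted.

    true_means: hypothetical expected value of each root action.
    budget: number of samples allowed before the "interruption".
    Returns (recommended_action, simple_regret).
    """
    rng = random.Random(seed)
    k = len(true_means)
    totals = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        a = t % k  # round-robin (uniform) exploration, NOT BRUE's sampling rule
        totals[a] += rng.gauss(true_means[a], 1.0)  # assumed unit-variance Gaussian noise
        counts[a] += 1
    # On interruption: recommend the empirically best action;
    # unsampled actions get -inf so they are never preferred over sampled ones.
    means = [totals[a] / counts[a] if counts[a] else float("-inf") for a in range(k)]
    recommended = max(range(k), key=lambda a: means[a])
    # Simple regret: value lost by following the recommendation instead of the best action.
    return recommended, max(true_means) - true_means[recommended]


def estimate(true_means, budget, trials=1000):
    """Monte Carlo estimate of P(error) and expected simple regret for a given budget."""
    best = max(range(len(true_means)), key=lambda a: true_means[a])
    errors, total_regret = 0, 0.0
    for seed in range(trials):
        rec, regret = run_and_recommend(true_means, budget, seed)
        errors += int(rec != best)
        total_regret += regret
    return errors / trials, total_regret / trials


if __name__ == "__main__":
    means = [0.0, 0.3, 0.5]  # hypothetical action values
    for budget in (6, 30, 150, 600):
        p_err, avg_regret = estimate(means, budget)
        print(f"budget={budget:4d}  P(error)~{p_err:.3f}  E[simple regret]~{avg_regret:.3f}")
```

Because the allocation here is uniform, the sketch says nothing about how BRUE itself distributes samples across a search tree; it only shows what "reducing simple regret and error probability as the budget grows" means for the action recommended at interruption time.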