BriefGPT.xyz
Dec, 2019
Monte-Carlo Tree Search for Policy Optimization
Xiaobai Ma, Katherine Driggs-Campbell, Zongzhang Zhang, Mykel J. Kochenderfer
TL;DR
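This paper proposes MCTSPO, a policy optimization method based on Monte-Carlo tree search and gradient-free optimization. By using the upper confidence bound heuristic, it achieves a better exploration-exploitation trade-off, and it outperforms gradient-based and deep genetic algorithm baselines on reinforcement learning tasks with deceptive or sparse reward functions.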
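To make the upper confidence bound idea concrete, here is a minimal sketch of the textbook UCB1 selection rule commonly used in Monte-Carlo tree search. It is an illustration under assumptions, not the paper's implementation: the function names, the `(mean_return, visits)` data layout, and the exploration constant `c` are all hypothetical, and MCTSPO may use a different variant of the bound.

```python
import math


def ucb1_score(mean_return, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCB1: mean return plus an exploration bonus.

    The bonus shrinks as a child is visited more often, so rarely
    tried children keep getting a chance to be selected.
    """
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return mean_return + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (mean_return, visits) pairs; this layout
    is an assumption made for the sketch.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(m, v, parent_visits, c) for m, v in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With `c = 0` the rule is purely greedy on mean return. A larger `c` favors children with few visits, which is the kind of exploration the summary credits for handling deceptive or sparse rewards.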
Abstract
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms o
→