BriefGPT.xyz
Oct, 2022
Exploration via Planning for Information about the Optimal Trajectory
Viraj Mehta, Ian Char, Joseph Abbate, Rory Conlin, Mark D. Boyer...
TL;DR
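By planning action sequences that maximize the expected information gain about the task's optimal trajectory, the method learns strong policies from few samples: it uses 2x fewer samples than exploration baselines and 200x fewer than model-free methods.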
Abstract
Many potential applications of reinforcement learning (RL) are stymied by the large numbers of samples required to learn an effective policy.