Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
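TL;DR: This paper introduces Ready Policy One (RP1), which treats model-based reinforcement learning as an active learning problem. It uses a hybrid objective function that is crucially adapted during optimization, so that reward and exploration can be traded off at different stages of learning, and it introduces a principled mechanism for terminating sample collection. The method is rigorously evaluated on several continuous control tasks and shows significant gains over existing approaches.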
Abstract
Model-based reinforcement learning (MBRL) offers a promising direction for sample-efficient learning, often achieving state-of-the-art results for continuous control tasks. However, many existing MBRL methods rel