BriefGPT.xyz
Oct, 2019
Better Exploration with Optimistic Actor-Critic
Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann
TL;DR
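This paper proposes a new reinforcement learning algorithm, Optimistic Actor-Critic (OAC). By approximating upper and lower confidence bounds on the state-action value function, OAC explores optimistically and samples in a directed way, which improves sample efficiency on continuous control tasks.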
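The summary above does not spell out how the confidence bounds are formed. As a minimal sketch, assuming two independent critic estimates of the same state-action values, one simple way to approximate a lower and an upper bound is to take their mean and shift it by a multiple of their spread. The function name `confidence_bounds` and the parameters `beta_ub` and `beta_lb` are hypothetical and chosen for illustration only.

```python
import numpy as np


def confidence_bounds(q1, q2, beta_ub=1.0, beta_lb=1.0):
    """Approximate lower/upper bounds on Q from two critic estimates.

    Assumption (not taken from the summary): the two estimates are treated
    as a tiny ensemble whose mean is the point estimate and whose half-range
    is the uncertainty.
    """
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    mean = (q1 + q2) / 2.0            # point estimate of Q
    spread = np.abs(q1 - q2) / 2.0    # disagreement between the critics
    upper = mean + beta_ub * spread   # optimistic estimate, used to explore
    lower = mean - beta_lb * spread   # pessimistic estimate, guards against overestimation
    return lower, upper


# Two critics scoring the same two state-action pairs.
lower, upper = confidence_bounds([1.0, 2.0], [3.0, 2.0])
```

With both multipliers set to 1, the lower bound reduces to the element-wise minimum of the two critics and the upper bound to their maximum. Where the critics agree, the bounds coincide and there is no optimism bonus.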
Abstract
Actor-critic methods, a type of model-free reinforcement learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the-art performance. However, wide-scale adop