BriefGPT.xyz
Aug, 2022
Off-Policy Correction for Actor-Critic Methods without Importance Sampling
Off-Policy Correction for Actor-Critic Algorithms in Deep Reinforcement Learning
Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat
TL;DR
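This paper studies deep reinforcement learning algorithms that learn from off-policy data. It proposes a new policy similarity measure to improve sample efficiency and generalization, and proves that the method enables safe off-policy learning. Experiments show that, in most cases, the method improves learning efficiency more than competing algorithms.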
Abstract
Compared to on-policy policy gradient techniques, off-policy model-free deep reinforcement learning (RL) approaches that use previously gathered data can improve …