BriefGPT.xyz
May, 2012
Off-Policy Actor-Critic
Thomas Degris, Martha White, Richard S. Sutton
TL;DR
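This work proposes an online, incremental actor-critic algorithm aimed at a range of real-world problems. It combines off-policy learning with recent gradient temporal-difference techniques, leaves room for flexible policy design, shows strong learning potential and generalization, and converges to good performance.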
Abstract
This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned wei
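To make the "online, incremental, linear per-time-step cost" idea concrete, here is a minimal sketch of one off-policy actor-critic update. It is a simplified illustration, not the paper's exact algorithm: it assumes linear value features, a softmax target policy over linear action preferences, a gradient-TD style critic with no eligibility traces, and hypothetical names (`off_policy_ac_step`, `softmax_policy`, the step sizes `alpha_v`, `alpha_w`, `alpha_u`) chosen only for this example.

```python
import math


def dot(p, q):
    """Inner product of two equal-length lists."""
    return sum(pi * qi for pi, qi in zip(p, q))


def softmax_policy(u, x):
    """Target-policy probabilities pi(a|x) from linear action preferences u[a] . x."""
    prefs = [dot(u_a, x) for u_a in u]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def off_policy_ac_step(v, w, u, x, a, b_prob, r, x_next,
                       gamma=0.9, alpha_v=0.1, alpha_w=0.01, alpha_u=0.01):
    """One in-place update from a single transition generated by a behavior policy.

    v: critic weights, w: auxiliary gradient-TD weights, u: per-action actor weights.
    b_prob is the behavior policy's probability of the action a that was taken.
    Every loop below touches each weight once, so the cost is linear in the
    number of weights. (Assumed simplification: no eligibility traces.)
    """
    pi = softmax_policy(u, x)
    rho = pi[a] / b_prob                              # importance-sampling ratio
    delta = r + gamma * dot(v, x_next) - dot(v, x)    # TD error
    wx = dot(w, x)                                    # computed before w changes

    # Critic: gradient-TD update with a correction term using the auxiliary weights.
    for i in range(len(v)):
        v[i] += alpha_v * rho * (delta * x[i] - gamma * wx * x_next[i])
        w[i] += alpha_w * (rho * delta - wx) * x[i]

    # Actor: importance-weighted policy-gradient step.
    # For a linear softmax, d log pi(a|x) / d u[k] = (1[k == a] - pi[k]) * x.
    for k in range(len(u)):
        indicator = 1.0 if k == a else 0.0
        for i in range(len(x)):
            u[k][i] += alpha_u * rho * delta * (indicator - pi[k]) * x[i]

    return delta, rho
```

A call such as `off_policy_ac_step(v, w, u, x=[1.0, 0.0], a=0, b_prob=0.5, r=1.0, x_next=[0.0, 1.0])` processes one transition and updates `v`, `w`, and `u` in place, so experience can be consumed one step at a time without storing a batch.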