Sep, 2009
A Convergent Online Single Time Scale Actor Critic Algorithm
D. Di Castro, R. Meir
TL;DR
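The paper introduces an online temporal difference algorithm based on the actor-critic approach. It estimates the value function while updating the policy parameters, and it converges to a local maximum of the average reward, which opens the way to more realistic neuroscience models of reinforcement learning.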
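To make the idea concrete, here is a minimal sketch of an online average-reward actor-critic loop in which the actor and critic step sizes differ only by a constant factor. It is not the authors' algorithm as published. The two-state environment, the tabular critic, the softmax actor, the fixed step sizes, and every name (`env_step`, `run_actor_critic`, `actor_ratio`) are assumptions chosen purely for illustration.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def env_step(action, rng):
    """Hypothetical 2-state MDP: the chosen action names the target state,
    which is reached with probability 0.9; landing in state 1 pays reward 1."""
    next_state = action if rng.random() < 0.9 else 1 - action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def run_actor_critic(steps=20000, critic_lr=0.05, actor_ratio=1.0, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: per-state action preferences
    v = [0.0, 0.0]                    # critic: differential state values
    rho = 0.0                         # running estimate of the average reward
    # Assumed "single time scale": the two step sizes keep a constant ratio.
    actor_lr = actor_ratio * critic_lr
    state, tail_reward, tail_start = 0, 0.0, steps // 2

    for t in range(steps):
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        next_state, reward = env_step(action, rng)

        # Average-reward temporal difference error.
        delta = reward - rho + v[next_state] - v[state]

        # Critic: update the average-reward estimate and the state value.
        rho += critic_lr * (reward - rho)
        v[state] += critic_lr * delta

        # Actor: TD error times the gradient of the log softmax policy.
        for b in range(2):
            grad = (1.0 if b == action else 0.0) - probs[b]
            theta[state][b] += actor_lr * delta * grad

        # Track the empirical reward over the second half of the run.
        if t >= tail_start:
            tail_reward += reward
        state = next_state

    policy = [softmax(theta[s]) for s in range(2)]
    return policy, tail_reward / (steps - tail_start)


if __name__ == "__main__":
    policy, avg_reward = run_actor_critic()
    print("policy per state:", policy)
    print("average reward over the second half:", avg_reward)
```

In this toy problem the best policy picks action 1 in both states, so the learned probabilities for action 1 should drift toward 1. The same temporal difference signal `delta` drives both the critic and the actor, which is the feature the summary emphasizes.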
Abstract
Actor-critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good …