BriefGPT.xyz
Sep 2024
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam, Dionysis Kalogerias
TL;DR
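This work addresses the inaccuracy that deterministic policy gradient algorithms suffer when controlling continuous systems, which stems from their reliance on the derivative of the critic's value estimates. By adopting a zeroth-order approximation in the action space based on two-point stochastic gradient estimation, we propose a new actor-critic algorithm that effectively resolves the compatibility issue inherent in deterministic policy gradient schemes. Empirical results show that the algorithm's performance matches, and in many cases surpasses, that of current state-of-the-art methods.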
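The TL;DR describes replacing the critic's action-derivative with a two-point zeroth-order estimate. The Python/NumPy sketch below illustrates that general idea; it is not the authors' implementation. Everything in it is an assumption not taken from the paper: Gaussian probe directions, the smoothing radius `delta`, the number of directions `num_dirs`, a hypothetical linear policy `a = W @ s`, and a toy quadratic critic `toy_critic`. All function names are hypothetical.

```python
import numpy as np


def two_point_action_grad(q_fn, state, action, delta=1e-2, num_dirs=64, rng=None):
    """Zeroth-order estimate of grad_a Q(state, a) evaluated at `action`.

    Only evaluations of q_fn are used (no backpropagation through the critic).
    For each Gaussian probe direction u, the symmetric two-point difference
    (Q(s, a + delta*u) - Q(s, a - delta*u)) / (2*delta) is multiplied by u,
    and the products are averaged over `num_dirs` directions.
    """
    rng = np.random.default_rng() if rng is None else rng
    action = np.asarray(action, dtype=float)
    grad = np.zeros_like(action)
    for _ in range(num_dirs):
        u = rng.standard_normal(action.shape)  # random probe direction in action space
        diff = q_fn(state, action + delta * u) - q_fn(state, action - delta * u)
        grad += (diff / (2.0 * delta)) * u
    return grad / num_dirs


def actor_step(W, state, q_fn, lr=1e-2, **est_kwargs):
    """One gradient-ascent step for a hypothetical linear policy a = W @ s.

    Chain rule: dQ/dW = (dQ/da) s^T, where dQ/da is the two-point estimate
    instead of the critic's analytic action-derivative.
    """
    action = W @ state
    g_a = two_point_action_grad(q_fn, state, action, **est_kwargs)
    return W + lr * np.outer(g_a, state)


# Toy critic (hypothetical, for illustration only): Q(s, a) = -||a - M s||^2,
# which is maximized by the linear policy W = M.
M = np.array([[1.0, -0.5, 0.0],
              [0.2, 0.3, -1.0]])


def toy_critic(state, action):
    return -float(np.sum((action - M @ state) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = np.zeros_like(M)
    for _ in range(1000):
        s = rng.standard_normal(3)  # sampled state
        W = actor_step(W, s, toy_critic, lr=0.02, num_dirs=16, rng=rng)
    print("max |W - M| after 1000 steps:", np.abs(W - M).max())
```

The actor update here only queries the critic at perturbed actions and never differentiates it with respect to its action input. The symmetric (two-point) difference cancels the even-order terms of the critic's Taylor expansion, so for this quadratic toy critic each probe yields exactly (u·∇Q)u, and the average approaches the true gradient as `num_dirs` grows.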
Abstract
Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's…
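For context, the standard deterministic policy gradient theorem makes this dependence explicit: the actor's gradient passes through the critic's derivative with respect to the action. The notation is conventional and is an addition to the source: $\mu_\theta$ is the deterministic policy, $\rho^{\mu}$ its discounted state distribution, $Q^{\mu}$ the true action-value function, and $Q_w$ a learned critic that stands in for it in practice.

```latex
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[ \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)} \right]
  \approx \mathbb{E}_{s \sim \rho^{\mu}}\!\left[ \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q_w(s, a)\big|_{a = \mu_\theta(s)} \right]
```

Any error in the learned critic's action-gradient $\nabla_a Q_w$ can therefore enter the actor update directly, even when the value estimates $Q_w(s, a)$ themselves are close to $Q^{\mu}(s, a)$.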