BriefGPT.xyz
Oct, 2024
缓解复杂Q函数中确定性策略梯度的次优性
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
HTML
PDF
Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Bıyık...
TL;DR
本研究针对强化学习中确定性策略梯度方法(如DDPG和TD3)在复杂任务中的局部最优问题,提出了一种新型演员架构。通过使用多个演员和更易于优化的Q函数替代品,该架构能够更频繁地找到最优动作,并在多项任务中表现优于其他演员架构。
Abstract
In
Reinforcement Learning
, off-policy
Actor-Critic
approaches like DDPG and TD3 are based on the deterministic
Policy Gradient
. Herein, th
→