BriefGPT.xyz
Nov, 2019
The problem with DDPG: understanding failures in deterministic environments with sparse rewards
Guillaume Matheron, Nicolas Perrin, Olivier Sigaud
TL;DR
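This paper explains why reinforcement learning algorithms for continuous state and action spaces can fail to converge in deterministic environments with sparse rewards, and proposes potential ways to address these problems.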
Abstract
In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet can also fail in environments that seem trivial, but t