The problem with DDPG: understanding failures in deterministic environments with sparse rewards

Nov, 2019

Guillaume Matheron, Nicolas Perrin, Olivier Sigaud

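TL;DR: This paper explains why reinforcement learning algorithms for continuous state and action spaces can fail to converge in deterministic environments with sparse rewards, and proposes potential ways to address these failures.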

Abstract

In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet can also fail in environments that seem trivial, but t