BriefGPT.xyz
Sep, 2019
Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning
Wenjie Shi, Shiji Song, Cheng Wu
TL;DR
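This paper proposes a new deep reinforcement learning algorithm. It derives a soft policy gradient from an entropy-regularized expected-return objective and combines it with the soft Bellman equation to obtain a maximum entropy deep reinforcement learning algorithm named DSPG. The algorithm uses a double sampling approach to keep learning stable and improves performance, addressing the instability of existing methods when training on large-scale offline data and on problems with high-dimensional action and state spaces.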
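For orientation, the two ingredients named in the summary can be written in their standard textbook form. This is a sketch of the usual maximum entropy RL formulation, not necessarily the paper's own notation: the temperature \alpha, discount \gamma, and the symbols J, Q, V, and \mathcal{H} are assumptions introduced here.

```latex
% Entropy-regularized expected-return objective (standard form; \alpha is an assumed temperature symbol)
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right]

% Soft Bellman equation (standard form), with the soft state value defined under the same policy
Q(s_t, a_t) = r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}}\left[ V(s_{t+1}) \right]
V(s_t) = \mathbb{E}_{a_t \sim \pi}\left[ Q(s_t, a_t) - \alpha \log \pi(a_t \mid s_t) \right]
```

Setting \alpha = 0 recovers the ordinary expected-return objective and the usual Bellman equation for a fixed policy.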
Abstract
Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large …