BriefGPT.xyz
Oct, 2020
Learning Guidance Rewards with Trajectory-space Smoothing
Tanmay Gangwani, Yuan Zhou, Jian Peng
TL;DR
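This paper introduces an algorithm that learns guidance rewards using trajectory-space smoothing, and shows its advantages in addressing long-term temporal credit assignment in reinforcement learning.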
Abstract
Long-term temporal credit assignment is an important challenge in deep reinforcement learning (RL). It refers to the ability of the agent to attribute actions to consequences that may occur after a long time inte