BriefGPT.xyz
Nov, 2021
Learning Long-Term Reward Redistribution via Randomized Return Decomposition
Zhizhou Ren, Ruihan Guo, Yuan Zhou, Jian Peng
TL;DR
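This paper proposes a proxy reward mechanism based on the RRD (Randomized Return Decomposition) algorithm. It addresses the problems that sparse and delayed rewards cause in reinforcement learning, and achieves significant improvements on benchmark tasks.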
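The summary does not spell out the training objective, so the following is only a minimal sketch of what a randomized return-decomposition loss could look like, not the authors' implementation. It assumes that a proxy reward model predicts one reward per step, that those predictions should sum to the observed episodic return, and that the sum is estimated from a random subset of steps. The names `rrd_loss`, `pred_rewards`, and `subset_size` are hypothetical.

```python
import numpy as np

def rrd_loss(pred_rewards, episodic_return, subset_size, rng):
    """Squared error between the episodic return and a subsampled
    estimate of the summed per-step proxy rewards (illustrative only)."""
    T = len(pred_rewards)
    # Draw a random subset of time steps without replacement.
    idx = rng.choice(T, size=subset_size, replace=False)
    # Scale the subset sum by T / |subset| so that, in expectation,
    # it matches the sum over the full trajectory.
    est_return = (T / subset_size) * pred_rewards[idx].sum()
    return (episodic_return - est_return) ** 2

rng = np.random.default_rng(0)

# Hypothetical trajectory: 10 steps, each assigned a proxy reward of 0.5,
# with a single delayed return of 5.0 observed at the end of the episode.
pred = np.full(10, 0.5)
loss_consistent = rrd_loss(pred, episodic_return=5.0, subset_size=4, rng=rng)

# Using every step as the "subset" recovers the exact decomposition error.
pred_ramp = np.arange(10, dtype=float)
loss_full = rrd_loss(pred_ramp, episodic_return=40.0, subset_size=10, rng=rng)
```

In this toy setting the first loss is zero because the proxy rewards already sum to the episodic return, while the second is the squared gap between 45 and 40. Subsampling trades some variance in the estimate for not having to evaluate the reward model on every step of a long trajectory.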
Abstract
Many practical applications of reinforcement learning require agents to learn from sparse and delayed rewards. It challenges the ability of agents to attribute their actions to future outcomes. In this paper, we