BriefGPT.xyz
May, 2023
GRD: A Generative Approach for Interpretable Reward Redistribution in Reinforcement Learning
Yudi Zhang, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang...
TL;DR
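This paper proposes a return decomposition method based on a causal generative model to address the delayed-reward problem in reinforcement learning, and demonstrates the method's strong performance and interpretability in experiments.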
Abstract
A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Return decomposition offers a solution by redistributing rewards from obse
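To make the idea of return decomposition concrete, here is a minimal sketch. It is not the GRD method from the paper: it assumes a purely linear per-step reward model fitted by ordinary least squares, and the function names, feature dimensions, and synthetic data are all hypothetical. Each episode supplies only its total return, and the fit recovers per-step proxy rewards whose sum matches that return.

```python
import numpy as np


def fit_linear_return_decomposition(trajectories, returns):
    """Fit weights w so that sum_t (phi_t @ w) matches each episode's return.

    Assumption for this sketch: rewards are linear in per-step features.
    """
    # Summing a linear reward over time equals applying w to the summed features.
    X = np.stack([traj.sum(axis=0) for traj in trajectories])  # (episodes, features)
    y = np.asarray(returns, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def redistribute(traj, w):
    """Return per-step proxy rewards for one trajectory."""
    return traj @ w


# Synthetic data (hypothetical): 50 episodes, 10 steps, 3 state-action features.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
trajectories = [rng.normal(size=(10, 3)) for _ in range(50)]
# Only the delayed, episode-level return is observed.
returns = [float((traj @ true_w).sum()) for traj in trajectories]

w = fit_linear_return_decomposition(trajectories, returns)
proxy = redistribute(trajectories[0], w)
```

In this noiseless toy setting the fitted weights match the ones used to generate the data, and the proxy rewards of an episode sum to its observed return. Real return decomposition methods replace the linear model with learned, typically nonlinear, reward estimators.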