BriefGPT.xyz
Nov, 2020
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade...
TL;DR
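This work applies counterfactual reasoning to two problems in reinforcement learning: measuring an action's influence on future rewards, and separating skill from luck. It proposes a policy gradient algorithm that uses future-conditional value functions as baselines, and validates it through analysis and experiments that include sources of uncertainty, showing that the algorithm is effective and has low variance.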
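To make the idea of a future-conditional baseline concrete, here is a minimal sketch on a toy two-armed bandit. It is an illustration under stated assumptions, not the paper's algorithm: the reward is an action-dependent mean plus "luck" noise that is independent of the action, and the hindsight baseline is allowed to see that noise. The names `grad_samples`, `mean_and_variance`, and all parameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_samples(logits, means, noise_std, n, use_hindsight, seed=0):
    """Return n score-function gradient samples w.r.t. logits[0].

    Toy assumption: reward = means[action] + luck, where luck is drawn
    independently of the action. The hindsight baseline conditions on luck.
    """
    rng = random.Random(seed)
    probs = softmax(logits)
    value = sum(p * m for p, m in zip(probs, means))  # V = E[reward]
    samples = []
    for _ in range(n):
        action = 0 if rng.random() < probs[0] else 1
        luck = rng.gauss(0.0, noise_std)  # external factor, not caused by the action
        reward = means[action] + luck
        # Plain baseline: V. Hindsight baseline: V plus the observed luck.
        baseline = value + luck if use_hindsight else value
        # d/d logits[0] of log pi(action) for a two-action softmax policy.
        score = (1.0 if action == 0 else 0.0) - probs[0]
        samples.append(score * (reward - baseline))
    return samples


def mean_and_variance(xs):
    """Sample mean and (population) variance of a list of floats."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var


if __name__ == "__main__":
    for flag in (False, True):
        g = grad_samples([0.0, 0.0], [1.0, 0.0], 5.0, 20000, use_hindsight=flag)
        mu, var = mean_and_variance(g)
        print(f"hindsight={flag}: mean={mu:.4f} variance={var:.4f}")
```

Because the luck term is independent of the action, subtracting it leaves the gradient estimate unbiased while removing the noise it contributes. In this toy setting both estimators target the same gradient, and the hindsight version should show much lower variance.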
Abstract
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating …