BriefGPT.xyz
Feb, 2021
Pairwise Weights for Temporal Credit Assignment
Zeyu Zheng, Risto Vuorio, Richard Lewis, Satinder Singh
TL;DR
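This paper addresses the fundamental temporal credit assignment problem in reinforcement learning. Credit is assigned using either a state-dependent discount coefficient or, more generally, static or dynamic weights defined as a function of the state, the number of steps elapsed, and the time of the reward. These weighting functions are learned with a meta-gradient method while the RL policy is being trained, which improves performance.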
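As a rough illustration of what a pairwise weighting means, the sketch below computes, for each time step, a return in which every later reward is scaled by a weight that depends on the state where the action was taken, the state where the reward arrived, and the number of steps between them. All names here (`pairwise_weighted_returns`, `scalar_decay`, the integer state encoding, and the convention that `rewards[k]` is received in `states[k]`) are assumptions made for this example and are not taken from the paper. The weight function is fixed by hand; the meta-gradient learning of the weights is not shown.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: weight(state_at_action, state_at_reward, steps_between)
WeightFn = Callable[[int, int, int], float]


def pairwise_weighted_returns(
    states: Sequence[int], rewards: Sequence[float], weight: WeightFn
) -> List[float]:
    """Return G where G[t] = sum over k >= t of
    weight(states[t], states[k], k - t) * rewards[k]."""
    n = len(rewards)
    returns = []
    for t in range(n):
        total = 0.0
        for k in range(t, n):
            total += weight(states[t], states[k], k - t) * rewards[k]
        returns.append(total)
    return returns


def scalar_decay(coefficient: float) -> WeightFn:
    """Special case: the weight ignores both states and decays with the interval."""
    def weight(_state_at_action: int, _state_at_reward: int, steps: int) -> float:
        return coefficient ** steps
    return weight


# Toy trajectory (made up): a single reward of 1.0 at the last step.
states = [0, 1, 2]
rewards = [0.0, 0.0, 1.0]
returns = pairwise_weighted_returns(states, rewards, scalar_decay(0.5))
print(returns)  # [0.25, 0.5, 1.0]
```

Passing a different `weight` callable, for example one that returns 0.0 unless the action state is a particular value, changes which state-action pairs receive credit for a reward without touching the return computation itself.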
Abstract
How much credit (or blame) should an action taken in a state get for a future reward? This is the fundamental temporal credit assignment problem in reinforcement learning (RL). One of the earliest and still most