BriefGPT.xyz
Feb, 2020
Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
Benjamin Eysenbach, Xinyang Geng, Sergey Levine, Ruslan Salakhutdinov
TL;DR
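This paper applies inverse reinforcement learning (inverse RL) to goal-relabeling techniques and shows that, in multi-task settings including goal reaching, domains with discrete sets of rewards, and domains with linear reward functions, relabeling with inverse RL accelerates learning.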
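To make the relabeling idea concrete, here is a minimal Python sketch for a discrete set of tasks. It is a simplified illustration, not the authors' algorithm: it scores one trajectory under each candidate reward function, applies a softmax over the resulting returns, and relabels the trajectory with the most likely task. A fuller inverse RL treatment would presumably also account for how easy each task is, which this sketch ignores. All names here (`task_probabilities`, `relabel`, `make_goal_reward`) and the 1-D goal-reaching setup are hypothetical.

```python
import math


def task_probabilities(per_task_rewards):
    """Softmax over trajectory returns, one entry per candidate task.

    per_task_rewards[i] holds the per-step rewards the trajectory would
    have earned under hypothetical task i.
    """
    returns = [sum(rewards) for rewards in per_task_rewards]
    top = max(returns)  # shift by the max for numerical stability
    weights = [math.exp(r - top) for r in returns]
    total = sum(weights)
    return [w / total for w in weights]


def relabel(transitions, reward_fns):
    """Relabel (state, action, next_state) transitions with the likeliest task.

    Returns the relabeled transitions as
    (state, action, reward, next_state, task_index) tuples,
    plus the task probabilities.
    """
    per_task = [[fn(s, a, s2) for (s, a, s2) in transitions] for fn in reward_fns]
    probs = task_probabilities(per_task)
    best = max(range(len(probs)), key=probs.__getitem__)
    relabeled = [
        (s, a, per_task[best][t], s2, best)
        for t, (s, a, s2) in enumerate(transitions)
    ]
    return relabeled, probs


def make_goal_reward(goal):
    """Assumed reward: negative distance from the next state to the goal."""
    return lambda s, a, s2: -abs(s2 - goal)


# A trajectory that walks from state 0 to state 5 on a line.
transitions = [(s, +1, s + 1) for s in range(5)]
reward_fns = [make_goal_reward(0), make_goal_reward(5)]
relabeled, probs = relabel(transitions, reward_fns)
```

In this toy setup the trajectory ends at state 5, so it is assigned to the second task (goal 5) regardless of which task it was originally collected for. Sampling from `probs` rather than taking the argmax would be an equally reasonable design choice.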
Abstract
Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency.