BriefGPT.xyz
Oct, 2022
Skill-Based Reinforcement Learning with Intrinsic Reward Matching
Ademi Adeniji, Amber Xie, Pieter Abbeel
TL;DR
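This paper proposes Intrinsic Reward Matching (IRM), a method that uses the skill discriminator to unify the two learning phases, pretraining and downstream task finetuning, so that intrinsic rewards are better matched to downstream task rewards and pretrained skills are used effectively.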
Abstract
While unsupervised skill discovery has shown promise in autonomously acquiring behavioral primitives, there is still a large methodological disconnect between task-agnostic skill pretraining and downstream, task-