BriefGPT.xyz
Jun, 2012
逆强化学习与梯度方法的学徒学习
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
HTML
PDF
Gergely Neu, Csaba Szepesvari
TL;DR
本文提出了一种新的梯度算法,用于从专家观察行为中学习策略,假设专家根据某种未知奖励函数行动最优,算法的目标是找到一个奖励函数使得最优策略与专家观察行为匹配良好,并且在两个人工数据集中表现更加可靠和高效。
Abstract
In this paper we propose a novel
gradient algorithm
to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown
reward function
of a
→