Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

June 2012

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

Gergely Neu, Csaba Szepesvari

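TL;DR: This paper proposes a new gradient algorithm for learning a policy from an expert's observed behavior, assuming that the expert acts optimally with respect to some unknown reward function. The algorithm searches for a reward function whose optimal policy closely matches the expert's observed behavior, and it is reported to be more reliable and efficient on two artificial datasets.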
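To make the idea in the summary concrete, here is a minimal sketch of fitting a reward by gradient descent so that the induced policy matches an expert's policy. It is not the authors' algorithm: the 3-state chain MDP, the Boltzmann (softmax) policy, the squared-difference loss, the finite-difference gradient, and every name in the code (`q_values`, `boltzmann_policy`, `fit_reward`, `BETA`, and so on) are assumptions chosen for illustration only.

```python
import math

# Hypothetical 3-state chain MDP (an assumption, not from the paper):
# action 0 moves left, action 1 moves right, walls at both ends.
N_STATES, N_ACTIONS, GAMMA, BETA = 3, 2, 0.9, 5.0

def next_state(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

def q_values(theta, iters=200):
    """Value iteration; the reward theta[s'] is received on arriving in s'."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        v = [max(row) for row in q]
        q = [[theta[next_state(s, a)] + GAMMA * v[next_state(s, a)]
              for a in range(N_ACTIONS)] for s in range(N_STATES)]
    return q

def boltzmann_policy(theta):
    """Softmax over Q-values, used here as a smooth stand-in for the optimal policy."""
    policy = []
    for row in q_values(theta):
        m = max(row)
        w = [math.exp(BETA * (x - m)) for x in row]
        z = sum(w)
        policy.append([x / z for x in w])
    return policy

def loss(theta, expert_policy):
    """Squared difference between the induced policy and the expert's policy."""
    pi = boltzmann_policy(theta)
    return sum((pi[s][a] - expert_policy[s][a]) ** 2
               for s in range(N_STATES) for a in range(N_ACTIONS))

def numeric_gradient(theta, expert_policy, eps=1e-5):
    """Central finite differences, swapped in for an analytic gradient."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up, expert_policy) - loss(down, expert_policy)) / (2 * eps))
    return grad

def fit_reward(expert_policy, steps=50, step_size=0.5):
    """Gradient descent on the reward parameters; a step is kept only if it lowers the loss."""
    theta = [0.0] * N_STATES
    for _ in range(steps):
        grad = numeric_gradient(theta, expert_policy)
        candidate = [t - step_size * g for t, g in zip(theta, grad)]
        if loss(candidate, expert_policy) < loss(theta, expert_policy):
            theta = candidate
        else:
            step_size *= 0.5  # shrink the step when it fails to improve
    return theta

# The "expert" is simulated from a hidden reward that pays off only in the last state.
expert_policy = boltzmann_policy([0.0, 0.0, 1.0])
initial_loss = loss([0.0] * N_STATES, expert_policy)
theta_hat = fit_reward(expert_policy)
final_loss = loss(theta_hat, expert_policy)
print(initial_loss, final_loss)
```

The recovered `theta_hat` need not equal the hidden reward, since many reward functions induce the same behavior; the sketch only checks that the policy mismatch shrinks.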

Abstract

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a …