基于梯度学习器的逆强化学习

Jul, 2020

Inverse Reinforcement Learning from a Gradient-based Learner

Giorgia Ramponi, Gianluca Drappo, Marcello Restelli

TL;DR本文提出了用于从机器人的多次策略中恢复策略目标的新算法。该算法基于观察所观察到的代理程序沿梯度方向更新其策略参数的假设。

Abstract

inverse reinforcement learning addresses the problem of inferring an expert's reward function from demonstrations. However, in many applications, we not only have access to the expert's near-optimal behavior, but