BriefGPT.xyz
Apr, 2016
Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics
Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard
TL;DR
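This paper proposes a gradient-based inverse reinforcement learning method that simultaneously estimates the system dynamics, in order to address the bias in the demonstrations that is caused by the generating policy. The approach improves sample efficiency and estimates both the reward and the transition model accurately, and it shows improvements on a synthetic MDP and on a transfer learning task.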
Abstract
Inverse reinforcement learning (IRL) describes the problem of learning an unknown reward function of a Markov decision process (MDP) from observed behavior of an agent. Since the agent's behavior originates in it…
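To make the setting concrete, the following is a minimal sketch of why demonstrations carry information about both the reward and the dynamics. It is not the authors' implementation: it assumes a small tabular MDP and a softmax (Boltzmann) policy obtained by soft value iteration, and the names `soft_policy` and `demo_log_likelihood` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def soft_policy(reward, transitions, gamma=0.9, iters=200):
    """Softmax policy from soft value iteration (an assumed policy model).

    reward:      array of shape (S, A)
    transitions: array of shape (S, A, S), each transitions[s, a] sums to 1
    """
    v = np.zeros(reward.shape[0])
    for _ in range(iters):
        # Q depends on the reward AND on the transition model.
        q = reward + gamma * (transitions @ v)
        q_max = q.max(axis=1)
        # Numerically stable log-sum-exp over actions.
        v = q_max + np.log(np.exp(q - q_max[:, None]).sum(axis=1))
    return np.exp(q - v[:, None])  # each row sums to 1

def demo_log_likelihood(demos, reward, transitions, gamma=0.9):
    """Joint log-likelihood of observed (s, a, s_next) triples."""
    policy = soft_policy(reward, transitions, gamma)
    return sum(
        np.log(policy[s, a]) + np.log(transitions[s, a, s_next])
        for s, a, s_next in demos
    )

# Toy MDP: state 1 is rewarding; action 0 mostly stays, action 1 mostly switches.
reward = np.array([[0.0, 0.0], [1.0, 1.0]])
transitions = np.array([
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.1, 0.9], [0.9, 0.1]],
])
policy = soft_policy(reward, transitions)
# Same reward, but with the effects of the two actions swapped.
policy_swapped = soft_policy(reward, transitions[:, ::-1, :])
demos = [(0, 1, 1), (1, 0, 1)]
log_lik = demo_log_likelihood(demos, reward, transitions)
```

In this toy example the reward is identical in both cases, yet the preferred action in state 0 flips once the dynamics are swapped. A gradient-based method in the spirit of the TL;DR would ascend a likelihood of this kind with respect to the parameters of both `reward` and `transitions`, rather than holding the dynamics fixed.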