BriefGPT.xyz
Sep, 2019
Meta-Inverse Reinforcement Learning with Probabilistic Context Variables
Lantao Yu, Tianhe Yu, Chelsea Finn, Stefano Ermon
TL;DR
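The study shows that a deep latent variable model can learn reward functions, without supervision, from demonstrations of distinct but related tasks. This addresses the problem in inverse reinforcement learning of inferring rewards from only a few demonstrations. Experimental results are reported on several continuous control tasks.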
Abstract
Providing a suitable reward function to reinforcement learning can be difficult in many real world applications. While inverse reinforcement lear…