BriefGPT.xyz
Dec, 2016
Unsupervised Perceptual Rewards for Imitation Learning
Pierre Sermanet, Kelvin Xu, Sergey Levine
TL;DR
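Uses the abstraction power of intermediate visual representations learned by deep models to quickly infer perceptual reward functions from a small number of demonstration sequences, so that reinforcement learning agents can perform tasks in real-world settings.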
Abstract
Reward function design and exploration time are arguably the biggest obstacles to the deployment of reinforcement learning (RL) agents in the real world. In many real-world tasks, designing a suitable reward func…