BriefGPT.xyz
Jun, 2019
Learning Reward Functions by Integrating Human Demonstrations and Preferences
Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
TL;DR
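This work proposes the DemPref framework, which combines demonstrations and preference queries to learn reward functions. It is more efficient and performs better than standard preference-based learning methods.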
Abstract
Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert
→