BriefGPT.xyz
May, 2020
Active Preference-Based Gaussian Process Regression for Reward Learning
Erdem Bıyık, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh
TL;DR
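This paper presents a preference-based learning method driven by user feedback. It models the reward function with a Gaussian process (GP), which imposes no additional structural restrictions and avoids the problems of data inefficiency and rigidity. As a result, expressive reward functions for robotic tasks can be learned efficiently from trajectory comparisons alone.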
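The core idea in the TL;DR, a GP prior over the reward that is fit only from pairwise trajectory comparisons, can be illustrated with a minimal sketch. Everything here is an assumption rather than the paper's method: each trajectory is summarized by a feature vector, the kernel is a squared-exponential (RBF) kernel with hypothetical hyperparameters, the comparison likelihood is a logistic (Bradley-Terry) model, and the posterior is approximated at its mode with Newton's method (a Laplace-style fit). The helpers `rbf_kernel`, `fit_preference_gp`, and `predict_reward` are hypothetical names, and active query selection is not shown.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=0.3, variance=1.0):
    """Squared-exponential (RBF) kernel between the rows of X1 and X2."""
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_preference_gp(X, prefs, n_iter=50, tol=1e-8, **kernel_args):
    """MAP (Laplace-mode) fit of a GP reward from pairwise comparisons.

    X: (N, D) array, one feature vector per trajectory.
    prefs: list of (winner, loser) index pairs into X.
    Assumed likelihood: P(winner preferred) = sigmoid(f[winner] - f[loser]).
    Returns the latent rewards f at X and weights alpha such that the
    posterior mean reward at new features x* is k(x*, X) @ alpha.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, **kernel_args)
    # A maps latent rewards f to reward differences f[winner] - f[loser].
    A = np.zeros((len(prefs), n))
    for row, (win, lose) in enumerate(prefs):
        A[row, win] = 1.0
        A[row, lose] = -1.0
    f = np.zeros(n)
    for _ in range(n_iter):
        s = sigmoid(A @ f)
        grad_lik = A.T @ (1.0 - s)                # gradient of the log-likelihood
        W = A.T @ ((s * (1.0 - s))[:, None] * A)  # negative Hessian of the log-likelihood
        # Newton step on the log-posterior, written as (I + K W)^-1 K b
        # so that K never has to be inverted.
        b = W @ f + grad_lik
        f_new = np.linalg.solve(np.eye(n) + K @ W, K @ b)
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    # At the mode, f = K @ alpha with alpha = A^T (1 - sigmoid(A f)).
    alpha = A.T @ (1.0 - sigmoid(A @ f))
    return f, alpha


def predict_reward(X_new, X, alpha, **kernel_args):
    """Posterior mean reward at new trajectory features."""
    return rbf_kernel(X_new, X, **kernel_args) @ alpha


if __name__ == "__main__":
    # Toy data: 8 one-dimensional "trajectories"; a larger feature is always preferred.
    X = np.linspace(0.0, 1.0, 8)[:, None]
    prefs = [(i, j) for i in range(8) for j in range(i)]
    f, alpha = fit_preference_gp(X, prefs)
    print(predict_reward(np.array([[0.1], [0.9]]), X, alpha))
```

Writing the Newton step as a solve against `I + K W` rather than inverting `K` keeps the sketch numerically stable even when the kernel matrix is nearly singular; a full treatment would also estimate kernel hyperparameters and posterior variances.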
Abstract
Designing reward functions is a challenging problem in AI and robotics. Humans usually have a difficult time directly specifying all the desirable behaviors that a robot needs to optimize. One common approach is