BriefGPT.xyz
Oct, 2020
Learning from Suboptimal Demonstration via Self-Supervised Reward Regression
Letian Chen, Rohan Paleja, Matthew Gombolay
TL;DR
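This paper proposes a new method that uses suboptimal demonstrations to synthesize optimality-parameterized data for training an idealized reward function, overcoming some limitations of earlier methods that learn from suboptimal demonstrations and achieving better performance.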
Abstract
Learning from demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration. However, modern LfD techniques, such as …