BriefGPT.xyz
Oct, 2022
Learning Preferences for Interactive Autonomy
Erdem Bıyık
TL;DR
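This work studies how robots that interact with humans can learn reward functions in order to complete their tasks. It explores learning a robot's reward function from comparative feedback over multiple robot trajectories, including pairwise comparisons, ratings, and best-of-many choices, and proposes active learning techniques that optimize the expected information gained from user feedback. The approach's applicability is demonstrated in autonomous driving simulation, home robotics, and standard reinforcement learning domains.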
Abstract
When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. To encode these, reward functions, which specify the objective of a robot, are employed. However, designing