Data generation and labeling are usually expensive parts of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning attempts to solve the latter by querying users with preference questions. In this paper, we develop a new algorithm, batch active preference-based learning, that enables efficient learning of reward functions using as few data samples as possible while still having short query generation times. We introduce several approximations to the batch active learning problem and provide theoretical guarantees for the convergence of our algorithms. We then present experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries, which are computed in a short amount of time. Finally, we showcase our algorithm in a study to learn human users' preferences.

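This paper introduces a new algorithm, batch active preference-based learning, which learns reward functions efficiently from as few data samples as possible while keeping query generation times short. We introduce several approximations to the batch active learning problem and provide theoretical guarantees for the convergence of our algorithms. Through experiments on a variety of robotics tasks in simulation, our results suggest that our batch active learning algorithm requires only a few queries, each computed in a short amount of time. Finally, we showcase our algorithm in a study to learn human users' preferences.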

Batch preference-based learning of reward functions