Scarce data is a major challenge to scaling robot learning to truly complex
tasks, as we need to generalize locally learned policies over different
"contexts". Bayesian optimization approaches to contextual policy search (CPS)
offer data-efficient policy learning that generalizes over a context space. We
propose to improve data-efficiency by factoring typically considered contexts
into two components: target-type contexts that correspond to a desired outcome
of the learned behavior, e.g. target position for throwing a ball; and
environment-type contexts that correspond to some state of the environment,
e.g. initial ball position or wind speed. Our key observation is that
experience can be directly generalized over target-type contexts. Based on this
observation, we introduce Factored Contextual Policy Search with Bayesian Optimization for
both passive and active learning settings. Preliminary results show faster
policy generalization on a simulated toy problem. An extended full-paper
version is available at arXiv:1904.11761.
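
The key observation about target-type contexts can be illustrated with a small sketch. It assumes a hypothetical toy throwing task in which a rollout yields an observable outcome (the landing position) and the reward depends only on the distance between that outcome and the target. Under that assumption, one rollout can be re-scored for any number of targets without running the system again. The function names, the projectile dynamics, and the reward shape are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

GRAVITY = 9.81  # m/s^2


def simulate_throw(theta, env_context):
    """Hypothetical toy dynamics: landing position of a thrown ball.

    theta       -- policy parameters (launch speed, launch angle in radians)
    env_context -- environment-type context: initial ball position
    """
    speed, angle = theta
    # Flat-ground projectile range, shifted by the initial position.
    return env_context + speed**2 * np.sin(2.0 * angle) / GRAVITY


def reward(outcome, target):
    """Assumed reward: negative distance between outcome and target."""
    return -abs(outcome - target)


def relabel_over_targets(theta, env_context, targets):
    """Turn ONE rollout into one training sample per target-type context."""
    outcome = simulate_throw(theta, env_context)  # the only rollout
    return [(theta, env_context, t, reward(outcome, t)) for t in targets]


if __name__ == "__main__":
    theta = (np.sqrt(GRAVITY), np.pi / 4)  # lands 1 m ahead of the start
    for sample in relabel_over_targets(theta, 0.5, [1.0, 1.5, 2.0]):
        print(sample[2], sample[3])
```

Environment-type contexts such as the initial ball position change the outcome itself, so in this sketch they cannot be relabeled in the same way and still require new rollouts.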

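Proposes factored contextual policy search with Bayesian optimization to improve the data efficiency of robot learning. The contexts usually considered are factored into two components, target-type contexts and environment-type contexts, so that experience can be generalized directly over target-type contexts. Preliminary results show that the method generalizes policies faster on a simulated toy problem.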

Factored Contextual Policy Search Based on Bayesian Optimization

Factored Contextual Policy Search with Bayesian Optimization

Object handover is a basic but essential capability for robots interacting
with humans in many applications, e.g., caring for the elderly and assisting
workers in manufacturing workshops. It appears deceptively simple, as humans
perform object handover almost flawlessly. The success of humans, however,
belies the complexity of object handover as a collaborative physical interaction
between two agents with limited communication. This paper presents a learning
algorithm for dynamic object handover, for example, when a robot hands over
water bottles to marathon runners passing by the water station. We formulate
the problem as contextual policy search, in which the robot learns object
handover by interacting with the human. A key challenge here is to learn the
latent reward of the handover task under noisy human feedback. Preliminary
experiments show that the robot learns to hand over a water bottle naturally
and that it adapts to the dynamics of human motion. One challenge for the
future is to combine the model-free learning algorithm with a model-based
planning approach and enable the robot to adapt to human preferences and
object characteristics, such as shape, weight, and surface texture.

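This paper presents a learning algorithm for dynamic object handover. The problem is formulated as contextual policy search: the robot learns the latent reward of the handover by interacting with a human, which lets it hand over objects naturally and adapt to the dynamics of human motion.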

Learning Dynamic Robot-to-Human Object Handover from Human Feedback

Learning Dynamic Robot-to-Human Object Handover from Human Feedback

Contextual policy search allows adapting robotic movement primitives to
different situations. For instance, a locomotion primitive might be adapted to
different terrain inclinations or desired walking speeds. Such an adaptation is
often achievable by modifying a small number of hyperparameters. However,
learning, when performed on real robotic systems, is typically restricted to a
small number of trials. Bayesian optimization has recently been proposed as a
sample-efficient means for contextual policy search that is well suited to
these conditions. In this work, we extend entropy search, a variant of Bayesian
optimization, such that it can be used for active contextual policy search
where the agent selects those tasks during training in which it expects to
learn the most. Empirical results in simulation suggest that this allows
learning successful behavior with fewer trials.
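
As a rough illustration of Bayesian optimization for contextual policy search, the sketch below fits a Gaussian process over joint (context, parameter) inputs and, for each given context, picks the parameter with the highest upper confidence bound. It is a simplified stand-in under stated assumptions: it uses a GP-UCB acquisition rather than entropy search, it covers only the passive setting where the context is given rather than selected by the agent, and the one-dimensional toy reward, kernel length scale, and all function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=0.5):
    """Squared-exponential kernel between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior(X, y, X_query, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_query)
    mean = K_s.T @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ij->j", K_s, np.linalg.solve(K, K_s))
    return mean, np.sqrt(np.clip(var, 0.0, None))


def select_parameter(X, y, context, candidates, kappa=2.0):
    """Pick the candidate parameter with the highest UCB for this context."""
    X_query = np.column_stack([np.full(len(candidates), context), candidates])
    mean, std = gp_posterior(X, y, X_query)
    return candidates[np.argmax(mean + kappa * std)]


def toy_reward(context, theta):
    """Hypothetical reward: the best parameter equals the context value."""
    return -(theta - context) ** 2


def run_bo_cps(n_trials=20, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 51)
    # Start from a few random (context, parameter) pairs.
    X = rng.uniform(0.0, 1.0, size=(3, 2))
    y = toy_reward(X[:, 0], X[:, 1])
    for _ in range(n_trials):
        context = rng.uniform(0.0, 1.0)  # passive setting: context is given
        theta = select_parameter(X, y, context, candidates)
        X = np.vstack([X, [context, theta]])
        y = np.append(y, toy_reward(context, theta))
    return X, y


if __name__ == "__main__":
    X, y = run_bo_cps()
    print("mean reward over the last 5 trials:", y[-5:].mean())
```

An active variant would additionally choose the context to query in each trial, which is the part this sketch leaves out.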

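This paper studies active contextual policy search using entropy search, a variant of Bayesian optimization, so that successful behavior can be learned within a small number of trials.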