Scarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different "contexts". Bayesian optimization approaches to contextual policy search (CPS) offer data-efficient policy learning that generalizes over a context space. We propose to improve data efficiency by factoring the typically considered contexts into two components: target-type contexts that correspond to a desired outcome of the learned behavior, e.g. the target position for throwing a ball; and environment-type contexts that correspond to some state of the environment, e.g. the initial ball position or wind speed. Our key observation is that experience can be directly generalized over target-type contexts. Based on this, we introduce Factored Contextual Policy Search with Bayesian Optimization for both passive and active learning settings. Preliminary results show faster policy generalization on a simulated toy problem.
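
The key observation can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes the reward depends on the target-type context only through the observed outcome of a rollout, here a hypothetical negative squared distance between where the ball landed and the target. Under that assumption, one stored rollout can be re-evaluated for any number of targets without running the robot again. The names `reward` and `relabel`, the scalar contexts, and the sample values are all illustrative assumptions.

```python
def reward(outcome, target):
    """Hypothetical reward: negative squared distance from outcome to target."""
    return -((outcome - target) ** 2)


def relabel(experience, targets):
    """Re-evaluate each stored rollout for every candidate target context.

    experience: list of (env_context, policy_params, outcome) tuples.
    Returns (env_context, target, policy_params, reward) rows that could
    serve as training data for a surrogate model of the reward.
    """
    rows = []
    for env_ctx, theta, outcome in experience:
        for target in targets:
            rows.append((env_ctx, target, theta, reward(outcome, target)))
    return rows


# Two illustrative rollouts: (environment context, policy parameter, landing position).
experience = [
    (0.0, 0.5, 2.0),
    (1.0, 0.8, 3.5),
]
targets = [2.0, 3.0, 4.0]

# Two rollouts yield six training rows, one per (rollout, target) pair.
dataset = relabel(experience, targets)
```

Environment-type contexts such as wind speed do not allow this shortcut in the sketch, because changing them would change the outcome itself and so would require a new rollout.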

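This work proposes factored contextual policy search with Bayesian optimization to improve the data efficiency of robot learning. The typically considered contexts are factored into two components, target-type contexts and environment-type contexts, so that experience can be generalized directly over target-type contexts. Preliminary results show that the method generalizes policies faster on a simulated toy problem.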

Factored Contextual Policy Search with Bayesian Optimization