Operating under real-world conditions is challenging because partial observability can induce a wide range of failures. In relatively benign settings, such failures can be overcome by retrying or by executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, such as opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm, Value Upper Confidence Limit (Value-UCL), that selects which failure modes to prioritize and which state to recover to, such that the expected performance improves maximally in every training episode. We use our approach to learn recovery skills for door opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially, from 71% to 92.4% in simulation and from 75% to 90% on a real robot.
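To make the idea of prioritizing failure modes under a limited training budget concrete, the sketch below uses the textbook UCB1 bandit rule. It is only an illustration and is not the Value-UCL algorithm itself, whose details the abstract does not give. The function names, the Bernoulli "improvement" signal, and the `true_gain` values are all hypothetical.

```python
import math
import random


def select_failure_mode(counts, mean_gain, total, c=1.0):
    """Return the index with the largest upper confidence bound (UCB1 rule)."""
    # Train on every failure mode once before trusting any estimate.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        mean_gain[i] + c * math.sqrt(2.0 * math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def allocate_training(true_gain, episodes=500, seed=0):
    """Spend a fixed episode budget across failure modes; return episodes per mode."""
    rng = random.Random(seed)
    k = len(true_gain)
    counts = [0] * k        # training episodes spent on each failure mode
    mean_gain = [0.0] * k   # running estimate of improvement per episode
    for t in range(1, episodes + 1):
        i = select_failure_mode(counts, mean_gain, t)
        # Hypothetical stand-in for one recovery-training episode:
        # a noisy 0/1 observation of whether task performance improved.
        reward = 1.0 if rng.random() < true_gain[i] else 0.0
        counts[i] += 1
        mean_gain[i] += (reward - mean_gain[i]) / counts[i]
    return counts


if __name__ == "__main__":
    # Three hypothetical failure modes with different (unknown) payoffs.
    print(allocate_training([0.1, 0.8, 0.3]))
```

In this toy setting most of the budget drifts toward the failure mode whose recovery pays off most often, while the others are still sampled occasionally so their estimates do not go stale.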

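This paper presents a general, sample-efficient approach for making a manipulation strategy more robust: it first explores the failure modes of the current strategy in simulation and then learns additional recovery skills to handle those failures. It proposes an online algorithm, MetaReSkill, that monitors the progress of all recovery policies and allocates learning resources to the recoveries most likely to improve task performance. Using door opening as an example, we apply our approach to learn recovery skills and test them both in simulation and on a physical robot, showing that the approach raises task success from 71% to 92.4% in simulation and from 75% to 90% in real-world tests.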

Efficient Recovery Learning using Model Predictive Meta-Reasoning