Reinforcement learning (RL) often requires a meticulously designed Markov Decision Process (MDP) tailored to each task. This work addresses that challenge by proposing a systematic approach to behavior synthesis and control for multi-contact loco-manipulation tasks, such as navigating spring-loaded doors and manipulating heavy dishwashers. We define a task-independent MDP to train RL policies using only a single demonstration per task, generated by a model-based trajectory optimizer. Our approach incorporates an adaptive phase dynamics formulation to robustly track the demonstrations while accommodating dynamic uncertainties and external disturbances. We compare our method against prior motion-imitation RL methods and show that the learned policies achieve higher success rates across all considered tasks. These policies learn recovery maneuvers that are not present in the demonstration, such as re-grasping objects during execution or handling slippage. Finally, we successfully transfer the policies to a real robot, demonstrating the practical viability of our approach.

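This work addresses the difficulty that reinforcement learning requires a carefully designed Markov Decision Process for each task, and proposes a systematic approach to behavior synthesis and control for multi-contact loco-manipulation tasks. By defining a task-independent Markov Decision Process, our policies learn manipulation strategies that achieve higher success rates under dynamic uncertainties and external disturbances, and they transfer successfully to a real robot, demonstrating the practical viability of the approach.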

Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation