We introduce a novel deep reinforcement learning (RL) approach called Movement Primitive-based Planning Policy (MP3). By integrating movement primitives (MPs) into the deep RL framework, MP3 generates smooth trajectories throughout the whole learning process while learning effectively from sparse and non-Markovian rewards. MP3 also retains the ability to adapt to changes in the environment during execution. Although many early successes in robot RL were achieved by combining RL with MPs, these approaches are often limited to learning single stroke-based motions and cannot adapt to task variations or adjust motions during execution. Our previous work introduced an episode-based RL method for the non-linear adaptation of MP parameters to different task variations. This paper extends that approach to incorporate replanning strategies, which allows the MP parameters to be adapted throughout motion execution and addresses the lack of online motion adaptation in stochastic domains that require feedback. We compared our approach against state-of-the-art deep RL methods and MP-based RL methods. The results demonstrated improved performance in sophisticated, sparse-reward settings and in domains requiring replanning.
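The replanning idea can be illustrated with a toy sketch. This is not the MP3 implementation: it assumes a one-dimensional trajectory, normalized Gaussian basis functions, and a hand-written stand-in for the learned policy. The names `rbf_features`, `mp_segment`, `rollout_with_replanning`, and `toy_policy` are hypothetical. Each segment is shifted to begin at the current position, so the sketch only guarantees position continuity at replanning points, not the smoothness properties claimed for MP3.

```python
import math


def rbf_features(phase, n_basis, width=0.05):
    """Normalized Gaussian basis functions evaluated at a phase in [0, 1]."""
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    acts = [math.exp(-((phase - c) ** 2) / (2 * width)) for c in centers]
    total = sum(acts)
    return [a / total for a in acts]


def mp_segment(weights, start_pos, n_steps):
    """Trajectory segment from MP weights, shifted so it begins at start_pos."""
    raw = []
    for k in range(n_steps + 1):
        feats = rbf_features(k / n_steps, len(weights))
        raw.append(sum(f * w for f, w in zip(feats, weights)))
    offset = start_pos - raw[0]
    return [r + offset for r in raw]


def rollout_with_replanning(policy, initial_pos, n_segments, steps_per_segment):
    """Query the policy for new MP weights at the start of every segment."""
    pos = initial_pos
    trajectory = [pos]
    for seg in range(n_segments):
        weights = policy(pos, seg)  # replanning step: new MP parameters
        segment = mp_segment(weights, pos, steps_per_segment)
        trajectory.extend(segment[1:])  # skip the duplicated start point
        # In a real environment this would be the observed (possibly
        # perturbed) state rather than the planned end point.
        pos = trajectory[-1]
    return trajectory


def toy_policy(pos, seg):
    """Hypothetical stand-in for a learned network: weights ramp toward a goal."""
    goal = 1.0
    return [pos + (goal - pos) * i / 4 for i in range(5)]


traj = rollout_with_replanning(toy_policy, 0.0, n_segments=3, steps_per_segment=10)
```

With a single segment the rollout reduces to the episode-based setting, in which the MP parameters are chosen once per episode; using several segments lets the policy react to the current state during execution.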

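This paper introduces MP3, a deep reinforcement learning method that integrates movement primitives (MPs) into the deep RL framework. It generates smooth trajectories throughout the learning process, learns effectively from sparse and non-Markovian rewards, and can adapt to changes in the environment during execution. Compared with existing deep RL methods and methods that combine RL with MPs, it performs better in complex, sparse-reward settings and in domains that require replanning.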

Movement Primitive-Based (Re)Planning Policy