In this paper, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed targets. Interpretable as a bi-directional approach, BER addresses inaccuracies in back-stepping transitions through a distillation of the replay experience during learning. Given the intricate nature of soft robots and their complex interactions with environments, we present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot, which is capable of serpentine motion enabled by anisotropic friction between the body and ground. In addition, a dynamic simulator is developed to assess the effectiveness and efficiency of the BER algorithm, in which the robot demonstrates successful learning (reaching a 100% success rate) and adeptly reaches random targets, achieving an average speed 48% faster than that of the best baseline approach.
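The reversed-trajectory idea can be illustrated with a minimal sketch. This is not the authors' implementation: the names `back_step`, `sparse_reward`, and `invert_action`, the rule that a reverse action is the negated forward action, and the sparse 0/-1 reward are all assumptions made for illustration. The sketch takes a recorded trajectory, treats the state where it began as the target, and emits transitions that walk back toward that target. The distillation step that the abstract mentions for handling inaccurate back-stepping transitions is not shown.

```python
from typing import Callable, List, Tuple

import numpy as np

# (state, action, next_state) as recorded while the agent moves forward in time.
Step = Tuple[np.ndarray, np.ndarray, np.ndarray]
# (state, action, next_state, goal, reward) as stored in a goal-conditioned replay buffer.
Transition = Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, float]


def sparse_reward(state: np.ndarray, goal: np.ndarray, tol: float = 0.05) -> float:
    """Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if np.linalg.norm(state - goal) <= tol else -1.0


def back_step(
    trajectory: List[Step],
    invert_action: Callable[[np.ndarray], np.ndarray] = lambda a: -a,
    tol: float = 0.05,
) -> List[Transition]:
    """Build reversed transitions from a recorded trajectory.

    Assumes approximate reversibility: applying invert_action(a) in
    next_state leads (approximately) back to state. The default negation
    is an illustrative assumption, not a general rule.
    """
    goal = trajectory[0][0]  # the recorded start state becomes the target
    reversed_transitions: List[Transition] = []
    for state, action, next_state in reversed(trajectory):
        reversed_transitions.append(
            (next_state, invert_action(action), state, goal,
             sparse_reward(state, goal, tol))
        )
    return reversed_transitions
```

In an off-policy setting, the returned tuples could be appended to the replay buffer alongside ordinary transitions. How closely the reverse action approximates the true dynamics depends on the system and is exactly the source of inaccuracy that the abstract refers to.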

Model-Free Reinforcement Learning for a Soft Snake Robot Using Back-stepping Experience Replay