We propose and demonstrate a compositional framework for training and verifying reinforcement learning (RL) systems within a multifidelity sim-to-real pipeline, in order to deploy reliable and adaptable RL policies on physical hardware. By decomposing complex robotic tasks into component subtasks and defining mathematical interfaces between them, the framework allows for the independent training and testing of the corresponding subtask policies, while simultaneously providing guarantees on the overall behavior that results from their composition. By verifying the performance of these subtask policies using a multifidelity simulation pipeline, the framework not only allows for efficient RL training, but also for a refinement of the subtasks and their interfaces in response to challenges arising from discrepancies between simulation and reality. In an experimental case study we apply the framework to train and deploy a compositional RL system that successfully pilots a Warthog unmanned ground robot.

A Multifidelity Sim-to-Real Pipeline for Verifiable and Compositional Reinforcement Learning