Bridging model-based safety and model-free reinforcement learning (RL) for dynamic robots is appealing because model-based methods can provide formal safety guarantees, while RL-based methods can exploit the robot's agility by learning from the full-order system dynamics. However, current approaches to this problem are mostly restricted to simple systems. In this paper, we propose a new method that combines model-based safety with model-free reinforcement learning by explicitly finding a low-dimensional model of the system controlled by an RL policy and applying stability and safety guarantees to that simple model. As an example, we use Cassie, a complex bipedal robot that is a high-dimensional nonlinear system with hybrid dynamics and underactuation, together with its RL-based walking controller. We show that a low-dimensional dynamical model is sufficient to capture the dynamics of the closed-loop system. We demonstrate that this model is linear, asymptotically stable, and decoupled across control inputs in all dimensions. We further show that such linearity persists even when different RL control policies are used. These results point to an interesting question about the relationship between RL and optimal control: whether, in some cases, RL tends to linearize the nonlinear system during training. Furthermore, we illustrate that the identified linear model can provide guarantees through a safety-critical optimal control framework, e.g., Model Predictive Control with Control Barrier Functions, on an example of autonomous navigation using Cassie, while taking advantage of the agility provided by the RL-based controller.

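This paper proposes a new method that combines model-based safety with model-free reinforcement learning by explicitly finding a low-dimensional model of the system controlled by an RL policy and applying stability and safety guarantees to that simple model. Using the complex bipedal robot Cassie and its RL-based walking controller as an example, the paper shows that a low-dimensional dynamical model is sufficient to capture the dynamics of the closed-loop system, and that the identified linear model can provide guarantees through a safety-critical optimal control framework.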

Bridging model-based safety and model-free reinforcement learning through system identification of a low-dimensional linear model.
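
To make the idea of identifying a low-dimensional linear model concrete, here is a minimal sketch in Python. It is not the paper's procedure: it assumes a discrete-time model of the form x[k+1] = A x[k] + B u[k], fits A and B by ordinary least squares, and uses synthetic data in place of logged robot trajectories. The function names `fit_linear_model` and `is_asymptotically_stable`, the two-dimensional state, and the matrices `A_true` and `B_true` are all hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_linear_model(X, U):
    """Least-squares fit of x[k+1] = A x[k] + B u[k].

    X: array of shape (T + 1, n), the state at each step.
    U: array of shape (T, m), the input applied at each step.
    Returns (A, B) with shapes (n, n) and (n, m).
    """
    n = X.shape[1]
    # Each regressor row is [x[k], u[k]]; each target row is x[k+1].
    Z = np.hstack([X[:-1], U])
    theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
    # theta stacks A^T on top of B^T, so transpose each block.
    return theta[:n].T, theta[n:].T


def is_asymptotically_stable(A):
    """Discrete-time test: all eigenvalues strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)


# Synthetic stand-in for closed-loop data (assumption: a stable, decoupled system).
rng = np.random.default_rng(0)
A_true = np.diag([0.9, 0.8])
B_true = np.eye(2)
T = 200
U = rng.normal(size=(T, 2))
X = np.zeros((T + 1, 2))
for k in range(T):
    X[k + 1] = A_true @ X[k] + B_true @ U[k]

A_hat, B_hat = fit_linear_model(X, U)
stable = is_asymptotically_stable(A_hat)
```

In this sketch, near-zero off-diagonal entries in `A_hat` and `B_hat` would correspond to the decoupling across dimensions that the abstract describes, and `stable` reports whether the fitted model is asymptotically stable. With real data, the states and inputs would come from the robot under its RL policy, and the fit would be only approximate.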