We draw on recent advances from the physics community, specifically the sparse identification of nonlinear dynamics (SINDy) framework, to propose a novel method for discovering the governing nonlinear dynamics of physical systems in reinforcement learning (RL). We show that this method can discover the underlying dynamics from significantly fewer trajectories (as few as one rollout of $\leq 30$ time steps) than state-of-the-art model-learning algorithms. Furthermore, the learned model is accurate enough to induce near-optimal policies from significantly fewer trajectories than model-free algorithms require. For systems with physics-based dynamics, the method therefore provides the benefits of model-based RL without requiring a model to be developed in advance. To establish the validity and applicability of the algorithm, we conduct experiments on four classic control tasks. We find that an optimal policy trained on the discovered dynamics of the underlying system generalizes well. Moreover, the learned policy performs well when deployed on the actual physical system, bridging the gap between the model and the real system. We also compare our method with state-of-the-art model-based and model-free approaches and show that it requires fewer trajectories sampled from the true physical system than the other methods. Finally, we explore approximate dynamics models and find that they can also perform well.
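The core idea of discovering dynamics from a single short rollout can be illustrated with a minimal SINDy-style sketch. This is an illustration under our own assumptions, not the authors' implementation: we assume a discrete-time formulation $x_{t+1} = \Theta(x_t, u_t)\,\Xi$, a degree-2 polynomial candidate library, and sequentially thresholded least squares (STLSQ) as the sparse regressor. The toy dynamics in `step` and the function names `poly_library`, `stlsq`, and `learned_step` are hypothetical. A single 30-step rollout suffices here only because the toy system is exactly representable in the library and the data are noise-free.

```python
import numpy as np


def step(x, u):
    """Hypothetical 'true' dynamics, chosen polynomial so the library can represent it exactly."""
    x1, x2 = x
    return np.array([
        x1 + 0.1 * x2,
        -0.1 * x1 + 0.9 * x2 + 0.2 * u - 0.05 * x1 ** 2,
    ])


def poly_library(X, U):
    """Candidate terms Theta(x, u): a constant, each variable, and all degree-2 monomials of z = [x, u]."""
    Z = np.hstack([X, U])
    n = Z.shape[1]
    cols = [np.ones((Z.shape[0], 1)), Z]
    for i in range(n):
        for j in range(i, n):
            cols.append((Z[:, i] * Z[:, j])[:, None])
    return np.hstack(cols)


def stlsq(Theta, Y, threshold=0.02, iters=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit on the remaining support."""
    Xi = np.linalg.lstsq(Theta, Y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(Y.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                Xi[keep, k] = np.linalg.lstsq(Theta[:, keep], Y[:, k], rcond=None)[0]
    return Xi


# Collect one rollout of 30 time steps with random actions (assumed exploration scheme).
rng = np.random.default_rng(0)
T = 30
X = np.zeros((T + 1, 2))
X[0] = [1.0, 0.0]
U = rng.uniform(-1.0, 1.0, size=(T, 1))
for t in range(T):
    X[t + 1] = step(X[t], U[t, 0])

# Regress next states on the library evaluated at the current (state, action) pairs.
Theta = poly_library(X[:-1], U)
Xi = stlsq(Theta, X[1:])


def learned_step(x, u):
    """Predict the next state with the identified sparse model."""
    return (poly_library(x[None, :], np.array([[u]])) @ Xi)[0]
```

Once identified, `learned_step` could stand in for the real system when training a policy, which is the model-based RL setting described above. With noisy or non-polynomial dynamics, the choice of library and `threshold` would need tuning, and exact recovery should not be expected.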

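Drawing on recent advances in physics, we propose a new method for discovering the governing nonlinear dynamics of physical systems in reinforcement learning, and show that it can discover these dynamics from very few sampled trajectories (a single trajectory of $\leq 30$ time steps suffices). This brings the benefits of model-based reinforcement learning to such systems without requiring a model to be developed in advance. Experiments with the algorithm on four control problems show that the optimal policy trained on the discovered dynamics of the system generalizes well and performs well on the actual physical system. Compared with existing methods, this method requires fewer trajectories sampled from the real physical system.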

Model-Based Reinforcement Learning with SINDy