Deep Reinforcement Learning (DRL) has achieved remarkable advances in sequential decision tasks. However, recent works have revealed that DRL agents are susceptible to slight perturbations in their observations. This vulnerability raises concerns about the effectiveness and robustness of such agents when deployed in real-world applications. In this work, we propose a novel robust reinforcement learning method called SortRL, which improves the robustness of DRL policies against observation perturbations from the perspective of the network architecture. We employ a novel policy network architecture that is globally Lipschitz continuous with respect to the $l_\infty$ norm, and we provide a convenient method to enhance policy robustness based on the output margin. In addition, we design a training framework for SortRL that solves the given tasks while maintaining robustness against $l_\infty$-bounded perturbations of the observations. Several experiments, including classic control tasks and video games, are conducted to evaluate the effectiveness of our method. The results demonstrate that SortRL achieves state-of-the-art robust performance across different perturbation strengths.
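To illustrate why an output margin yields robustness for an $l_\infty$ Lipschitz policy, the following is a minimal sketch under stated assumptions, not the SortRL implementation. It assumes a network whose units compute $\|x - w\|_\infty + b$, a standard 1-Lipschitz construction with respect to the $l_\infty$ norm; the abstract does not specify the layer type. The names `linf_dist_layer`, `policy_scores`, and `certified_radius` are hypothetical. Because every action score changes by at most $\epsilon$ when the observation is perturbed by at most $\epsilon$ in $l_\infty$ norm, the greedy action cannot change as long as $\epsilon$ is smaller than half the gap between the two largest scores.

```python
import numpy as np

def linf_dist_layer(x, W, b):
    """Assumed layer: unit j outputs ||x - W[j]||_inf + b[j].

    Each unit is 1-Lipschitz with respect to the l_inf norm of x.
    """
    return np.max(np.abs(x[None, :] - W), axis=1) + b

def policy_scores(x, params):
    """Stack the layers; a composition of 1-Lipschitz maps is 1-Lipschitz."""
    h = x
    for W, b in params:
        h = linf_dist_layer(h, W, b)
    return h

def certified_radius(scores):
    """Half the gap between the two largest scores.

    For a 1-Lipschitz network, the argmax is unchanged for any
    perturbation whose l_inf norm is strictly below this value.
    """
    second, best = np.sort(scores)[-2:]
    return (best - second) / 2.0

# Illustrative random network: 4-dim observation, 8 hidden units, 3 actions.
rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(8, 4)), rng.normal(size=8)),
    (rng.normal(size=(3, 8)), rng.normal(size=3)),
]
obs = rng.normal(size=4)
scores = policy_scores(obs, params)
action = int(np.argmax(scores))
radius = certified_radius(scores)
```

In this sketch, `radius` is a per-observation certificate: enlarging the margin during training enlarges the perturbation strength the policy provably tolerates at that observation.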

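We propose a novel robust reinforcement learning method called SortRL, which improves the robustness of DRL policies against observation perturbations from the perspective of the network architecture, and we design a training framework that solves the given tasks while maintaining robustness to perturbations of the observations. Multiple experiments show that SortRL achieves state-of-the-art robust performance under different perturbation strengths.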

Improving the Robustness of Reinforcement Learning against Observation Perturbations via $l_\infty$ Lipschitz Policy Networks