This paper presents a study of robust policy networks in deep reinforcement learning. We investigate the benefits of policy parameterizations that naturally satisfy constraints on their Lipschitz bound, analyzing their empirical performance and robustness on two representative problems: pendulum swing-up and Atari Pong. We illustrate that policy networks with small Lipschitz bounds are significantly more robust to disturbances, random noise, and targeted adversarial attacks than unconstrained policies composed of vanilla multi-layer perceptrons or convolutional neural networks. Moreover, we find that choosing a policy parameterization with a non-conservative Lipschitz bound and an expressive, nonlinear layer architecture gives the user much finer control over the performance-robustness trade-off than existing state-of-the-art methods based on spectral normalization.
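
As a rough illustration of the spectral-normalization baseline mentioned above, the sketch below assumes a small ReLU multi-layer perceptron policy written in NumPy. The helper names (`spectral_normalize`, `mlp_policy`, `lipschitz_upper_bound`) and the layer sizes are hypothetical. Each weight matrix is rescaled so its spectral norm is at most γ^(1/L). Because ReLU is 1-Lipschitz, the product of these per-layer norms then bounds the network's Lipschitz constant by γ. The script also checks empirically that a random observation perturbation δ changes the action by at most γ‖δ‖. This is not the parameterization the paper advocates. The product-of-norms bound shown here is the kind of conservative bound the abstract contrasts with non-conservative ones.

```python
import numpy as np

def spectral_normalize(weights, gamma):
    """Rescale each weight matrix so the product of spectral norms is <= gamma.

    Hypothetical helper: the budget is split evenly, gamma**(1/L) per layer.
    Uses the exact largest singular value (np.linalg.norm(W, 2)) instead of
    the power-iteration estimate used by practical spectral normalization.
    """
    per_layer = gamma ** (1.0 / len(weights))
    out = []
    for W in weights:
        sigma = np.linalg.norm(W, 2)  # largest singular value of W
        # Only shrink; never scale up a layer that is already within budget.
        out.append(W * min(1.0, per_layer / sigma))
    return out

def mlp_policy(x, weights, biases):
    """ReLU MLP with a linear output layer (hypothetical policy network)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, W @ h + b)  # ReLU is 1-Lipschitz
    return weights[-1] @ h + biases[-1]

def lipschitz_upper_bound(weights):
    """Product of per-layer spectral norms: a valid but often loose bound."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

rng = np.random.default_rng(0)
sizes = [3, 64, 64, 1]  # assumed: 3-D observation -> scalar action
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

gamma = 2.0
sn_weights = spectral_normalize(weights, gamma)
bound = lipschitz_upper_bound(sn_weights)

# Empirical check: the action change under random observation noise
# should never exceed bound * ||delta|| (and hence gamma * ||delta||).
x = rng.normal(size=sizes[0])
worst_ratio = 0.0
for _ in range(1000):
    delta = 0.1 * rng.normal(size=sizes[0])
    diff = np.linalg.norm(
        mlp_policy(x + delta, sn_weights, biases) - mlp_policy(x, sn_weights, biases)
    )
    worst_ratio = max(worst_ratio, diff / np.linalg.norm(delta))

print(f"certified bound {bound:.3f}, worst observed ratio {worst_ratio:.3f}")
```

Comparing `bound` with `worst_ratio` typically reveals a gap. That gap is the conservatism of bounding each layer separately, and it is why a tighter, non-conservative bound can leave more of the network's expressiveness intact for the same robustness guarantee.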

Robustness of Reinforcement Learning with Lipschitz-Bounded Policy Networks