Deep reinforcement learning (RL) agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this gap is that the learned policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks. We demonstrate that RL policies can be deterministically chaotic, in that small perturbations to the system state have a large impact on subsequent state and reward trajectories. This unstable non-linear behaviour has two consequences: first, inaccuracies in sensor readings, or adversarial attacks, can cause significant performance degradation; second, even policies that show robust performance in terms of rewards may have unpredictable behaviour in practice. These two facets of chaos in RL policies drastically restrict the application of deep RL to real-world problems. To address this issue, we propose an improvement on the successful Dreamer V3 architecture, implementing a Maximal Lyapunov Exponent regularisation. This new approach reduces the chaotic state dynamics, rendering the learned policies more resilient to sensor noise or adversarial attacks and thereby improving the suitability of deep RL for real-world applications.
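To make the notion of a Maximal Lyapunov Exponent (MLE) concrete, the following is a minimal sketch of a standard two-trajectory estimate with renormalisation. It is not the regularisation proposed in the paper and does not involve Dreamer V3. The function names `estimate_mle` and `logistic`, the perturbation size `d0`, and the use of the logistic map as a stand-in for a deterministic closed-loop system are all assumptions made for illustration. A positive estimate indicates that nearby states separate exponentially (chaos); a negative one indicates that small perturbations die out.

```python
import math


def estimate_mle(step, x0, n_steps=5000, d0=1e-8, burn_in=100):
    """Estimate the maximal Lyapunov exponent of a deterministic map.

    `step` maps a state (list of floats) to the next state. This is a
    hypothetical helper for illustration, not code from the paper.
    """
    x = list(x0)
    # Discard transients so the estimate reflects long-run behaviour.
    for _ in range(burn_in):
        x = step(x)

    # Start a second trajectory a distance d0 away from the first.
    y = list(x)
    y[0] += d0

    log_sum = 0.0
    used = 0
    for _ in range(n_steps):
        x = step(x)
        y = step(y)
        diff = [b - a for a, b in zip(x, y)]
        d = math.sqrt(sum(c * c for c in diff))
        if d == 0.0:
            # The trajectories merged numerically; re-seed the perturbation
            # and skip this step rather than take log(0).
            y = list(x)
            y[0] += d0
            continue
        # Accumulate the one-step log growth of the separation.
        log_sum += math.log(d / d0)
        used += 1
        # Rescale the separation back to d0 along the same direction.
        y = [a + d0 * c / d for a, c in zip(x, diff)]

    return log_sum / used


def logistic(r):
    """Logistic map x -> r * x * (1 - x), used here as a toy system."""
    return lambda s: [r * s[0] * (1.0 - s[0])]


chaotic = estimate_mle(logistic(4.0), [0.2])  # chaotic regime
stable = estimate_mle(logistic(2.5), [0.2])   # converges to a fixed point

print("r = 4.0:", chaotic)
print("r = 2.5:", stable)
```

For the logistic map the exact values are known (ln 2 at r = 4 and ln 0.5 at r = 2.5), so the two estimates should be positive and negative respectively. In a control setting, `step` would instead apply the policy and the environment dynamics to the current state.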

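This work addresses the lack of robustness of deep reinforcement learning policies in real-world applications by examining how small state perturbations affect their stability. It proposes an improved Dreamer V3 architecture that uses Maximal Lyapunov Exponent regularisation to reduce the chaos in the state dynamics, thereby making the learned policies more resistant to sensor noise and adversarial attacks. This approach substantially improves the suitability of deep reinforcement learning for practical applications.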

Enhancing the Robustness of Deep Reinforcement Learning via Lyapunov Exponents