Reinforcement learning (RL) has achieved tremendous progress in solving various sequential decision-making problems, e.g., control tasks in robotics. However, RL methods often fail to generalize to safety-critical scenarios since policies are overfitted to training environments. Previously, robust adversarial reinforcement learning (RARL) was proposed to train an adversarial network that applies disturbances to a system, which improves robustness in test scenarios. A drawback of neural-network-based adversaries is that integrating system requirements without handcrafting sophisticated reward signals is difficult. Safety falsification methods allow one to find a set of initial conditions as well as an input sequence, such that the system violates a given property formulated in temporal logic. In this paper, we propose falsification-based RARL (FRARL), the first generic framework for integrating temporal-logic falsification in adversarial learning to improve policy robustness. With falsification method, we do not need to construct an extra reward function for the adversary. We evaluate our approach on a braking assistance system and an adaptive cruise control system of autonomous vehicles. Experiments show that policies trained with a falsification-based adversary generalize better and show less violation of the safety specification in test scenarios than the ones trained without an adversary or with an adversarial network.

本文提出了基于 Falsification 的 Robust Adversarial Reinforcement Learning (FRARL) 框架，将时序逻辑 Falsification 整合到 Adversarial Learning 中以提高策略的鲁棒性，实验结果表明，采用该框架训练的智能体比其他方法更具有通用性和遵守安全规则的能力。

基于反证法的强化学习鲁棒性敌对攻击