Reinforcement learning (RL) in real-world safety-critical target settings
like urban driving is hazardous, imperiling the RL agent, other agents, and the
environment. To overcome this difficulty, we propose a "safety-critical
adaptation" task setting: an agent first trains in non-safety-critical "source"
environments, such as a simulator, before it adapts to the target environment
where failures carry heavy costs. We propose a solution approach, CARL, that
builds on the intuition that prior experience in diverse environments equips an
agent to estimate risk, which in turn enables relative safety through
risk-averse, cautious adaptation. CARL first employs model-based RL to train a
probabilistic model to capture uncertainty about transition dynamics and
catastrophic states across varied source environments. Then, when exploring a
new safety-critical environment with unknown dynamics, the CARL agent plans to
avoid actions that could lead to catastrophic states. In experiments on car
driving, cartpole balancing, half-cheetah locomotion, and robotic object
manipulation, CARL successfully acquires cautious exploration behaviors,
yielding higher rewards with fewer failures than strong RL adaptation
baselines.
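
The risk-averse planning step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a learned probabilistic model has already produced several sampled rollouts per candidate action sequence, and it scores each candidate by the mean of its worst rollouts after subtracting a catastrophe penalty. The names `risk_averse_score` and `choose_action` and the parameters `caution` and `penalty` are hypothetical.

```python
import numpy as np


def risk_averse_score(returns, catastrophe, caution=0.5, penalty=100.0):
    """Score one candidate action sequence from its sampled rollouts.

    returns:     shape (n_samples,), predicted return of each sampled rollout
    catastrophe: shape (n_samples,), True where the rollout reached a
                 catastrophic state
    caution:     fraction of the worst rollouts to average over
                 (1.0 recovers the plain mean); an assumed knob
    penalty:     assumed cost subtracted from any catastrophic rollout
    """
    returns = np.asarray(returns, dtype=float)
    catastrophe = np.asarray(catastrophe, dtype=float)
    penalized = returns - penalty * catastrophe
    # Keep only the k worst outcomes, so one bad rollout can dominate the score.
    k = max(1, int(np.ceil(caution * penalized.size)))
    worst = np.sort(penalized)[:k]
    return float(worst.mean())


def choose_action(candidates, caution=0.5, penalty=100.0):
    """Return the index of the candidate with the best pessimistic score.

    candidates: list of (returns, catastrophe) pairs, one per action sequence
    """
    scores = [risk_averse_score(r, c, caution, penalty) for r, c in candidates]
    return int(np.argmax(scores))


# Candidate 0 has a higher average return but one catastrophic rollout.
risky = ([10.0, 10.0, 10.0, 10.0], [False, False, False, True])
safe = ([6.0, 6.0, 6.0, 6.0], [False, False, False, False])
print(choose_action([risky, safe]))
```

Setting `caution=1.0` and `penalty=0.0` reduces the score to the ordinary expected return, which makes the contrast with a risk-neutral planner easy to check.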

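We propose a safety-critical adaptation task setting for reinforcement learning and a solution, CARL, which uses prior experience in diverse environments to estimate risk. This enables cautious exploration of new domains that avoids catastrophic states, and makes reinforcement learning feasible in safety-critical environments such as urban driving.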

Cautious Adaptation for Reinforcement Learning in Safety-Critical Environments

Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings

We explore adversarial robustness in the setting in which it is acceptable
for a classifier to abstain---that is, output no class---on adversarial
examples. Adversarial examples are small perturbations of normal inputs to a
classifier that cause the classifier to give incorrect output; they present
security and safety challenges for machine learning systems. In many
safety-critical applications, it is less costly for a classifier to abstain on
adversarial examples than to give incorrect output for them. We first introduce
a novel objective function for adversarial robustness with an abstain option
which characterizes an explicit tradeoff between robustness and accuracy. We
then present a simple baseline in which an adversarially-trained classifier
abstains on all inputs within a certain distance of the decision boundary,
which we theoretically and experimentally evaluate. Finally, we propose
Combined Abstention Robustness Learning (CARL), a method for jointly learning a
classifier and the region of the input space on which it should abstain. We
explore different variations of the PGD and DeepFool adversarial attacks on
CARL in the abstain setting. Evaluating against these attacks, we demonstrate
that training with CARL results in a more accurate, robust, and efficient
classifier than the baseline.
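
The baseline described above, which abstains on inputs within a certain distance of the decision boundary, can be illustrated with a small sketch. It assumes a binary linear classifier, for which the distance to the boundary has a closed form; for an adversarially trained deep network that distance would have to be estimated, and the abstract does not say how. The names `predict_with_abstain`, `radius`, and `ABSTAIN` are hypothetical.

```python
import numpy as np

ABSTAIN = -1  # assumed sentinel meaning "output no class"


def predict_with_abstain(X, w, b, radius):
    """Binary linear classifier that abstains near its decision boundary.

    X:      shape (n, d), inputs
    w, b:   weights of shape (d,) and scalar bias of the linear classifier
    radius: abstain when the L2 distance to the boundary is below this value
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    logits = X @ w + b
    # For a linear model, |w.x + b| / ||w|| is the exact L2 distance
    # from x to the hyperplane w.x + b = 0.
    distance = np.abs(logits) / np.linalg.norm(w)
    labels = (logits > 0).astype(int)
    return np.where(distance < radius, ABSTAIN, labels)


w, b = [3.0, 4.0], 0.0
X = [[1.0, 1.0], [0.1, 0.1], [-1.0, -1.0]]
print(predict_with_abstain(X, w, b, radius=0.5))
```

A larger `radius` makes the classifier abstain more often: in this toy setting it trades coverage on clean inputs for protection against small perturbations, which is the kind of robustness versus accuracy tradeoff the objective function is said to characterize.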

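This paper studies adversarial robustness in the setting where a classifier may abstain, that is, output no class, on adversarial examples. It introduces a new objective function for adversarial robustness with an abstain option, presents a baseline based on this objective, and proposes Combined Abstention Robustness Learning (CARL), a method for jointly learning a classifier and the region of the input space on which it should abstain. Evaluation against attacks such as PGD and DeepFool shows that a classifier trained with CARL is more accurate, more robust, and more efficient than the baseline classifier.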