We present a policy optimization framework in which the learned policy comes
with a machine-checkable certificate of adversarial robustness. Our approach,
called CAROL, learns a model of the environment. In each learning iteration, it
uses the current version of this model and an external abstract interpreter to
construct a differentiable signal for provable robustness. This signal is used
to guide policy learning, and the abstract interpretation used to construct it
directly leads to the robustness certificate returned at convergence. We give a
theoretical analysis that bounds the worst-case cumulative reward of CAROL.
We also experimentally evaluate CAROL on four MuJoCo environments. On these
tasks, which involve continuous state and action spaces, CAROL learns certified
policies that have performance comparable to the (non-certified) policies
learned using state-of-the-art robust RL methods.
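
To make the idea of abstract interpretation over a policy concrete, the following is a minimal sketch, not CAROL's implementation. It assumes the simplest abstract domain (per-dimension intervals), an L-infinity perturbation of the observed state, and a hypothetical two-layer ReLU policy whose weights are made up for illustration. The names `perturbation_box`, `affine_bounds`, `relu_bounds`, `policy_bounds`, and `policy_value` are all hypothetical. Written in an automatic differentiation framework, bounds of this kind are differentiable in the policy weights, which is what allows them to serve as a training signal.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper)

# Hypothetical toy policy: action = W2 * relu(W1 * state + B1) + B2.
W1 = [[1.0, -1.0], [0.5, 0.5]]
B1 = [0.0, -0.25]
W2 = [[1.0, -2.0]]
B2 = [0.1]


def perturbation_box(state: List[float], eps: float) -> List[Interval]:
    """All states within L-infinity distance eps of the observed state."""
    return [(s - eps, s + eps) for s in state]


def affine_bounds(weights, bias, box: List[Interval]) -> List[Interval]:
    """Sound interval bounds for W x + b when each x[j] lies in box[j]."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (l, u) in zip(row, box):
            # A non-negative weight keeps the order of the endpoints;
            # a negative weight swaps them.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out.append((lo, hi))
    return out


def relu_bounds(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each endpoint."""
    return [(max(0.0, l), max(0.0, u)) for l, u in box]


def policy_bounds(box: List[Interval]) -> List[Interval]:
    """Interval that contains the action for every state in the box."""
    hidden = relu_bounds(affine_bounds(W1, B1, box))
    return affine_bounds(W2, B2, hidden)


def policy_value(state: List[float]) -> List[float]:
    """Concrete (non-abstract) evaluation of the same toy policy."""
    hidden = [max(0.0, sum(w * s for w, s in zip(row, state)) + b)
              for row, b in zip(W1, B1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, B2)]


if __name__ == "__main__":
    box = perturbation_box([0.5, 0.2], eps=0.1)
    print("action bounds under perturbation:", policy_bounds(box))
    print("action at the unperturbed state:", policy_value([0.5, 0.2]))
```

The interval domain is chosen here only because it is the easiest to write down. The abstract does not say which abstract domain or interpreter CAROL uses, and tighter domains would give less conservative bounds at higher cost.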

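This paper presents CAROL, a policy optimization framework with provable robustness. While learning a model of the environment, CAROL uses an external abstract interpreter to construct a differentiable signal that guides policy learning and directly yields the robustness certificate returned at convergence. Experiments on four MuJoCo environments show that CAROL learns certified policies whose performance is comparable to that of non-certified policies learned with state-of-the-art robust RL methods.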