We consider adversarial training of deep neural networks through the lens of Bayesian learning, and present a principled framework for adversarial training of Bayesian Neural Networks (BNNs) with certifiable guarantees. We rely on techniques from constraint relaxation of non-convex optimisation problems and modify the standard cross-entropy error model to enforce posterior robustness to worst-case perturbations in $\epsilon$-balls around input points. We illustrate how the resulting framework can be combined with methods commonly employed for approximate inference of BNNs. In an empirical investigation, we demonstrate that the presented approach enables training of certifiably robust models on MNIST, FashionMNIST and CIFAR-10 and can also be beneficial for uncertainty calibration. Our method is the first to directly train certifiable BNNs, thus facilitating their deployment in safety-critical applications.
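To make the idea of a worst-case cross-entropy over an $\epsilon$-ball concrete, the following is a minimal sketch, not the authors' implementation. It assumes interval bound propagation as the relaxation (the abstract only says "constraint relaxation"), a small ReLU network, and a single weight sample standing in for one draw from an approximate posterior. All function names (`interval_linear`, `worst_case_logits`, `robust_cross_entropy`) are hypothetical.

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Propagate the box [lo, hi] through the affine map x -> W @ x + b."""
    mid = (lo + hi) / 2.0
    rad = (hi - lo) / 2.0
    out_mid = W @ mid + b
    out_rad = np.abs(W) @ rad  # the radius can only grow by |W|
    return out_mid - out_rad, out_mid + out_rad

def worst_case_logits(x, eps, params, label):
    """Bound the logits over the L-infinity eps-ball around x.

    params is a list of (W, b) pairs for one weight sample (assumption:
    ReLU after every layer except the last).
    """
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(params):
        lo, hi = interval_linear(lo, hi, W, b)
        if i < len(params) - 1:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    # Adversarially pessimistic logits: the true class takes its lower
    # bound, every other class takes its upper bound.
    z = hi.copy()
    z[label] = lo[label]
    return z

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    shifted = logits - logits.max()
    return -(shifted[label] - np.log(np.exp(shifted).sum()))

def robust_cross_entropy(x, eps, params, label):
    """Upper bound on the cross-entropy over the eps-ball around x."""
    return cross_entropy(worst_case_logits(x, eps, params, label), label)
```

Under these assumptions, `robust_cross_entropy` could replace the usual cross-entropy term in the likelihood used by an approximate inference scheme, with `params` resampled from the approximate posterior at each step. With `eps = 0` it reduces to the standard cross-entropy.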

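Viewing adversarial training of deep neural networks through the lens of Bayesian learning, the paper proposes a principled framework for adversarial training of Bayesian Neural Networks (BNNs) with certifiable guarantees. The method can train certifiably robust models on MNIST, FashionMNIST and CIFAR-10 and can also be used for uncertainty calibration. It is the first to directly train certifiable BNNs, which facilitates their deployment in safety-critical applications.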

Bayesian Inference with Certifiable Adversarial Robustness