Alongside the well-publicized accomplishments of deep neural networks there has emerged an apparent bug in their success on tasks such as object recognition: with deep models trained using vanilla methods, input images can be slightly corrupted in order to modify output predictions, even when these corruptions are practically invisible. This apparent lack of robustness has led researchers to propose methods that can help to prevent an adversary from having such capabilities. The state-of-the-art approaches incorporate the robustness requirement into the loss function, and the training process takes stochastic gradient descent steps not on the original inputs but on adversarially corrupted ones. In this paper we propose a multiclass boosting framework to ensure adversarial robustness. Boosting algorithms are generally well-suited for adversarial scenarios, as they were classically designed to satisfy a minimax guarantee. We provide a theoretical foundation for this methodology and describe conditions under which robustness can be achieved given a weak training oracle. We show empirically that adversarially robust multiclass boosting not only outperforms the state-of-the-art methods but also does so at a fraction of the training time.

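This paper proposes a multiclass boosting framework to ensure adversarial robustness. Whereas state-of-the-art approaches add the robustness requirement to the loss function and take stochastic gradient descent steps on adversarially corrupted inputs, the paper describes conditions under which robustness can be achieved given a weak training oracle. Experiments show that adversarially robust multiclass boosting not only outperforms the state-of-the-art methods but also trains in a fraction of the time.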

A multiclass boosting framework that achieves fast and provable adversarial robustness.