We introduce calibration attacks, a new framework of adversarial attacks that are generated and organized to make victim models miscalibrated without altering their original accuracy, seriously endangering the trustworthiness of the models and of any decision-making based on their confidence scores. Specifically, we identify four novel forms of calibration attack: underconfidence attacks, overconfidence attacks, maximum miscalibration attacks, and random confidence attacks, in both black-box and white-box setups. We test these attacks on typical victim models with a comprehensive set of datasets and demonstrate that, even with a relatively low number of queries, they can cause significant miscalibration. We further provide detailed analyses of different aspects of calibration attacks. Building on these analyses, we investigate the effectiveness of widely used adversarial defences and calibration methods against these attacks, which in turn inspires us to devise two novel defences against them.
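To make the central idea concrete, the following is a minimal illustrative sketch, not the attack procedure itself. It assumes expected calibration error (ECE) with equal-width bins as the miscalibration measure, and it uses a hypothetical rescaling of made-up logits as a stand-in for the effect of an underconfidence attack; an actual calibration attack would perturb the model inputs. All names and values (`logits`, `labels`, `evaluate`, `scale`) are assumptions introduced for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical toy data: four samples, three classes (not from the paper).
logits = [[4.0, 0.0, 0.0], [0.0, 3.0, 1.0], [2.0, 0.0, 3.0], [0.5, 0.0, 3.5]]
labels = [0, 1, 0, 2]


def evaluate(scale):
    """Return (predictions, accuracy, ECE) after multiplying all logits by `scale`.

    A positive scale never changes the argmax, so it mimics the constraint that
    predicted labels stay fixed while confidence scores move.
    """
    preds, confs = [], []
    for row in logits:
        probs = softmax([scale * z for z in row])
        conf = max(probs)
        preds.append(probs.index(conf))
        confs.append(conf)
    correct = [int(p == y) for p, y in zip(preds, labels)]
    accuracy = sum(correct) / len(correct)
    return preds, accuracy, expected_calibration_error(confs, correct)


clean_preds, clean_acc, clean_ece = evaluate(1.0)
shifted_preds, shifted_acc, shifted_ece = evaluate(0.2)  # stand-in for lowered confidence
print(clean_preds == shifted_preds, clean_acc == shifted_acc)
print(clean_ece, shifted_ece)
```

In this toy setting the predicted labels and the accuracy are identical in both runs, while the ECE differs, which is the property that distinguishes a calibration attack from a conventional accuracy-degrading attack.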

Calibration Attacks: A Calibration-Oriented Adversarial Attack Framework