The performance of a model trained with \textit{noisy labels} is often
improved by simply \textit{retraining} the model with its own predicted
\textit{hard} labels (i.e., $1$/$0$ labels). Yet, a detailed theoretical
characterization of this phenomenon is lacking. In this paper, we theoretically
analyze retraining in a linearly separable setting with randomly corrupted
labels given to us and prove that retraining can improve the population
accuracy obtained by initially training with the given (noisy) labels. To the
best of our knowledge, this is the first such theoretical result. Retraining
finds application in improving training with label differential privacy (DP),
which involves training with noisy labels. We empirically show that retraining
selectively on the samples for which the predicted label matches the given
label significantly improves label DP training at \textit{no extra privacy
cost}; we call this \textit{consensus-based retraining}. For example, when
training ResNet-18 on CIFAR-100 with $\epsilon=3$ label DP, we obtain a $6.4\%$
improvement in accuracy with consensus-based retraining.
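
The consensus-based retraining step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `consensus_indices`, `consensus_retrain`, `fit_class_means`, and `predict_nearest_mean` are hypothetical, and a toy nearest-class-mean classifier on 1-D features stands in for the real model (e.g., ResNet-18).

```python
from typing import List, Sequence, Tuple


def consensus_indices(given: Sequence[int], predicted: Sequence[int]) -> List[int]:
    """Indices where the predicted hard label matches the given (noisy) label."""
    if len(given) != len(predicted):
        raise ValueError("label sequences must have the same length")
    return [i for i, (g, p) in enumerate(zip(given, predicted)) if g == p]


def fit_class_means(xs: Sequence[float], ys: Sequence[int]) -> Tuple[float, float]:
    """Toy stand-in model (assumption): the mean feature of each class.

    Assumes both classes 0 and 1 appear at least once in ys.
    """
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    return sum(zeros) / len(zeros), sum(ones) / len(ones)


def predict_nearest_mean(model: Tuple[float, float], x: float) -> int:
    """Hard (1/0) label: the class whose mean is closer to x."""
    mean0, mean1 = model
    return 1 if abs(x - mean1) < abs(x - mean0) else 0


def consensus_retrain(xs: Sequence[float], given: Sequence[int]):
    """Train on the given labels, then retrain only on the consensus samples."""
    model = fit_class_means(xs, given)
    predicted = [predict_nearest_mean(model, x) for x in xs]
    keep = consensus_indices(given, predicted)
    retrained = fit_class_means([xs[i] for i in keep], [given[i] for i in keep])
    return retrained, keep


# Toy data: the clean labels would be [0, 0, 0, 1, 1, 1]; two labels are flipped.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
noisy_labels = [0, 0, 1, 1, 1, 0]
retrained_model, kept = consensus_retrain(xs, noisy_labels)
```

In this toy run the two flipped samples disagree with the model's predictions and are dropped before retraining. The selection uses only the given labels and the model's own predictions, so it needs no further access to the clean labels.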

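The paper proves that, in a linearly separable setting with randomly corrupted labels, retraining can improve the model's population accuracy, and shows empirically that consensus-based retraining significantly improves the accuracy of label differential privacy training at no extra privacy cost.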

Retraining with Predicted Hard Labels Provably Increases Model Accuracy

Adversarial attacks reveal serious flaws in deep learning models. More
dangerously, these attacks preserve the original meaning and escape human
recognition. Existing methods for detecting these attacks need to be trained
using original/adversarial data. In this paper, we propose detection without
training by voting on hard labels from predictions of transformations, namely,
VoteTRANS. Specifically, VoteTRANS detects adversarial text by comparing the
hard labels of input text and its transformation. The evaluation demonstrates
that VoteTRANS effectively detects adversarial text across various
state-of-the-art attacks, models, and datasets.
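
The comparison of hard labels described above can be sketched as follows. This is a hedged illustration rather than the VoteTRANS implementation: the majority-vote rule, the function `vote_detect`, the keyword classifier `classify`, and the synonym-replacement `transforms` are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}


def classify(text: str) -> str:
    """Toy keyword classifier (assumption) that returns a hard label."""
    words = text.split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "pos" if pos > neg else "neg"


def vote_detect(
    text: str,
    classify_fn: Callable[[str], Hashable],
    transforms: Sequence[Callable[[str], str]],
) -> bool:
    """Flag text as adversarial when the majority hard label of its
    transformations differs from the hard label of the input itself."""
    original_label = classify_fn(text)
    votes = Counter(classify_fn(transform(text)) for transform in transforms)
    if not votes:
        return False  # no transformations, nothing to compare against
    majority_label, _ = votes.most_common(1)[0]
    return majority_label != original_label


# Hypothetical word-level transformations (synonym replacement).
transforms = [
    lambda t: t.replace("grand", "great"),
    lambda t: t.replace("grand", "good"),
    lambda t: t.replace("movie", "film"),
]

suspicious = vote_detect("a grand movie", classify, transforms)
clean = vote_detect("a great movie", classify, transforms)
```

Here "a grand movie" plays the role of an adversarial rewrite of "a great movie": the toy classifier mislabels it, but most of its transformations recover the original label, so the vote disagrees with the input's label. No detector is trained at any point; only the target classifier's hard labels are used.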

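This paper proposes VoteTRANS, a detection method that identifies adversarial text by comparing the hard labels of the input text and its transformations. It requires no training on original or adversarial data and performs well across various state-of-the-art attacks, models, and datasets.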

VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations

In ordinary distillation, student networks are trained with soft labels (SLs)
given by pretrained teacher networks, and students are expected to improve upon
teachers since SLs are stronger supervision than the original hard labels.
However, when considering adversarial robustness, teachers may become
unreliable and adversarial distillation may not work: teachers are pretrained
on their own adversarial data, and it is too demanding to require that teachers
also be good at every adversarial example queried by students. Therefore, in this
paper, we propose reliable introspective adversarial distillation (IAD) where
students partially instead of fully trust their teachers. Specifically, IAD
distinguishes between three cases given a query of natural data (ND) and the
corresponding adversarial data (AD): (a) if a teacher is good at AD, its SL is
fully trusted; (b) if a teacher is good at ND but not AD, its SL is partially
trusted and the student also takes its own SL into account; (c) otherwise, the
student only relies on its own SL. Experiments demonstrate the effectiveness of
IAD for improving upon teachers in terms of adversarial robustness.
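
The three cases above can be sketched as a rule for mixing the teacher's and the student's soft labels. This is only an illustration under stated assumptions, not the IAD objective itself: "good at" is taken to mean that the teacher's top class equals the true label, the partial-trust weight is a fixed hypothetical parameter `partial_trust`, and the names `trust_weight` and `mixed_soft_label` are invented for this example.

```python
from typing import List, Sequence


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def trust_weight(
    teacher_nd: Sequence[float],
    teacher_ad: Sequence[float],
    label: int,
    partial_trust: float = 0.5,  # hypothetical value for case (b)
) -> float:
    """Weight placed on the teacher's soft label for one (ND, AD) query."""
    if argmax(teacher_ad) == label:
        return 1.0            # (a) teacher is good at AD: fully trusted
    if argmax(teacher_nd) == label:
        return partial_trust  # (b) good at ND but not AD: partially trusted
    return 0.0                # (c) otherwise: student relies on its own SL


def mixed_soft_label(
    teacher_nd: Sequence[float],
    teacher_ad: Sequence[float],
    student_ad: Sequence[float],
    label: int,
    partial_trust: float = 0.5,
) -> List[float]:
    """Convex combination of the teacher's and the student's soft labels on AD."""
    w = trust_weight(teacher_nd, teacher_ad, label, partial_trust)
    return [w * t + (1.0 - w) * s for t, s in zip(teacher_ad, student_ad)]


# Case (b): the teacher is right on ND (class 1) but wrong on AD.
target = mixed_soft_label(
    teacher_nd=[0.2, 0.8],
    teacher_ad=[0.7, 0.3],
    student_ad=[0.4, 0.6],
    label=1,
)
```

Because the weight lies in [0, 1] and both inputs are probability vectors, the mixed target is still a valid distribution. How trust is actually measured in IAD is not specified in the abstract, so the fixed `partial_trust` should be read as a placeholder.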

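The paper proposes a new neural network training method, reliable introspective adversarial distillation (IAD), to improve the adversarial robustness of neural networks. Depending on the case, labels from different sources are only partially trusted, which improves the network's robustness. Experiments show that IAD is effective at improving adversarial robustness.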