Existing textual adversarial attacks usually rely on the gradient or the
prediction confidence to generate adversarial examples, making them hard to
deploy in real-world applications. To this end, we consider a rarely
investigated but more rigorous setting, namely hard-label attack, in which the
attacker can only access the prediction label. In particular, we find we can
learn the importance of different words via the changes in the prediction label
caused by word substitutions in the adversarial examples. Based on this
observation, we propose a novel adversarial attack, termed Text Hard-label
attacker (TextHacker). TextHacker randomly perturbs lots of words to craft an
adversarial example. Then, TextHacker adopts a hybrid local search algorithm
with the estimation of word importance from the attack history to minimize the
adversarial perturbation. Extensive evaluations on text classification and
textual entailment show that TextHacker significantly outperforms existing
hard-label attacks in terms of both attack performance and adversary
quality.
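
The two stages described above (random initialization, then perturbation reduction guided by an importance estimate) can be illustrated with a minimal sketch. This is not the TextHacker algorithm itself: the function `hard_label_attack`, the toy victim `toy_predict`, the `candidates` dictionary, and the simple importance update are all assumptions made for illustration, and the "local search" here only tries to restore original words.

```python
import random

def hard_label_attack(words, candidates, predict, seed=0, max_tries=100, rounds=50):
    """Toy hard-label attack: random initialization, then perturbation reduction.

    words      : list of tokens of the original text
    candidates : dict mapping a token to its allowed substitutes (assumed given)
    predict    : black-box function that returns only a label for a token list
    """
    rng = random.Random(seed)
    true_label = predict(words)
    positions = [i for i, w in enumerate(words) if candidates.get(w)]

    # Stage 1: randomly substitute many words until the predicted label changes.
    adv = None
    for _ in range(max_tries):
        trial = list(words)
        for i in positions:
            if rng.random() < 0.8:
                trial[i] = rng.choice(candidates[words[i]])
        if predict(trial) != true_label:
            adv = trial
            break
    if adv is None:
        return None, {}

    # Stage 2: restore original words while the example stays adversarial.
    # importance[i] grows whenever restoring position i flips the label back,
    # so the attack history tells us which substitutions seem to matter.
    importance = {i: 0.0 for i in positions}
    for _ in range(rounds):
        changed = [i for i in positions if adv[i] != words[i]]
        if not changed:
            break
        # Prefer restoring positions currently believed to be unimportant.
        weights = [1.0 / (1.0 + importance[i]) for i in changed]
        i = rng.choices(changed, weights=weights, k=1)[0]
        trial = list(adv)
        trial[i] = words[i]
        if predict(trial) != true_label:
            adv = trial            # still adversarial: keep the smaller perturbation
        else:
            importance[i] += 1.0   # label flipped back: position i looks important
    return adv, importance


def toy_predict(tokens):
    """Hypothetical victim model: a keyword counter that exposes only the label."""
    score = sum(t in {"good", "great"} for t in tokens)
    score -= sum(t in {"bad", "poor"} for t in tokens)
    return 1 if score > 0 else 0


words = "the movie was good and the cast was great".split()
candidates = {
    "good": ["bad", "fine"],
    "great": ["poor", "large"],
    "movie": ["film"],
    "cast": ["crew"],
}
adv, importance = hard_label_attack(words, candidates, toy_predict)
print(" ".join(adv))
print(importance)
```

The sketch queries the victim only through `predict`, which matches the hard-label constraint; a fuller local search would also try alternative substitutes rather than only restoring original words.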

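This study proposes TextHacker, an adversarial attack that relies only on the predicted label. It identifies important words by learning how word substitutions affect the predicted label, and uses a hybrid local search, with word importance estimated from the attack history, to minimize the modifications made to the attacked text. The attack shows clear advantages on text classification and textual entailment.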

TextHacker: Learning based Hybrid Local Search Algorithm for Text Hard-label Adversarial Attack

Previous studies have verified that the functionality of black-box models can
be stolen with full probability outputs. However, under the more practical
hard-label setting, we observe that existing methods suffer from catastrophic
performance degradation. We argue this is due to the loss of the rich information
carried by probability predictions and the overfitting caused by hard labels. To
this end, we propose a novel hard-label model stealing method termed
\emph{black-box dissector}, which consists of two erasing-based modules. One is
a CAM-driven erasing strategy that is designed to increase the information
capacity hidden in hard labels from the victim model. The other is a
random-erasing-based self-knowledge distillation module that utilizes soft
labels from the substitute model to mitigate overfitting. Extensive experiments
on four widely used datasets consistently demonstrate that our method
outperforms state-of-the-art methods, with an improvement of up to $8.27\%$.
We also validate the effectiveness and practical potential of our method on
real-world APIs and defense methods. Furthermore, our method promotes other
downstream tasks, \emph{i.e.}, transfer adversarial attacks.
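
The two erasing operations named above can be sketched on a tiny grayscale grid. This is only one plausible reading of the abstract, not the authors' implementation: the helper names (`cam_driven_erase`, `random_erase`, `averaged_soft_label`), the window-based erasing, the precomputed `cam` map, and the toy `toy_substitute` model are all assumptions for illustration.

```python
import random

def erase_window(image, top, left, size, fill=0.0):
    """Return a copy of a 2-D image (list of rows) with one square window filled."""
    out = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out

def cam_driven_erase(image, cam, size):
    """Erase the size x size window with the largest total activation in `cam`.

    `cam` is assumed to be a class activation map with the same shape as `image`.
    The erased image could then be sent to the victim model for a new hard label.
    """
    h, w = len(image), len(image[0])
    windows = [(r, c) for r in range(h - size + 1) for c in range(w - size + 1)]

    def activation(rc):
        return sum(cam[rc[0] + i][rc[1] + j] for i in range(size) for j in range(size))

    top, left = max(windows, key=activation)
    return erase_window(image, top, left, size)

def random_erase(image, size, rng):
    """Erase one uniformly chosen size x size window."""
    h, w = len(image), len(image[0])
    return erase_window(image, rng.randrange(h - size + 1), rng.randrange(w - size + 1), size)

def averaged_soft_label(image, substitute, n_views, size, rng):
    """Average the substitute's soft outputs over randomly erased views.

    The average can serve as a soft target next to the victim's hard label.
    """
    outputs = [substitute(random_erase(image, size, rng)) for _ in range(n_views)]
    n_classes = len(outputs[0])
    return [sum(o[j] for o in outputs) / n_views for j in range(n_classes)]


def toy_substitute(img):
    """Hypothetical substitute model: two-class 'probabilities' from left/right brightness."""
    half = len(img[0]) // 2
    left = sum(sum(row[:half]) for row in img)
    right = sum(sum(row[half:]) for row in img)
    total = left + right
    return [left / total, right / total]


image = [[1.0] * 4 for _ in range(4)]
cam = [[0.0] * 4 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        cam[r][c] = 1.0   # activation concentrated in the central 2 x 2 block

erased = cam_driven_erase(image, cam, size=2)
soft = averaged_soft_label(image, toy_substitute, n_views=8, size=2, rng=random.Random(0))
print(erased)
print(soft)
```

Erasing the most activated region forces a new query that depends on the remaining evidence, while averaging over random erasures gives a smoother target than a single hard label.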

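This study proposes a new black-box model stealing method consisting of a CAM-based erasing strategy and a random-erasing-based self-knowledge distillation module. It extracts more of the information hidden in the victim model's hard labels and uses soft labels from the substitute model to mitigate overfitting, improving model stealing performance by up to 8.27%, and shows promise against real-world APIs and defense methods.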