As pre-trained language models (PLMs) continue to grow, so do the
hardware and data requirements for fine-tuning them. Researchers have therefore
proposed a lighter-weight approach called \textit{Prompt Learning}. However,
in our investigation we observe that prompt learning methods are
vulnerable and can easily be attacked by maliciously constructed prompts,
resulting in classification errors and serious security problems for PLMs.
Most current research ignores the security of prompt-based
methods. Therefore, in this paper, we propose a malicious prompt template
construction method (\textbf{PromptAttack}) to probe the security
of PLMs. We investigate several adversarial template construction approaches
that mislead the model into misclassification. Extensive experiments on three
datasets and three PLMs demonstrate the effectiveness of our proposed approach,
PromptAttack. We also conduct experiments verifying that our method is
applicable in few-shot scenarios.
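The abstract does not describe how the malicious templates are searched. For intuition only, here is a minimal sketch of one generic gradient-guided approach, a HotFlip-style first-order token substitution, run on a toy linear classifier. Everything in it is hypothetical and is not PromptAttack itself: the vocabulary, the 2-d embeddings, the weights `W`, and the helper names are invented, and `score` merely stands in for whatever label score a prompt-based PLM would produce. Only template tokens are edited; the input text stays fixed.

```python
# Toy illustration of gradient-guided template search (NOT the PromptAttack
# algorithm). All tokens, embeddings and weights below are hypothetical.
EMB = {
    "it": (0.0, 0.1),
    "was": (0.1, 0.0),
    "really": (0.2, 0.1),
    "not": (-0.9, 0.0),
    "never": (-0.7, 0.2),
    "great": (1.0, 0.2),
    "fun": (0.6, 0.1),
}
# Hypothetical linear classifier weights: score > 0 means "positive".
W = (1.0, 0.5)


def score(tokens):
    """Sum-pooled linear score s = W . sum_t EMB[t]."""
    pooled = [sum(EMB[t][k] for t in tokens) for k in range(2)]
    return sum(W[k] * pooled[k] for k in range(2))


def embedding_gradient():
    """ds/de for every position; it equals W because pooling is a plain sum."""
    return W


def estimated_delta(old, new, grad):
    """First-order estimate of the score change when `old` is swapped for `new`."""
    return sum((EMB[new][k] - EMB[old][k]) * grad[k] for k in range(2))


def gradient_template_search(template, text, max_steps=3):
    """Greedily swap template tokens (the input text is never edited) until the
    score drops to zero or below, or the step budget runs out."""
    template = list(template)
    for _ in range(max_steps):
        if score(template + text) <= 0:
            break
        grad = embedding_gradient()
        # Pick the (position, candidate) pair with the most negative estimate.
        pos, tok = min(
            ((i, v) for i in range(len(template)) for v in EMB),
            key=lambda pair: estimated_delta(template[pair[0]], pair[1], grad),
        )
        template[pos] = tok
    return template


if __name__ == "__main__":
    text = ["fun"]
    clean = ["it", "was", "really"]
    adv = gradient_template_search(clean, text)
    print(adv, score(clean + text), score(adv + text))
```

Because the toy model is linear, the first-order estimate is exact here. With a real PLM the gradient would come from autograd on the input-embedding layer, would change after every swap, and the top-ranked candidates would normally be re-scored with full forward passes before committing to a swap.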

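This paper proposes a malicious prompt template construction method (PromptAttack) to probe the security of pre-trained language models (PLMs). Extensive experiments on three datasets and three PLMs demonstrate the effectiveness of the proposed PromptAttack method. We also conduct experiments verifying that the method is applicable in few-shot scenarios.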

PromptAttack: Prompt-based Attack for Language Models via Gradient Search

Data-agnostic, quasi-imperceptible perturbations of inputs are known to
severely degrade the recognition accuracy of deep convolutional networks. This
phenomenon is considered a potential security issue. Moreover, some
results on statistical generalization guarantees indicate that the phenomenon
may be key to improving the networks' generalization. However, the
characteristics of the shared directions of such harmful perturbations remain
unknown. Our primary finding is that convolutional networks are sensitive to the
directions of Fourier basis functions. We derive this property by specializing
a hypothesis about the cause of the sensitivity, known as the linearity of neural
networks, to convolutional networks, and we validate it empirically. As a
by-product of the analysis, we propose an algorithm that creates shift-invariant
universal adversarial perturbations usable in black-box settings.
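The argument above rests on treating a convolutional network as approximately linear. A minimal 1-D sketch of the underlying mathematical fact follows, assuming a single convolution layer with circular padding and no bias or nonlinearity; the kernel values and helper names are hypothetical, and this is not the paper's perturbation algorithm. Fourier basis vectors are eigenvectors of circular convolution, so the layer only rescales the basis vector f_k by the kernel's frequency response H[k], and a circular shift of f_k only changes its phase, leaving the gain |H[k]| unchanged.

```python
import cmath


def fourier_basis(n, k):
    """Unit-norm complex Fourier basis vector f_k[x] = exp(2*pi*i*k*x/n) / sqrt(n)."""
    return [cmath.exp(2j * cmath.pi * k * x / n) / n ** 0.5 for x in range(n)]


def circular_conv(signal, kernel):
    """Linear 1-D conv layer with circular padding, no bias, no nonlinearity:
    y[x] = sum_j kernel[j] * signal[(x - j) mod n]."""
    n = len(signal)
    return [
        sum(h * signal[(x - j) % n] for j, h in enumerate(kernel))
        for x in range(n)
    ]


def frequency_response(kernel, n, k):
    """H[k] = sum_j kernel[j] * exp(-2*pi*i*k*j/n), the kernel's DFT at frequency k."""
    return sum(h * cmath.exp(-2j * cmath.pi * k * j / n) for j, h in enumerate(kernel))


def circular_shift(signal, s):
    """Shift a signal circularly by s positions."""
    n = len(signal)
    return [signal[(x - s) % n] for x in range(n)]


def l2_norm(v):
    """Euclidean norm of a (possibly complex) vector."""
    return sum(abs(z) ** 2 for z in v) ** 0.5


def most_sensitive_frequency(kernel, n):
    """Frequency whose unit-norm basis vector the layer amplifies the most."""
    return max(range(n), key=lambda k: abs(frequency_response(kernel, n, k)))


if __name__ == "__main__":
    n = 16
    kernel = [0.5, -1.0, 0.25]  # hypothetical filter weights
    for k in range(n):
        f = fourier_basis(n, k)
        H = frequency_response(kernel, n, k)
        y = circular_conv(f, kernel)
        # Eigenvector property: the layer only rescales f_k by H[k].
        assert all(abs(y[x] - H * f[x]) < 1e-9 for x in range(n))
        # Shifting the perturbation changes its phase, not the output gain.
        ys = circular_conv(circular_shift(f, 5), kernel)
        assert abs(l2_norm(ys) - abs(H)) < 1e-9
    print("most sensitive frequency:", most_sensitive_frequency(kernel, n))
```

Under this linear model, the unit-norm Fourier direction with the largest |H[k]| is the one the layer amplifies most; for the hypothetical kernel above that is the alternating-sign pattern at k = n/2. Real networks are only approximately linear, which is why the property has to be validated empirically.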

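Small perturbations can severely reduce the accuracy of deep convolutional networks, a phenomenon that poses a potential security risk, yet the characteristics of the shared directions of such harmful perturbations have remained unknown. By studying the sensitivity of deep convolutional networks, we find that they are sensitive to small changes along the directions of Fourier basis functions. Building on this analysis, we propose a general algorithm for black-box models that generates shift-invariant universal adversarial perturbations.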