BriefGPT.xyz
Apr, 2023
Masked Language Model Based Textual Adversarial Example Detection
Xiaomei Zhang, Zhaoxi Zhang, Qi Zhong, Xufei Zheng, Yanjun Zhang...
TL;DR
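This paper proposes Masked Language Model-based Detection (MLMD), a method for distinguishing normal examples from adversarial examples. MLMD explores the manifold changes induced by a masked language model to produce clearly distinguishable signals, and it shows strong performance across various benchmark text datasets, machine learning models, and state-of-the-art adversarial attacks.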
Abstract
Adversarial attacks are a serious threat to the reliable deployment of machine learning models in safety-critical applications. They can misguide current models to predict incorrectly by slightly modifying the in…