BriefGPT.xyz
Jan 2023
BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing
Jiali Wei, Ming Fan, Wenjing Jiao, Wuxia Jin, Ting Liu
TL;DR
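This work proposes a new defense method based on deep model mutation testing. It detects malicious backdoor samples whose triggers operate at the character, word, sentence, or style level, and it performs well on three benchmark datasets and three style-transfer datasets.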
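The TL;DR names the technique but not its mechanics. As a rough illustration of how mutation-based sample detection can work, here is a toy sketch in plain Python. It is not the authors' BDMMT implementation, and every piece of it is an assumption: a two-class linear model stands in for a language model, the only mutation operator is Gaussian noise on the weights, each input is scored by its label change rate (LCR) across mutated models, and a fixed threshold replaces whatever decision rule the paper actually uses. It also assumes, which this excerpt does not state, that backdoor-triggered inputs are more robust to mutation than clean inputs, so a low LCR is treated as suspicious. The function names `predict`, `mutate`, `label_change_rate`, and `is_suspected_backdoor` are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def predict(weights: Matrix, x: Sequence[float]) -> int:
    """Return the index of the highest-scoring class for a linear model (scores = W x)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=lambda k: scores[k])


def mutate(weights: Matrix, sigma: float, rng: random.Random) -> Matrix:
    """Hypothetical mutation operator: add Gaussian noise N(0, sigma^2) to every weight."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]


def label_change_rate(weights: Matrix, x: Sequence[float], n_mutants: int = 200,
                      sigma: float = 0.1, seed: int = 0) -> float:
    """Fraction of mutated models whose prediction on x differs from the original model's."""
    rng = random.Random(seed)
    original = predict(weights, x)
    changed = sum(
        predict(mutate(weights, sigma, rng), x) != original for _ in range(n_mutants)
    )
    return changed / n_mutants


def is_suspected_backdoor(weights: Matrix, x: Sequence[float], threshold: float = 0.05,
                          **kwargs) -> bool:
    """Assumption: triggered inputs barely react to mutation, so a LOW LCR is flagged."""
    return label_change_rate(weights, x, **kwargs) < threshold


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-class linear "model"
    triggered_like = [10.0, 0.0]  # large margin: stand-in for a trigger that dominates class 0
    clean_like = [0.51, 0.49]     # small margin: stand-in for an ordinary clean input
    for name, x in [("triggered-like", triggered_like), ("clean-like", clean_like)]:
        print(name, label_change_rate(W, x), is_suspected_backdoor(W, x))
```

In this toy setup the large-margin input should keep its label under almost every mutant and be flagged, while the small-margin input flips often and is not. A real detector would mutate a trained language model and calibrate the threshold on held-out clean data rather than hard-coding it.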
Abstract
Deep neural networks (DNNs) and natural language processing (NLP) systems have developed rapidly and have been widely used in various real-world fields. However, they have been shown to be vulnerable to…