BriefGPT.xyz
May, 2021
Hidden Backdoors in Human-Centric Language Models
Shaofeng Li, Hui Liu, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue...
TL;DR
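This paper proposes NLP backdoor attacks that embed inherently covert, hard-to-detect triggers. The attacks apply across several NLP tasks, including toxic comment detection, machine translation, and question answering, and achieve high success rates inconspicuously while preserving normal behavior for regular users.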
Abstract
Natural language processing (NLP) systems have been proven to be vulnerable to backdoor attacks, whereby hidden features (backdoors) are trained into a