TL;DR: Using carefully designed attention masks, we propose a powerful perturbation technique, "HackAttend", which deliberately perturbs the attention scores in self-attention (SA) matrices, revealing that current state-of-the-art pre-trained language models are highly vulnerable to such attention perturbations. We further introduce a novel smoothing technique, "S-Attend", which achieves robustness comparable to adversarial training against a variety of text attacks.
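To make the idea of attention-mask perturbation concrete, here is a minimal NumPy sketch (not the paper's exact HackAttend algorithm): for each query position, the key receiving the highest attention weight is masked out before the softmax, so the attention distribution is forcibly redistributed. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # scaled dot-product self-attention; positions where mask is False
    # are excluded by adding a large negative value before the softmax
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d = 4, 8
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))

clean = attention(q, k, v)

# perturbation: for each query row, mask the key it attends to most,
# forcing the remaining attention weights to be renormalized
weights = softmax(q @ k.T / np.sqrt(d))
mask = np.ones((n, n), dtype=bool)
mask[np.arange(n), weights.argmax(axis=1)] = False
perturbed = attention(q, k, v, mask=mask)

print(np.abs(clean - perturbed).max())  # nonzero: the outputs diverge
```

Even this crude one-entry-per-row mask visibly shifts the attention outputs, which is the kind of sensitivity the TL;DR refers to; HackAttend's actual mask construction is more carefully targeted than this sketch.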
Abstract
Pre-trained language models (PLMs) have been shown to be vulnerable to minor word
changes, which poses a significant threat to real-world systems. While previous studies
directly focus on manipulating word inputs, they are limited by their means of
generating adversarial samples, lacking generalization