Recent advancements in natural language processing have highlighted the vulnerability of deep learning models to adversarial attacks. While various defence mechanisms have been proposed, there is a lack of comprehensive benchmarks that evaluate these defences across diverse datasets, models, and tasks. In this work, we address this gap by presenting an extensive benchmark for textual adversarial defence that significantly expands upon previous work. Our benchmark incorporates a wide range of datasets, evaluates state-of-the-art defence mechanisms, and extends the assessment to include critical tasks such as single-sentence classification, similarity and paraphrase identification, natural language inference, and commonsense reasoning. This work not only serves as a valuable resource for researchers and practitioners in the field of adversarial robustness but also identifies key areas for future research in textual adversarial defence. By establishing a new standard for benchmarking in this domain, we aim to accelerate progress towards more robust and reliable natural language processing systems.

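This paper presents a comprehensive benchmark that addresses the vulnerability of deep learning models in natural language processing to adversarial attacks, filling an evaluation gap in existing research. The proposed benchmark covers multiple datasets, evaluates state-of-the-art defence mechanisms, and extends to key tasks such as single-sentence classification, similarity identification, and natural language inference. The work provides an important resource for researchers and practitioners and points out directions for future research in textual adversarial defence.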

Stronger Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks