This position paper proposes a novel approach to advancing NLP security by leveraging Large Language Models (LLMs) as engines for generating diverse adversarial attacks. Building upon recent work demonstrating LLMs' effectiveness in creating word-level adversarial examples, we argue for expanding this concept to encompass a broader range of attack types, including adversarial patches, universal perturbations, and targeted attacks. We posit that LLMs' sophisticated language understanding and generation capabilities can produce more effective, semantically coherent, and human-like adversarial examples across various domains and classifier architectures. This paradigm shift in adversarial NLP has far-reaching implications, potentially enhancing model robustness, uncovering new vulnerabilities, and driving innovation in defense mechanisms. By exploring this new frontier, we aim to contribute to the development of more secure, reliable, and trustworthy NLP systems for critical applications.

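This paper aims to address gaps in natural language processing (NLP) security by using large language models (LLMs) to generate diverse adversarial attacks. It proposes extending the use of LLMs beyond word-level adversarial examples to cover multiple attack types, including adversarial patches, universal perturbations, and targeted attacks. The paper argues that LLMs' language understanding and generation capabilities can produce more effective, semantically coherent, and human-like adversarial examples, which could in turn improve model robustness, reveal new vulnerabilities, and drive innovation in defense mechanisms.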

Advancing NLP Security by Leveraging Large Language Models as Adversarial Engines