Despite the impressive adaptability of large language models (LLMs), challenges remain in ensuring their security, transparency, and interpretability. Because LLMs are susceptible to adversarial attacks, they need to be defended with an evolving combination of adversarial training and guardrails. However, managing the implicit and heterogeneous knowledge needed to continuously assure robustness is difficult. We introduce a novel approach, based on formal argumentation, for assuring the adversarial robustness of LLMs. Using ontologies for formalization, we structure state-of-the-art attacks and defenses, which facilitates the creation of both a human-readable assurance case and a machine-readable representation. We demonstrate the approach with examples from English-language and code translation tasks, and discuss its implications for theory and practice for engineers, data scientists, users, and auditors.

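This work addresses the challenges of security, transparency, and interpretability in large language models (LLMs), in particular their vulnerability to adversarial attacks. It introduces a new approach based on formal argumentation that uses ontologies to formalize attacks and defenses, yielding a human-readable assurance case and a machine-readable representation. The study shows that applying this approach to English-language and code translation tasks has significant theoretical and practical implications.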

Assuring the Adversarial Robustness of Large Language Models through Ontology-Driven Argumentation