Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
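TL;DR: An automatic gradient-based method generates efficient and universal prompt injection data, underscoring the importance of gradient-based testing, especially for defense mechanisms.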
Abstract
Large language models (LLMs) excel at processing and generating human language, powered by their ability to interpret and follow instructions. However, their capabilities can be exploited through prompt injection attack