BriefGPT.xyz
Nov, 2023
Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang, Xiangyu Zhou, Dongxiao Zhu
TL;DR
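This paper introduces a novel attack that subverts in-context learning, showing how LLMs can be exploited to generate targeted responses, and demonstrates its effectiveness through extensive experiments across a variety of tasks and datasets.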
Abstract
In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific tasks by utilizing labeled examples as demonstrations in the precondition prompts. Despite its promising performance, ICL …
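To make the ICL setup described in the abstract concrete, here is a minimal sketch of how labeled demonstrations can be placed ahead of a query in a prompt. The function name `build_icl_prompt`, the sentiment task, and the `Review:` / `Sentiment:` template are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def build_icl_prompt(
    demos: List[Tuple[str, str]],
    query: str,
    instruction: str = "Classify each review as positive or negative.",
) -> str:
    """Assemble a few-shot prompt (hypothetical template, for illustration only).

    Each demonstration is an (input text, label) pair. The query is appended
    last with an empty label slot for the model to complete.
    """
    lines = [instruction, ""]
    for text, label in demos:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [("Great film.", "positive"), ("Dull plot.", "negative")]
    print(build_icl_prompt(demos, "I loved it."))
```

Because the model conditions on every demonstration in the prompt, the demonstration text is the surface that an adversarial ICL attack of the kind the title describes would have to manipulate.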