BriefGPT.xyz
Jul, 2023
Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success
Yiming Zhang, Daphne Ippolito
TL;DR
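This paper introduces a framework for measuring and attacking prompts in large language models, and shows experimentally that text-based attacks can extract prompts with high probability.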
Abstract
The generations of large language models are commonly controlled through prompting techniques, where a user's query to the model is prefixed with a prompt that aims to guide the model's behaviour on the query. Th