提示信息不应视为机密信息：系统性地测量提示信息提取攻击的成功率

Jul, 2023

提示信息不应视为机密信息：系统性地测量提示信息提取攻击的成功率

Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success

Yiming Zhang, Daphne Ippolito

TL;DR本文介绍了一种用于测量和攻击大型语言模型中Prompt的框架，通过实验展示了文本攻击可以高概率地成功提取prompt。

Abstract

The generations of large language models are commonly controlled through prompting techniques, where a user's query to the model is prefixed with a prompt that aims to guide the model's behaviour on the query. Th