The increasing reliance on large language models (LLMs) such as ChatGPT in
various fields emphasizes the importance of ``prompt engineering,'' a
technology to improve the quality of model outputs. With companies investing
significantly in expert prompt engineers and educational resources rising to
meet market demand, designing high-quality prompts has become an intriguing
challenge. In this paper, we propose a novel attack against LLMs, named prompt
stealing attacks. Our proposed prompt stealing attack aims to steal these
well-designed prompts based on the generated answers. The prompt stealing
attack contains two primary modules: the parameter extractor and the prompt
reconstruction. The goal of the parameter extractor is to figure out the
properties of the original prompts. We first observe that most prompts fall
into one of three categories: direct prompt, role-based prompt, and in-context
prompt. Our parameter extractor first tries to distinguish the type of prompts
based on the generated answers. Then, it can further predict which role or how
many contexts are used based on the types of prompts. Following the parameter
extractor, the prompt reconstructor can be used to reconstruct the original
prompts based on the generated answers and the extracted features. The final
goal of the prompt reconstructor is to generate the reversed prompts, which are
similar to the original prompts. Our experimental results show the remarkable
performance of our proposed attacks. Our proposed attacks add a new dimension
to the study of prompt engineering and call for more attention to the security
issues on LLMs.

我们提出了一种名为 prompt stealing attacks 的新攻击，该攻击旨在基于生成的答案窃取设计良好的 prompt，通过参数提取器和提示重构器实现，实验结果表明攻击的卓越性能，进一步引发关于大型语言模型安全问题的关注。