Long prompts lead to huge hardware costs when using Large Language Models (LLMs). Unfortunately, many tasks, such as summarization, inevitably involve long task inputs, and the wide application of in-context learning easily makes the prompt length explode. Inspired by the language understanding ability of LLMs, this paper proposes SelfCP, which uses the LLM \textbf{itself} to \textbf{C}ompress long \textbf{P}rompts into compact virtual tokens. SelfCP applies a general frozen LLM twice, first as an encoder to compress the prompt and then as a decoder to generate responses. Specifically, given a long prompt, we place special tokens within the lengthy segment to be compressed and signal the LLM to generate $k$ virtual tokens. Afterward, the virtual tokens are concatenated with the uncompressed part of the prompt and fed into the same LLM to generate the response. In general, SelfCP supports both unconditional and conditional compression of prompts, fitting both standard tasks and those with specific objectives. Since the encoder and decoder are frozen, SelfCP contains only 17M trainable parameters and allows for convenient adaptation across various backbones. We implement SelfCP with two LLM backbones and evaluate it on both in-domain and out-of-domain tasks. Results show that the compressed virtual tokens can effectively substitute for original prompts that are $12 \times$ larger.
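
To make the length bookkeeping of the two-pass scheme concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the number of virtual tokens is $k = \lceil L / 12 \rceil$ for a segment of $L$ tokens, that the virtual tokens are simply placed before the uncompressed part, and that placeholder strings can stand in for the virtual tokens the frozen LLM would produce. The names `num_virtual_tokens` and `decoder_input` are hypothetical.

```python
import math

RATIO = 12  # compression ratio reported in the abstract


def num_virtual_tokens(segment_len: int, ratio: int = RATIO) -> int:
    """Number of virtual tokens k for a segment (assumption: k = ceil(L / ratio))."""
    if segment_len < 0 or ratio <= 0:
        raise ValueError("segment_len must be >= 0 and ratio must be > 0")
    return math.ceil(segment_len / ratio)


def decoder_input(long_segment: list[str], uncompressed: list[str],
                  ratio: int = RATIO) -> list[str]:
    """Sequence fed to the LLM in the second (decoder) pass.

    The first (encoder) pass is not modeled here: placeholder strings stand in
    for the k virtual tokens. Placing them before the uncompressed part is an
    assumption of this sketch.
    """
    k = num_virtual_tokens(len(long_segment), ratio)
    virtual = [f"<v{i}>" for i in range(k)]  # hypothetical placeholders
    return virtual + list(uncompressed)


if __name__ == "__main__":
    long_segment = ["tok"] * 600          # lengthy segment to compress
    question = ["Summarize", "the", "text", "."]  # uncompressed part
    seq = decoder_input(long_segment, question)
    print(len(long_segment) + len(question), "->", len(seq))  # 604 -> 54
```

Under these assumptions, a 600-token segment plus a 4-token instruction shrinks from 604 to 54 positions in the decoder pass.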

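This paper proposes SelfCP, which uses Large Language Models (LLMs) themselves to compress long prompts into compact virtual tokens. It supports both unconditional and conditional prompt compression, fitting standard tasks as well as tasks with specific objectives. Results show that the compressed virtual tokens can effectively substitute for the original prompts.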

SelfCP: Compressing Long Prompts to 1/12 of Their Length with a Frozen Large Language Model