Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which increases memory usage and inference costs. To mitigate these challenges, multiple efficient methods have been proposed, with prompt compression attracting significant research interest. This survey provides an overview of prompt compression techniques, categorized into hard prompt methods and soft prompt methods. First, the technical approaches of these methods are compared, followed by an exploration of various ways to understand their mechanisms, including the perspectives of attention optimization, Parameter-Efficient Fine-Tuning (PEFT), modality integration, and new synthetic language. We also examine the downstream adaptations of various prompt compression techniques. Finally, the limitations of current prompt compression methods are analyzed, and several future directions are outlined, such as optimizing the compression encoder, combining hard and soft prompt methods, and leveraging insights from multimodality.

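This work addresses the rise in memory usage and inference costs that results from large language models needing long-form prompts for complex natural language tasks. By comparing hard prompt methods and soft prompt methods, it presents a range of effective prompt compression techniques, analyzes their mechanisms and adaptability, and offers important insights into future research directions for the field.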

Prompt Compression for Large Language Models: A Survey