Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations

Apr, 2024

Georgy Tyukin

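TL;DR: Skipping the later attention sublayers in Transformer LLMs is an effective way to compress large language models, improving performance while reducing computational cost. On Llama 2 7B, this gave a 21% speedup in generation and, unexpectedly, improved performance on several common benchmarks.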
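To make the idea concrete, here is a minimal sketch of skipping the attention sublayer in the final blocks of a residual decoder stack. It assumes each block adds an attention output and then an MLP output to the residual stream; normalization and real multi-head attention are omitted. The names `forward`, `skip_last`, `toy_attn`, and `toy_mlp` are hypothetical and for illustration only, not the author's code. The demo uses 32 blocks to match Llama 2 7B's layer count and skips attention in the last 8 purely as an example; the number of skipped sublayers here is an assumption, not a figure from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Sublayer = Callable[[List[Vector]], List[Vector]]


def add(xs: List[Vector], ys: List[Vector]) -> List[Vector]:
    """Element-wise residual addition over a sequence of token vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def forward(
    hidden: List[Vector],
    attn_layers: Sequence[Sublayer],
    mlp_layers: Sequence[Sublayer],
    skip_last: int,
) -> List[Vector]:
    """Run residual decoder blocks, skipping attention in the last `skip_last` blocks.

    The MLP sublayer of every block still runs; a skipped attention sublayer
    simply contributes nothing to the residual stream.
    """
    n_layers = len(attn_layers)
    if not 0 <= skip_last <= n_layers:
        raise ValueError("skip_last must be between 0 and the number of layers")
    first_skipped = n_layers - skip_last
    for i, (attn, mlp) in enumerate(zip(attn_layers, mlp_layers)):
        if i < first_skipped:
            hidden = add(hidden, attn(hidden))  # attention + residual (early blocks only)
        hidden = add(hidden, mlp(hidden))       # MLP + residual (every block)
    return hidden


calls = {"attn": 0}


def toy_attn(h: List[Vector]) -> List[Vector]:
    """Toy causal 'attention': each token gets the mean of itself and earlier tokens."""
    calls["attn"] += 1
    out = []
    for t in range(len(h)):
        prefix = h[: t + 1]
        out.append([sum(col) / len(prefix) for col in zip(*prefix)])
    return out


def toy_mlp(h: List[Vector]) -> List[Vector]:
    """Toy 'MLP': scale every feature by 0.5."""
    return [[0.5 * v for v in x] for x in h]


n = 32  # Llama 2 7B has 32 decoder layers
forward([[1.0, 2.0], [3.0, 4.0]], [toy_attn] * n, [toy_mlp] * n, skip_last=8)
print(calls["attn"])  # 24 attention sublayers ran instead of 32
```

Because the skipped blocks still run their MLP, the saving comes only from the attention computation (and, during generation, from not needing keys and values for those blocks). How much wall-clock time that buys depends on the model and hardware.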

Abstract

Large language models are growing in size, and we expect them to continue to do so, as larger models train quicker. However, this increase in size will severely impact inference costs. Therefore, model compression