November 2024
LLM Vocabulary Compression for Low-Compute Environments
Sreeram Vennam, Anish Joishy, Ponnurangam Kumaraguru
TL;DR
This work addresses the memory footprint of language models in low-compute environments. By grouping tokens based on Byte Pair Encoding (BPE) merges, the final linear layer of the model is compressed, reducing memory usage by up to 3.4x. Evaluations on the TinyStories dataset show performance on par with GPT-Neo and GPT-2, while the improved throughput makes the method suitable for low-compute settings.
Abstract
We present a method to compress the final linear layer of language models, reducing memory usage by up to 3.4x without significant performance loss. By grouping tokens based on Byte Pair Encoding (BPE) merges, we shrink the vocabulary-sized output projection; on the TinyStories dataset the compressed models perform on par with GPT-Neo and GPT-2 while improving throughput, making the approach suitable for low-compute environments.
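The abstract does not spell out the mechanism, but one way to picture a grouped output head is the factorized sketch below: each token id is assigned to a group, and the head predicts a group plus a slot within that group, so it stores and materializes far fewer logits than a full vocabulary projection. This is a minimal illustration only; the class name `GroupedLMHead`, the consecutive-id grouping (a stand-in for the paper's BPE-merge-based grouping), and the two-term loss are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedLMHead(nn.Module):
    """Hypothetical grouped output head.

    Factorizes next-token prediction as
        log p(token | h) = log p(group | h) + log p(slot-in-group | h),
    so per position the head materializes num_groups + slots logits
    rather than vocab_size logits.
    """

    def __init__(self, d_model: int, vocab_size: int, num_groups: int):
        super().__init__()
        slots = (vocab_size + num_groups - 1) // num_groups
        # Stand-in grouping: consecutive token ids share a group. The paper
        # instead derives groups from BPE merges; that mapping would be
        # substituted here.
        token_ids = torch.arange(vocab_size)
        self.register_buffer("group_of", token_ids // slots)  # (V,) group id
        self.register_buffer("slot_of", token_ids % slots)    # (V,) slot id
        self.group_head = nn.Linear(d_model, num_groups)      # p(group | h)
        self.slot_head = nn.Linear(d_model, slots)            # p(slot | h)

    def loss(self, h: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # h: (N, d_model) hidden states, targets: (N,) gold token ids
        return (
            F.cross_entropy(self.group_head(h), self.group_of[targets])
            + F.cross_entropy(self.slot_head(h), self.slot_of[targets])
        )

if __name__ == "__main__":
    # GPT-2-sized vocabulary; 224 groups of ~225 tokens each.
    head = GroupedLMHead(d_model=256, vocab_size=50_257, num_groups=224)
    h = torch.randn(8, 256)
    targets = torch.randint(0, 50_257, (8,))
    print(head.loss(h, targets))
```

Under these assumptions, the saving comes from two places: the head stores roughly (224 + 225) x 256 weights instead of 50,257 x 256, and each position materializes ~449 logits instead of ~50k, which is the kind of logits-tensor reduction the TL;DR attributes to compressing the final linear layer.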