November 2024
LLM Vocabulary Compression for Low-Compute Environments
Sreeram Vennam, Anish Joishy, Ponnurangam Kumaraguru
TL;DR
This work addresses the memory footprint of language models in low-compute environments. By grouping tokens based on Byte Pair Encoding (BPE) merges, the final linear layer of the model is compressed, reducing memory usage by up to 3.4x. Evaluations on the TinyStories dataset show performance on par with GPT-Neo and GPT-2, while the improved throughput makes the method suitable for low-compute settings.
Abstract
We present a method to compress the final linear layer of language models, reducing memory usage by up to 3.4x without significant performance loss. By grouping tokens based on Byte Pair Encoding (BPE) merges, we shrink the vocabulary-sized output projection; on the TinyStories dataset the compressed models perform on par with GPT-Neo and GPT-2 while improving throughput, making the approach suitable for low-compute environments.
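The abstract does not spell out the mechanism, but one way to picture a grouped output head is the factorized sketch below: each token id is assigned to a group, and the head predicts a group plus a slot within that group, so it stores and materializes far fewer logits than a full vocabulary projection. This is a minimal illustration only; the class name `GroupedLMHead`, the consecutive-id grouping (a stand-in for the paper's BPE-merge-based grouping), and the two-term loss are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedLMHead(nn.Module):
    """Hypothetical grouped output head.

    Factorizes next-token prediction as
        log p(token | h) = log p(group | h) + log p(slot-in-group | h),
    so per position the head materializes num_groups + slots logits
    rather than vocab_size logits.
    """

    def __init__(self, d_model: int, vocab_size: int, num_groups: int):
        super().__init__()
        slots = (vocab_size + num_groups - 1) // num_groups
        # Stand-in grouping: consecutive token ids share a group. The paper
        # instead derives groups from BPE merges; that mapping would be
        # substituted here.
        token_ids = torch.arange(vocab_size)
        self.register_buffer("group_of", token_ids // slots)  # (V,) group id
        self.register_buffer("slot_of", token_ids % slots)    # (V,) slot id
        self.group_head = nn.Linear(d_model, num_groups)      # p(group | h)
        self.slot_head = nn.Linear(d_model, slots)            # p(slot | h)

    def loss(self, h: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # h: (N, d_model) hidden states, targets: (N,) gold token ids
        return (
            F.cross_entropy(self.group_head(h), self.group_of[targets])
            + F.cross_entropy(self.slot_head(h), self.slot_of[targets])
        )

if __name__ == "__main__":
    # GPT-2-sized vocabulary; 224 groups of ~225 tokens each.
    head = GroupedLMHead(d_model=256, vocab_size=50_257, num_groups=224)
    h = torch.randn(8, 256)
    targets = torch.randint(0, 50_257, (8,))
    print(head.loss(h, targets))
```

Under these assumptions, the saving comes from two places: the head stores roughly (224 + 225) x 256 weights instead of 50,257 x 256, and each position materializes ~449 logits instead of ~50k, which is the kind of logits-tensor reduction the TL;DR attributes to compressing the final linear layer.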