BriefGPT.xyz
Dec, 2023
Fluctuation-based Adaptive Structured Pruning for Large Language Models
Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang
TL;DR
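This paper proposes FLAP (FLuctuation-based Adaptive Structured Pruning), a novel structured pruning framework for large language models that requires no retraining. It reduces storage and speeds up inference, and substantially outperforms existing structured pruning methods. FLAP formulates a structured importance metric, adaptively searches for the global compressed model structure, and applies a compensation mechanism to mitigate performance loss.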
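The summary names an importance metric and a compensation mechanism but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a channel's score is the sample variance of its input activations over a calibration set multiplied by the squared norm of the matching weight column, and that removed channels are compensated by folding their mean contribution into the bias. The function names `channel_scores` and `prune_with_compensation` are hypothetical, and the adaptive global search across layers is not shown.

```python
from statistics import fmean, variance


def channel_scores(activations, weight):
    """Score each input channel of one linear layer (assumed metric).

    activations: list of calibration samples, each a list of C floats.
    weight: list of output rows, each a list of C floats.
    Score of channel j = sample variance of channel j over the
    calibration samples * squared norm of weight column j.
    """
    n_channels = len(weight[0])
    scores = []
    for j in range(n_channels):
        column = [sample[j] for sample in activations]
        col_norm_sq = sum(row[j] ** 2 for row in weight)
        scores.append(variance(column) * col_norm_sq)
    return scores


def prune_with_compensation(activations, weight, bias, keep):
    """Keep the `keep` highest-scoring channels and adjust the bias.

    Each dropped channel is replaced by its calibration mean, so its
    average contribution (weight * mean) is added to the bias.
    """
    scores = channel_scores(activations, weight)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:keep])
    dropped = [j for j in range(len(scores)) if j not in kept]
    means = [fmean([s[j] for s in activations]) for j in range(len(scores))]
    new_weight = [[row[j] for j in kept] for row in weight]
    new_bias = [
        b + sum(row[j] * means[j] for j in dropped)
        for row, b in zip(weight, bias)
    ]
    return kept, new_weight, new_bias


# Toy calibration data: channel 0 never changes, so it carries no fluctuation.
acts = [[1.0, 5.0, 0.0], [1.0, 7.0, 2.0], [1.0, 9.0, 4.0]]
W = [[2.0, 1.0, 1.0], [0.0, 1.0, 3.0]]
b = [0.0, 0.0]
kept, W_pruned, b_pruned = prune_with_compensation(acts, W, b, keep=2)
```

In this toy case the constant channel gets a score of zero and is removed, and because its value never varies, the bias adjustment reproduces the original layer output exactly on the calibration samples. With channels that do vary, the compensation would only recover the average contribution.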
Abstract
Network pruning is a promising way to address the huge computing resource demands of the deployment and inference of large language models (LLMs).