BriefGPT.xyz
Feb, 2024
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi...
TL;DR
We show that a depth pruning approach can compete with recent width pruning methods in zero-shot task performance, with especially notable inference speedups in memory-constrained settings. We hope this work helps deploy large language models on local and edge devices.
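To make the distinction concrete, here is a minimal toy sketch of depth pruning: the model is treated as a stack of blocks, each block gets an importance score, and the least important blocks are removed whole (unlike width pruning, which shrinks each block internally). The scoring and function names are illustrative assumptions, not the paper's actual code.

```python
# Toy illustration of depth pruning: drop entire low-importance blocks.
# The importance scores here are hypothetical; in practice they would come
# from a calibration pass over sample data.

def depth_prune(blocks, importance, keep):
    """Keep the `keep` most important blocks, preserving their original order."""
    ranked = sorted(range(len(blocks)), key=lambda i: importance[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore original layer order
    return [blocks[i] for i in kept]

blocks = ["layer0", "layer1", "layer2", "layer3"]
importance = [0.9, 0.1, 0.7, 0.3]  # hypothetical per-block scores
print(depth_prune(blocks, importance, keep=2))  # -> ['layer0', 'layer2']
```

Because whole blocks are removed, the pruned model runs fewer sequential layers, which is what yields the inference speedups on memory-bound hardware.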
Abstract
Structured pruning of modern large language models (LLMs) has emerged as a way of decreasing their high computational needs. Width pruning