BriefGPT.xyz
Jun, 2023
Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization
Clemens JS Schaefer, Navid Lambert-Shirzad, Xiaofan Zhang, Chiachen Chou, Tom Jablin...
TL;DR
A mixed-precision post-training quantization (PTQ) method is proposed that uses second-order information and inter-layer dependencies to guide a bisection search, finding a quantization configuration within a user-configurable bound on model accuracy degradation. The method reduces memory footprint and improves latency while preserving model accuracy.
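The bisection search described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `sensitivities` stands in for per-layer Hessian-based scores (augmented with inter-layer dependency terms in the paper), and `evaluate_accuracy_drop` is a hypothetical callback that measures the accuracy loss of a candidate mixed-precision configuration.

```python
def assign_precisions(sensitivities, threshold, high_bits=8, low_bits=4):
    """Layers whose sensitivity exceeds the threshold keep high precision."""
    return [high_bits if s > threshold else low_bits for s in sensitivities]

def bisect_threshold(sensitivities, evaluate_accuracy_drop, budget, iters=20):
    """Bisection over the sensitivity threshold.

    A lower threshold keeps more layers at high precision (smaller accuracy
    drop, larger model); the search finds the most aggressive quantization
    whose accuracy drop stays within the user-configurable budget.
    """
    lo, hi = 0.0, max(sensitivities)
    config = assign_precisions(sensitivities, lo)  # all high-precision start
    for _ in range(iters):
        mid = (lo + hi) / 2
        candidate = assign_precisions(sensitivities, mid)
        if evaluate_accuracy_drop(candidate) <= budget:
            lo, config = mid, candidate  # within budget: quantize more
        else:
            hi = mid                     # too much loss: back off
    return config
```

For example, with sensitivities `[0.9, 0.5, 0.1, 0.05]` and a budget allowing roughly two low-precision layers, the search settles on a configuration that keeps the two most sensitive layers at 8 bits.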
Abstract
Efficiently serving neural network models with low latency is becoming more challenging due to increasing model complexity and parameter count. Model quantization offers a solution which simultaneously reduces me