ACL, Feb 2023
Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi
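TL;DR: This paper proposes Teacher Intervention (TI), a proactive knowledge distillation method for fast-converging quantization-aware training (QAT) of ultra-low precision pre-trained Transformers. TI uses a gradual intervention mechanism to stabilize the recovery of the subsections of each Transformer layer, improving model accuracy.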