Gradient-Based Per-Weight Mixed-Precision Quantization for Neural Networks On-Chip

May, 2024

Gradient-based Automatic Per-Weight Mixed Precision Quantization for Neural Networks On-Chip

Chang Sun, Thea K. Årrestad, Vladimir Loncar, Jennifer Ngadiuba, Maria Spiropulu

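TL;DR: A fine-grained (per-weight) quantization-aware training method reduces model size and inference latency, improving the resource efficiency of low-latency, low-power neural networks deployed on FPGAs while preserving accuracy.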

Abstract

Model size and inference speed at deployment time are major challenges in many deep learning applications. A promising strategy to overcome these challenges is →