Privacy concerns have made it increasingly important to deploy Deep Neural
Network (DNN) models on resource-constrained edge devices. To facilitate the
transition from cloud to edge computing, this paper introduces a technique
that reduces the memory footprint of DNNs so that they fit within the limits
of such devices while preserving model accuracy. Our proposed technique, named
Post-Training Intra-Layer Multi-Precision Quantization (PTILMPQ), employs a
post-training quantization approach, eliminating the need for extensive
training data. By estimating the importance of layers and channels within the
network, the proposed method enables precise bit allocation throughout the
quantization process. Experimental results demonstrate that PTILMPQ offers a
promising solution for deploying DNNs on edge devices with restricted memory
resources. For instance, on ResNet50 it achieves an accuracy of 74.57\% with
a memory footprint of 9.5 MB, a 25.49\% reduction in memory footprint
compared to previous similar methods, at the cost of only a 1.08\% decrease
in accuracy.
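
To make the idea of importance-driven bit allocation concrete, the following is a minimal sketch in plain Python. It is not the PTILMPQ algorithm itself. The importance score (mean absolute weight per output channel), the two bit widths (8 and 4), the fraction of channels kept at high precision, and all function names are assumptions chosen for illustration. The abstract does not specify any of them.

```python
from typing import List, Tuple

Layer = List[List[float]]  # one inner list of weights per output channel


def channel_importance(weights: Layer) -> List[float]:
    """Hypothetical importance score: mean absolute weight of each channel."""
    return [sum(abs(w) for w in ch) / len(ch) for ch in weights]


def allocate_bits(scores: List[float], high_bits: int = 8, low_bits: int = 4,
                  high_fraction: float = 0.5) -> List[int]:
    """Give the top-scoring fraction of channels high_bits, the rest low_bits."""
    n_high = max(1, int(len(scores) * high_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    bits = [low_bits] * len(scores)
    for i in ranked[:n_high]:
        bits[i] = high_bits
    return bits


def quantize_channel(channel: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantization, returned in dequantized (float) form."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in channel)
    if max_abs == 0.0:
        return list(channel)  # nothing to quantize in an all-zero channel
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in channel]


def quantize_layer(weights: Layer) -> Tuple[Layer, List[int], int]:
    """Quantize each channel at its allocated width; also return total weight bits."""
    bits = allocate_bits(channel_importance(weights))
    quantized = [quantize_channel(ch, b) for ch, b in zip(weights, bits)]
    total_bits = sum(len(ch) * b for ch, b in zip(weights, bits))
    return quantized, bits, total_bits


# Toy layer with four output channels of three weights each (made-up values).
layer = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, -0.6, 0.4],
    [0.05, 0.0, -0.02],
]
quantized, bits, total_bits = quantize_layer(layer)
fp32_bits = 32 * sum(len(ch) for ch in layer)
print(bits, total_bits, fp32_bits)
```

Because the procedure needs only the trained weights, it fits the post-training setting: no gradient updates or labelled data are involved. The sketch counts weight bits only and ignores the per-channel scale factors, which a real memory budget would also have to include.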

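In summary: to deploy deep neural network models on resource-constrained edge devices while protecting privacy, this paper introduces a technique that effectively reduces the memory footprint of deep neural networks. The technique, named Post-Training Intra-Layer Multi-Precision Quantization (PTILMPQ), estimates the importance of layers and channels within the network to achieve precise bit allocation during quantization. Experimental results show that PTILMPQ offers a promising solution for deploying deep neural networks on edge devices with limited memory. For example, on ResNet50 it reaches 74.57\% accuracy with a 9.5 MB memory footprint, a 25.49\% reduction compared to previous similar methods, with only a 1.08\% drop in accuracy.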