In recent years, increasingly complex architectures for deep convolutional networks (DCNs) have been proposed to boost performance on image recognition tasks. However, these gains have come at the cost of a substantial increase in the compute resources, model size, and processing time required for training and evaluation. Fixed point implementation of these networks has the potential to alleviate some of this added burden. In this paper, we propose a quantizer design for fixed point implementation of DCNs. We then formulate an optimization problem to identify the optimal fixed point bit-width allocation across DCN layers. We perform experiments on a recently proposed DCN architecture for the CIFAR-10 benchmark that achieves a test error of less than 7%, and we evaluate the effectiveness of our proposed bit-width allocation on this DCN. Our experiments show that, in comparison to equal bit-width settings, fixed point DCNs with optimized bit-width allocation offer a reduction of more than 20% in model size without any loss in performance. We also demonstrate that fine-tuning can further enhance the accuracy of fixed point DCNs beyond that of the original floating point model. In doing so, we report a new state-of-the-art fixed point performance of 6.78% error rate on the CIFAR-10 benchmark.
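
As a rough illustration of what a fixed point quantizer does and why per-layer bit-widths affect model size, the following sketch rounds values to a signed fixed point grid with saturation, then compares weight storage under two bit-width allocations. It is not the quantizer design or the optimization procedure proposed in this paper. The function names, the `frac_bits` parameter, and the layer sizes are hypothetical and chosen only for illustration.

```python
def quantize_fixed_point(values, bit_width, frac_bits):
    """Round values to a signed fixed point grid and saturate on overflow.

    Assumption: a plain uniform quantizer with `frac_bits` fractional bits.
    """
    step = 2.0 ** -frac_bits                 # smallest representable increment
    q_min = -(2 ** (bit_width - 1))          # most negative integer code
    q_max = 2 ** (bit_width - 1) - 1         # most positive integer code
    out = []
    for v in values:
        code = round(v / step)               # nearest code (Python rounds ties to even)
        code = max(q_min, min(q_max, code))  # saturate to the representable range
        out.append(code * step)
    return out


def model_size_bits(param_counts, bit_widths):
    """Total weight storage in bits for a per-layer bit-width allocation."""
    return sum(n * b for n, b in zip(param_counts, bit_widths))


# 4-bit values with 1 fractional bit: step 0.5, range [-4.0, 3.5]
quantized = quantize_fixed_point([0.3, -1.7, 5.0], bit_width=4, frac_bits=1)
# quantized == [0.5, -1.5, 3.5]; the last value saturates

# Hypothetical two-layer network with 1000 and 2000 weights
equal_bits = model_size_bits([1000, 2000], [8, 8])   # 24000 bits
mixed_bits = model_size_bits([1000, 2000], [8, 6])   # 20000 bits
```

The layer sizes and bit-widths above are made up. Whether a given layer tolerates fewer bits without loss in accuracy is the question the optimization problem in this paper addresses.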

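This study proposes a quantizer design for fixed point implementation of deep convolutional networks. By optimizing the bit-width allocation, it reduces model size by more than 20% on the CIFAR-10 benchmark while preserving the accuracy of the original floating point model. Fine-tuning further improves the model's accuracy, yielding a new state-of-the-art fixed point performance of 6.78% error rate.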

Fixed Point Quantization of Deep Convolutional Networks