FPGAs have become a popular technology for implementing Convolutional Neural Networks (CNNs) in recent years. Most CNN applications on FPGAs are domain-specific, e.g., detecting objects from specific categories, for which commonly used CNN models pre-trained on general datasets may not be efficient enough. This paper presents TuRF, an end-to-end CNN acceleration framework that efficiently deploys domain-specific applications on FPGAs by applying transfer learning to adapt pre-trained models to specific domains, replacing standard convolution layers with efficient convolution blocks, and applying layer fusion to enhance hardware design performance. We evaluate TuRF by deploying a pre-trained VGG-16 model for a domain-specific image recognition task onto a Stratix V FPGA. Results show that designs generated by TuRF achieve better performance than prior methods for the original VGG-16 and ResNet-50 models, while for the optimised VGG-16 model TuRF designs are more accurate and easier to process.
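To make the layer-replacement step concrete, the following is a minimal sketch of why swapping a standard convolution for a cheaper block reduces hardware cost. It assumes the "efficient convolution block" is a depthwise separable convolution, which the abstract does not state; the function names and the example layer shape (256 input and output channels, 3x3 kernel, 56x56 output) are illustrative choices, not values taken from TuRF.

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Weights and multiply-accumulates (MACs) of a standard k x k convolution.

    Biases are ignored; every output pixel uses every weight once.
    """
    weights = c_in * c_out * k * k
    macs = weights * h_out * w_out
    return weights, macs


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Cost of a depthwise k x k convolution followed by a 1 x 1 pointwise one.

    Assumed stand-in for an "efficient convolution block"; biases are ignored.
    """
    depthwise_weights = c_in * k * k      # one k x k filter per input channel
    pointwise_weights = c_in * c_out      # 1 x 1 convolution mixing channels
    weights = depthwise_weights + pointwise_weights
    macs = weights * h_out * w_out
    return weights, macs


if __name__ == "__main__":
    # Hypothetical VGG-style layer shape, chosen only for illustration.
    shape = dict(c_in=256, c_out=256, k=3, h_out=56, w_out=56)
    std_w, std_m = standard_conv_cost(**shape)
    sep_w, sep_m = depthwise_separable_cost(**shape)
    print(f"standard:  {std_w} weights, {std_m} MACs")
    print(f"separable: {sep_w} weights, {sep_m} MACs")
    print(f"reduction: {std_w / sep_w:.2f}x")
```

Under these assumptions the weight count and the MAC count shrink by the same factor, because both blocks produce an output of the same spatial size. Any accuracy lost by the replacement would then have to be recovered by fine-tuning, which is the role the abstract assigns to transfer learning.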

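This paper proposes the TuRF framework, which deploys domain-specific applications efficiently on FPGAs by using transfer learning to adapt pre-trained models to specific domains, replacing standard convolution layers, and applying layer fusion to improve hardware design performance. The evaluation shows that, compared with the original model and prior methods, TuRF achieves better performance for the VGG-16 model while being more accurate and easier to process.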

Efficient Convolutional Neural Networks for Domain-Specific Applications on FPGA