Model compression and hardware acceleration are essential for the resource-efficient deployment of deep neural networks. Modern object detectors have highly interconnected convolutional layers with concatenations. In this work, we study how pruning can be applied to such architectures, taking YOLOv7 as an example. We propose a method to handle concatenation layers, based on the connectivity graph of convolutional layers. By automating iterative sensitivity analysis, pruning, and subsequent model fine-tuning, we can significantly reduce model size, both in the number of parameters and in FLOPs, while maintaining comparable model accuracy. Finally, we deploy pruned models to FPGA and NVIDIA Jetson Xavier AGX. Pruned models demonstrate a 2x speedup for the convolutional layers compared with their unpruned counterparts and reach real-time capability with 14 FPS on FPGA. Our code is available at https://github.com/fzi-forschungszentrum-informatik/iterative-yolo-pruning.
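
To illustrate why concatenation layers need special handling during filter pruning, the following minimal sketch shows how the filters kept in each branch could be mapped to input-channel indices of the layer that consumes the concatenated tensor. It is not taken from the released code: the function names `l1_keep_indices` and `concat_keep_indices`, the L1-norm ranking criterion, and the toy weights are all assumptions made for illustration.

```python
def l1_keep_indices(filters, prune_ratio):
    """Return sorted indices of the filters to keep.

    `filters` is a list of flattened filter weights (one list per output
    channel). Ranking by L1 norm is an assumed criterion for this sketch.
    """
    norms = [sum(abs(w) for w in f) for f in filters]
    n_prune = int(len(filters) * prune_ratio)
    # Sort channel indices by ascending norm; the first n_prune are dropped.
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    return sorted(order[n_prune:])


def concat_keep_indices(branch_widths, keep_per_branch):
    """Map kept output channels of each branch to input-channel indices
    of the layer that consumes the channel-wise concatenation."""
    kept, offset = [], 0
    for width, keep in zip(branch_widths, keep_per_branch):
        # Channels of this branch start at `offset` in the concatenated tensor.
        kept.extend(offset + i for i in keep)
        offset += width  # offsets use the ORIGINAL (unpruned) branch widths
    return kept


# Two hypothetical branches feeding one concatenation (toy weights).
branch_a = [[0.1, -0.1], [1.0, 2.0], [0.0, 0.05], [3.0, -1.0]]
branch_b = [[0.5, 0.5], [0.01, 0.0], [2.0, 2.0]]

keep_a = l1_keep_indices(branch_a, prune_ratio=0.5)
keep_b = l1_keep_indices(branch_b, prune_ratio=0.5)
consumer_inputs = concat_keep_indices(
    [len(branch_a), len(branch_b)], [keep_a, keep_b]
)
print(keep_a, keep_b, consumer_inputs)
```

In this sketch the consumer layer keeps only the input channels listed in `consumer_inputs`, so its weight tensor stays consistent with the pruned branches. A real implementation would derive the branch-to-consumer relations from the connectivity graph of the network rather than hard-coding them.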

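Building on model compression and hardware acceleration, this study applies pruning to architectures with highly interconnected convolutional layers and concatenations, such as YOLOv7. Through iterative sensitivity analysis, pruning, and model fine-tuning, it significantly reduces model size while maintaining comparable accuracy. The pruned models are then deployed to FPGA and NVIDIA Jetson Xavier AGX, where they achieve a 2x speedup in the convolutional layers compared with the unpruned models and reach real-time capability at 14 FPS on FPGA.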

Iterative Filter Pruning for Concatenation-based CNN Architectures