BriefGPT.xyz
Jun, 2023
Im2win: An Efficient Convolution Paradigm on GPU
Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu
TL;DR
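This paper proposes an im2win-based convolution paradigm that aims to improve performance through continuous memory access and is further refined with a set of optimization techniques. Compared with other convolution implementations based on cuBLAS and cuDNN, it uses 23.1% to 32.8% less memory and runs 3.5× to 155× faster.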
Abstract
Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance …