BriefGPT.xyz
Sep, 2018
High Performance Zero-Memory Overhead Direct Convolutions
Jiyuan Zhang, Franz Franchetti, Tze Meng Low
TL;DR
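This paper studies and demonstrates that direct convolution, when implemented correctly, eliminates all memory overhead. On both conventional and embedded CPU architectures it performs 10% to 400% better than existing high-performance convolution implementations, and it scales better: performance drops less as the number of threads increases.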
Abstract
The computation of convolution layers in deep neural networks typically relies on high performance routines that trade space for time by usi…