BriefGPT.xyz
Oct, 2014
cuDNN: Efficient Primitives for Deep Learning
Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran...
TL;DR
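The authors created a BLAS-like library of optimized routines for deep learning workloads. It includes routines for GPUs, is easy to integrate into existing frameworks, and offers optimized performance and memory usage, with a reported 36% improvement.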
Abstract
We present a library that provides optimized implementations for deep learning primitives. Deep learning workloads are computationally int…