May, 2024
SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs
Zhenyu Bai, Pranav Dangi, Huize Li, Tulika Mitra
TL;DR
This paper proposes SWAT, an FPGA-based accelerator design that achieves scalable performance by maximally exploiting sparsity, improving latency by 22x and energy efficiency by 5.7x over a baseline FPGA accelerator, and delivering 15x better energy efficiency than GPU-based solutions.
Abstract
Efficiently supporting long context length is crucial for transformer models. The quadratic complexity of the self-attention computation plagues traditional Transformers. Sliding window-based …
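
To make the complexity argument concrete, below is a minimal Python sketch of sliding-window attention: each query attends only to a fixed window of neighboring keys, so the score computation drops from O(n²) to O(n·w). This is an illustration of the general technique, not the paper's SWAT design; the function name and the window parameter `w` are hypothetical.

```python
# Minimal sketch (not the paper's implementation): sliding-window
# attention restricts each query to a fixed window of neighboring
# keys, so score computation is O(n * w) instead of O(n^2).
import numpy as np

def sliding_window_attention(Q, K, V, w):
    """Q, K, V: (n, d) arrays; w: one-sided window radius (assumed name)."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)  # at most 2w+1 scores per query
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ V[lo:hi]
    return out

# Example: n = 1024 tokens, window radius w = 64 -> at most 129 scores
# per token instead of 1024, cutting attention cost roughly 8x.
n, d, w = 1024, 64, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = sliding_window_attention(Q, K, V, w)
```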