October 2023

Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences

TL;DR: Transformer-based models have achieved state-of-the-art performance, but the quadratic complexity of self-attention limits their applicability to long sequences. Fast Multipole Attention addresses this by reducing the time and memory complexity of attention while maintaining a global receptive field through a hierarchical, divide-and-conquer approach.
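To make the divide-and-conquer idea concrete, below is a minimal two-level sketch: each token attends at full resolution to tokens in its own block, while tokens in all other blocks are attended to only through per-block pooled summaries, so every token still sees the whole sequence at reduced cost. This is an illustration under simplifying assumptions, not the paper's exact method; Fast Multipole Attention uses multiple levels with learned downsampling, whereas here a single coarse level with average pooling stands in for it, and the function name and block size are hypothetical.

```python
import torch

def two_level_attention(q, k, v, block_size):
    """Sketch of hierarchical (two-level) divide-and-conquer attention.

    Fine level: each token attends at full resolution to its own block.
    Coarse level: all other blocks are represented by average-pooled
    key/value summaries, one per block, so the receptive field stays global.

    q, k, v: (batch, seq_len, dim); seq_len must be divisible by block_size.
    """
    bsz, n, d = q.shape
    nb = n // block_size

    # Split the sequence into blocks: (batch, num_blocks, block_size, dim).
    qb = q.view(bsz, nb, block_size, d)
    kb = k.view(bsz, nb, block_size, d)
    vb = v.view(bsz, nb, block_size, d)

    # Coarse summaries: average-pool keys/values within each block.
    k_coarse = kb.mean(dim=2)   # (batch, nb, dim)
    v_coarse = vb.mean(dim=2)   # (batch, nb, dim)

    scale = d ** -0.5

    # Fine logits: queries vs. keys of their own block.
    fine = torch.einsum("bnqd,bnkd->bnqk", qb, kb) * scale          # (b, nb, bs, bs)

    # Coarse logits: queries vs. pooled summaries of every block.
    coarse = torch.einsum("bnqd,bmd->bnqm", qb, k_coarse) * scale   # (b, nb, bs, nb)

    # Mask each query's own block at the coarse level; it is already
    # covered at fine resolution.
    own = torch.eye(nb, dtype=torch.bool, device=q.device)
    coarse = coarse.masked_fill(own[:, None, :], float("-inf"))

    # Joint softmax over fine + coarse keys: a global receptive field with
    # only O(n * (block_size + n / block_size)) scores instead of O(n^2).
    logits = torch.cat([fine, coarse], dim=-1)   # (b, nb, bs, bs + nb)
    attn = logits.softmax(dim=-1)
    a_fine, a_coarse = attn.split([block_size, nb], dim=-1)

    out = torch.einsum("bnqk,bnkd->bnqd", a_fine, vb) \
        + torch.einsum("bnqm,bmd->bnqd", a_coarse, v_coarse)
    return out.reshape(bsz, n, d)

# Tiny usage example with self-attention on random features.
x = torch.randn(2, 64, 32)
y = two_level_attention(x, x, x, block_size=8)
print(y.shape)  # torch.Size([2, 64, 32])
```

With a block size around the square root of the sequence length, the number of score entries grows roughly as O(n·√n) rather than O(n²); stacking more levels, as in the multilevel scheme the title refers to, pushes the cost down further while keeping distant context visible in coarse form.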