X-ViT: High Performance Linear Vision Transformer without Softmax

May, 2022

X-ViT: High Performance Linear Vision Transformer without Softmax

Jeonggeun Song, Heung-Chang Lee

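TL;DR: This paper proposes X-ViT, a vision transformer that replaces conventional quadratic-complexity self-attention with a linear-complexity self-attention mechanism and performs well on image classification and dense prediction tasks.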
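The complexity claim can be illustrated with a minimal sketch. It is a generic illustration, not the X-ViT attention itself: this excerpt does not describe what the paper uses in place of softmax, so the sketch simply omits any normalization. With no softmax between the two products, matrix multiplication is associative, so $(QK^\top)V$ can be computed as $Q(K^\top V)$. That lowers the cost from $O(N^2 d)$ to $O(N d^2)$ for $N$ tokens of dimension $d$. All function names and sizes below are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def quadratic_attention(q, k, v):
    # (Q K^T) V: forms an N x N matrix first, so the cost grows as O(N^2 * d).
    return matmul(matmul(q, transpose(k)), v)

def linear_attention(q, k, v):
    # Q (K^T V): forms a d x d matrix first, so the cost grows as O(N * d^2).
    # Assumption: no softmax or other normalization sits between the products.
    return matmul(q, matmul(transpose(k), v))

def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
N, d = 16, 4  # illustrative sizes: N tokens, d channels per token
q, k, v = (random_matrix(N, d, rng) for _ in range(3))

out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)

# Both orderings give the same N x d output up to floating-point rounding.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(out_quadratic, out_linear)
    for a, b in zip(row_a, row_b)
)
```

The two functions differ only in the order of multiplication, and only the second avoids materializing the $N \times N$ attention matrix.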

Abstract

Vision transformers have become one of the most important models for computer vision tasks. Although they outperform prior works, they require heavy computational resources that scale quadratically with the number of tokens, $N$. This is a major drawback of the traditional self-atten