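TL;DR: This paper presents the recent best-performing Vision Transformer methods and comprehensively reviews their strengths and weaknesses, computational cost, and training and testing datasets. It thoroughly compares the performance of various ViT algorithms and representative CNN methods on popular benchmark datasets, then discusses some limitations and proposes future research directions.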
Abstract
Vision transformers (ViTs) are becoming an increasingly popular and dominant technique for various vision tasks compared to convolutional neural networks (CNNs). As an in-demand technique in computer vision, ViTs have been