An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

October 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai...

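TL;DR: This paper studies using a Transformer in place of a CNN for image classification. With fewer computational resources, it achieves better recognition results than current convolutional networks, marking a breakthrough in computer vision.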
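The "16x16 words" in the title refers to cutting an image into fixed-size patches and treating each flattened patch as one token of a sequence, much like a word in a sentence. The sketch below illustrates only that splitting step. The function name `image_to_patches`, the 224x224 RGB input size, and the use of NumPy are assumptions made for illustration, not the authors' code.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    # Reshape to (rows, patch, cols, patch, C), then bring the two grid axes
    # to the front so each (patch, patch, C) block is contiguous.
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch: each row is a "word" of length patch * patch * C.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Assumed example input: a 224x224 RGB image (size chosen for illustration).
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768): 14 * 14 patches, each 16 * 16 * 3 values
```

Under these assumptions the image becomes a sequence of 196 tokens, which is the kind of input a standard Transformer encoder expects once each token is linearly projected and given a position embedding.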

Abstract

While the transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision,