Aug, 2021
Do Vision Transformers See Like Convolutional Neural Networks?
Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy
TL;DR
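This study compares the internal representation structure of convolutional neural networks and Vision Transformer models on image classification tasks and finds significant differences between the two architectures, with self-attention playing a key role in speeding up the aggregation of global information. In addition, the scale of the pretraining dataset affects intermediate features and transfer learning.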
Abstract
Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classif…