BriefGPT.xyz
Sep, 2021
Scaled ReLU Matters for Training Vision Transformers
Pichao Wang, Xue Wang, Hao Luo, Jingkai Zhou, Zhipeng Zhou...
TL;DR
This paper studies the training of ViT models and finds that a scaled ReLU in the conv-stem not only improves training stability but also increases the diversity of patch tokens, yielding a significant performance gain with little extra parameter count or FLOPs, and showing that, when trained properly, ViT models are a better alternative to CNN models.
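To make the idea concrete, below is a minimal PyTorch sketch of a conv-stem in which each convolution is followed by a "scaled ReLU", i.e. a scaling layer (here BatchNorm) placed before the ReLU non-linearity, producing patch tokens for a ViT. The class name ScaledReLUStem, the layer counts, and the channel widths are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch (assumption, not the paper's exact architecture): a ViT conv-stem where
# each convolution is followed by a scaled ReLU (BatchNorm scaling, then ReLU).
import torch
import torch.nn as nn

class ScaledReLUStem(nn.Module):
    """Hypothetical conv-stem producing 16x-downsampled patch tokens."""
    def __init__(self, in_chans=3, embed_dim=384):
        super().__init__()
        chans = [in_chans, 64, 128, 256, embed_dim]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(c_out),   # the "scaling" part of the scaled ReLU
                nn.ReLU(inplace=True),
            ]
        layers.append(nn.Conv2d(embed_dim, embed_dim, kernel_size=1))  # project to token dim
        self.stem = nn.Sequential(*layers)

    def forward(self, x):
        x = self.stem(x)                      # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)   # (B, num_tokens, embed_dim) patch tokens

tokens = ScaledReLUStem()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 384])
```

The stem replaces the standard 16x16 patchify projection; the four stride-2 convolutions give the same overall 16x downsampling, while the BatchNorm+ReLU pairs provide the scaled non-linearity the TL;DR credits with stabler training and more diverse patch tokens.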
Abstract
Vision transformers (ViTs) have been an alternative design paradigm to convolutional neural networks (CNNs). However, the training of ViTs is much harder than CNNs, as it is sensitive to the …