BriefGPT.xyz
Sep, 2023
Masked Image Residual Learning for Scaling Deeper Vision Transformers
Guoxi Huang, Hongtao Fu, Adrian G. Bors
TL;DR
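Deeper ViTs exhibit a degradation problem in their deeper layers when pre-trained with MIM. To ease the training of deeper ViTs, we introduce MIRL, a self-supervised learning framework that significantly alleviates this degradation and makes scaling ViTs along depth a promising direction for improving performance.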
Abstract
Deeper vision transformers (ViTs) are more challenging to train. We expose a degradation problem in deeper layers of ViT when using masked image modeling (MIM) for …