Jun, 2024
Fusion of regional and sparse attention in Vision Transformers
Nabil Ibtehaz, Ning Yan, Masood Mortazavi, Daisuke Kihara
TL;DR
This work proposes a new hybrid vision transformer model (ACC-ViT) that combines regional and sparse attention to dynamically integrate local and global information while preserving a hierarchical structure, and it performs strongly on common vision tasks.
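The TL;DR describes fusing two attention branches, one local (regional) and one global (sparse). The sketch below is an illustrative assumption, not the ACC-ViT fusion mechanism: it shows one simple way two such branch outputs could be mixed with a learned gate, so the model can weight local versus global context per token and channel. The class name and gating design are hypothetical.

```python
# Minimal sketch (assumed, not the authors' code) of gating between a
# regional-attention branch and a sparse/global-attention branch.
import torch
import torch.nn as nn

class GatedAttentionFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Produces per-token, per-channel mixing weights in [0, 1].
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, local_out, global_out):
        # local_out, global_out: (B, N, C) outputs of the two attention branches.
        g = self.gate(local_out + global_out)
        return g * local_out + (1.0 - g) * global_out

if __name__ == "__main__":
    fuse = GatedAttentionFusion(64)
    a = torch.randn(2, 256, 64)   # regional-attention branch output
    b = torch.randn(2, 256, 64)   # sparse-attention branch output
    print(fuse(a, b).shape)       # torch.Size([2, 256, 64])
```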
Abstract
Modern vision transformers leverage visually inspired local interaction between pixels through attention computed within window or grid regions, in contrast to the global attention employed in the original ViT.
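To make the window/grid idea concrete, the following is a minimal sketch (the PyTorch framing, function names, and window size are assumptions, not the paper's implementation) contrasting global attention over all tokens with attention computed independently inside non-overlapping local windows.

```python
# Minimal sketch: global attention vs. window-partitioned ("regional") attention.
import torch
import torch.nn.functional as F

def global_attention(x):
    # x: (B, N, C); every token attends to every other token, as in the original ViT.
    q = k = v = x
    scores = q @ k.transpose(-2, -1) / (x.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def window_attention(x, h, w, window=4):
    # x: (B, H*W, C); tokens are grouped into non-overlapping window x window
    # regions and attention is computed independently within each region.
    b, n, c = x.shape
    x = x.view(b, h // window, window, w // window, window, c)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, c)
    x = global_attention(x)  # full attention, but only inside each window
    x = x.view(b, h // window, w // window, window, window, c)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, n, c)
    return x

if __name__ == "__main__":
    feats = torch.randn(2, 16 * 16, 64)           # (batch, tokens, channels)
    print(global_attention(feats).shape)          # torch.Size([2, 256, 64])
    print(window_attention(feats, 16, 16).shape)  # torch.Size([2, 256, 64])
```

Window attention reduces the quadratic cost of global attention to the window size, at the price of losing long-range interactions within a single layer, which is the trade-off the paper's regional/sparse fusion addresses.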