Feb, 2024
Vision Transformers with Natural Language Semantics
Young Kyung Kim, J. Matías Di Martino, Guillermo Sapiro
TL;DR
By introducing a novel tokenizer strategy built on a segmentation model, the Semantic Vision Transformer (sViT) captures salient features and global dependencies while improving interpretability and robustness; compared with the conventional Vision Transformer (ViT), it performs better in terms of training-data requirements, out-of-distribution generalization, and interpretability.
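The segmentation-based tokenizer described above can be illustrated with a minimal sketch: instead of slicing the image into a fixed grid of rectangular patches, each segment produced by a segmentation model is pooled into one token. This is not the authors' implementation; the function name, mean-pooling, and zero-padding to a fixed token count are all assumptions for illustration.

```python
import numpy as np

def semantic_tokenize(image, seg_map, max_tokens):
    """Pool the pixels of each segment into a single token vector.

    image:      (H, W, C) float array
    seg_map:    (H, W) integer segment labels from any segmentation model
    max_tokens: fixed sequence length expected by the transformer

    Returns a (max_tokens, C) array; unused slots are zero-padded.
    (In practice each segment would be resized and linearly embedded,
    not just mean-pooled; this sketch only shows the token grouping.)
    """
    C = image.shape[-1]
    tokens = np.zeros((max_tokens, C), dtype=image.dtype)
    labels = np.unique(seg_map)[:max_tokens]  # one token per segment
    for i, lab in enumerate(labels):
        mask = seg_map == lab
        tokens[i] = image[mask].mean(axis=0)  # per-channel mean over the segment
    return tokens

# Usage: two segments (left/right halves) of an 8x8 RGB image -> 2 real tokens
img = np.random.rand(8, 8, 3)
seg = np.zeros((8, 8), dtype=int)
seg[:, 4:] = 1
toks = semantic_tokenize(img, seg, max_tokens=4)  # shape (4, 3)
```

The key contrast with a standard ViT tokenizer is that token boundaries follow object boundaries rather than an arbitrary grid, so each token carries coherent semantic content.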
Abstract
Tokens or patches within vision transformers (ViT) lack essential semantic information, unlike their counterparts in natural language processing (NLP). Typically, ViT tokens are associated with rectangular image …