BriefGPT.xyz
June 2024
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek...
TL;DR
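This study calls into question whether locality, a long-standing inductive bias in computer vision architectures, is actually necessary. It shows that pixels are effective as tokens: treating each individual pixel directly as a token yields high-performing results.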
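To make the contrast concrete, the sketch below tokenizes the same image two ways: as 16x16 patches, and as individual pixels. It is a minimal illustration only, not the authors' code; the function names `patch_tokens` and `pixel_tokens`, the use of NumPy, and the 32x32 RGB input are assumptions chosen for the example.

```python
import numpy as np


def patch_tokens(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tokens.

    Returns an array of shape (num_tokens, patch * patch * C), with tokens
    ordered row by row across the patch grid.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p * p * C)
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, patch * patch * c)


def pixel_tokens(image: np.ndarray) -> np.ndarray:
    """Treat every pixel as its own token: the patch size is 1."""
    return patch_tokens(image, 1)


# Hypothetical 32x32 RGB input, used only to compare sequence lengths.
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)

patches = patch_tokens(image, 16)  # 4 tokens, each of dimension 768
pixels = pixel_tokens(image)       # 1024 tokens, each of dimension 3

print(patches.shape, pixels.shape)
```

Each pixel token carries only its own channel values, so no neighborhood structure is built into the tokenization. The cost is sequence length: for this input it grows from 4 tokens to 1024.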
Abstract
This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias of locality in modern computer vision architectures. Concretely, we find th