BriefGPT.xyz
Jul, 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer...
TL;DR
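NaViT uses sequence packing to process input images of arbitrary resolution and aspect ratio. It can be applied to tasks such as image classification, object detection, and semantic segmentation, and it performs well on robustness and fairness benchmarks.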
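To make the idea of sequence packing concrete, here is a minimal NumPy sketch. It is an illustration only, not the authors' implementation: the helper names `patchify` and `pack`, the greedy first-fit strategy, and the same-image attention mask are assumptions chosen for clarity, and the sketch assumes every image side is a multiple of the patch size.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into (num_patches, patch*patch*C) tokens.

    Assumes H and W are multiples of `patch` (hypothetical simplification).
    """
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    grid = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(gh * gw, patch * patch * c)


def pack(images, patch, seq_len):
    """Greedy first-fit packing of variable-length token sequences.

    Returns:
      tokens: (rows, seq_len, dim) array, zero-padded
      ids:    (rows, seq_len) source image index per token, -1 for padding
      mask:   (rows, seq_len, seq_len) True where attention is allowed,
              i.e. both tokens come from the same image
    """
    dim = patch * patch * images[0].shape[2]
    rows, used = [], []
    for idx, img in enumerate(images):
        tok = patchify(img, patch)
        n = len(tok)
        if n > seq_len:
            raise ValueError("image has more tokens than seq_len")
        # Place the image in the first row that still has room.
        for r in range(len(rows)):
            if used[r] + n <= seq_len:
                break
        else:
            rows.append([])
            used.append(0)
            r = len(rows) - 1
        rows[r].append((idx, tok))
        used[r] += n

    tokens = np.zeros((len(rows), seq_len, dim), dtype=images[0].dtype)
    ids = np.full((len(rows), seq_len), -1, dtype=np.int64)
    for r, row in enumerate(rows):
        pos = 0
        for idx, tok in row:
            tokens[r, pos:pos + len(tok)] = tok
            ids[r, pos:pos + len(tok)] = idx
            pos += len(tok)

    # Tokens may only attend to tokens of the same image; padding attends to nothing.
    mask = (ids[:, :, None] == ids[:, None, :]) & (ids[:, :, None] >= 0)
    return tokens, ids, mask
```

For example, three images of sizes 32x32, 16x48, and 48x32 with a patch size of 16 yield 4, 3, and 6 tokens. With `seq_len=8` the first two share one row and the third gets its own, so no image has to be resized to a common resolution.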
Abstract
The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the