BriefGPT.xyz
Jan, 2025
Efficient Architectures for High Resolution Vision-Language Models
Miguel Carvalho, Bruno Martins
TL;DR
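This study addresses the limited accuracy of existing vision-language models in recognizing fine details in high-resolution images. It introduces Pheye, a novel architecture that processes high-resolution images efficiently while training fewer parameters. The study finds that Pheye performs strongly on fine-grained image understanding and scene-text tasks, showing notable potential for gains in efficiency and performance.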
Abstract
Vision-Language Models (VLMs) have recently experienced significant advancements. However, challenges persist in the accurate recognition of fine details within high-resolution images, which limits performance in