Mar, 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang...
TL;DR: Griffon v2, a high-resolution generalist model, overcomes image resolution limitations in large vision-language models to achieve nuanced visual and language referring, and outperforms expert models in object detection and counting.