BriefGPT.xyz
Apr, 2024
Progress Toward GPT-4V: Closing the Gap to Commercial Multimodal Models with Open-Source Suites
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao...
TL;DR
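InternVL 1.5 is an open-source multimodal large language model. Three simple improvements (a strong vision encoder, dynamic high resolution, and a high-quality bilingual dataset) strengthen its multimodal understanding, and it achieves performance competitive with conventional and proprietary models on OCR and Chinese-related tasks.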
Abstract
In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in