BriefGPT.xyz
Nov, 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen...
TL;DR
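This work addresses the weak perception capabilities of existing multimodal large language models by proposing a new model design and data development approach. It introduces ChatRex and builds a fully automated data engine and the Rexverse-2M dataset, enabling joint training of perception and understanding. This significantly improves perception while preserving multimodal understanding performance, benefiting a range of applications.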
Abstract
Perception and understanding are two pillars of computer vision. While multimodal large language models (MLLMs) have demonstrated remarkable visual