Jan, 2024
ChatterBox: Multi-round Multimodal Referring and Grounding
Yunjie Tian, Tianren Ma, Lingxi Xie, Jihao Qiu, Xi Tang...
TL;DR
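We establish a benchmark for a new task named multimodal multi-round referring and grounding, and propose a vision-language model named ChatterBox. By jointly handling vision and language tasks, ChatterBox shows better instance-level understanding than existing models in multimodal dialogue scenarios that involve complex and precise interactions.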
Abstract
In this study, we establish a baseline for a new task named multimodal multi-round referring and grounding (MRG), opening up a promising direction for instance-level multimodal dialogues. We present a new benchmark ...