BriefGPT.xyz
Apr, 2024
MileBench: Benchmarking Multimodal Large Language Models in Long Context
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan...
TL;DR
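This study introduces the MileBench benchmark to systematically evaluate how well multimodal large language models (MLLMs) handle long-context and multi-image tasks. It finds that open-source MLLMs struggle in long-context settings, especially in scenarios involving multiple images.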
Abstract
Despite the advancements and impressive performance of multimodal large language models (MLLMs) on benchmarks, their effectiveness in real-world,