BriefGPT.xyz
Jan, 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Bo Li...
TL;DR
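This report addresses a gap in the evaluation of large multimodal models (LMMs) by studying the self-consistency of their outputs under common corruptions and by examining cross-modal interactions among text, image, and speech. The authors build a comprehensive benchmark, MMCBench, that evaluates more than 100 popular LMMs (150 model checkpoints in total). An evaluation of this breadth matters for practical deployment and helps clarify how reliable state-of-the-art LMMs are.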
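To make the idea of self-consistency under corruption concrete, here is a minimal Python sketch. It is not the benchmark's actual metric, corruption set, or code: `corrupt_text`, `self_consistency`, and `toy_model` are hypothetical names introduced for illustration, the corruption is a simple adjacent-character swap, and the similarity is a character-level ratio from the standard library rather than anything the report specifies.

```python
import random
from difflib import SequenceMatcher


def corrupt_text(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent characters at random positions (an assumed typo-style corruption)."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


def self_consistency(model, clean_input: str, corrupted_input: str) -> float:
    """Similarity in [0, 1] between the model's outputs for clean and corrupted input."""
    clean_output = model(clean_input)
    corrupted_output = model(corrupted_input)
    return SequenceMatcher(None, clean_output, corrupted_output).ratio()


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LMM: returns the prompt's words in upper case."""
    return " ".join(word.upper() for word in prompt.split())


if __name__ == "__main__":
    prompt = "a photo of a cat sitting on a red sofa"
    noisy = corrupt_text(prompt, rate=0.2, seed=1)
    print(noisy)
    print(round(self_consistency(toy_model, prompt, noisy), 3))
```

A score near 1 means the output barely changed when the input was corrupted; a lower score means the model was more sensitive to the corruption. In a real evaluation, `toy_model` would be replaced by a call to the model under test and the string ratio by a similarity suited to the output modality.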
Abstract
This technical report aims to fill a deficiency in the assessment of large multimodal models (LMMs) by specifically examining the self-consistency of their outputs when subjected to
→