BriefGPT.xyz
Oct, 2024
VHELM: A Holistic Evaluation of Vision Language Models
Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou...
TL;DR
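This study addresses the neglect of critical aspects such as fairness, multilinguality, and toxicity in current evaluations of vision-language models (VLMs). By extending the HELM framework, it proposes VHELM, which aggregates multiple datasets to provide a comprehensive evaluation of VLMs across nine aspects, including visual perception, knowledge, and reasoning. The study finds that efficiency-focused models perform poorly on the bias benchmark, underscoring the importance of standardized model evaluation.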
Abstract
Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Fu