BriefGPT.xyz
Jun, 2024
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
Gregor Geigle, Radu Timofte, Goran Glavaš
TL;DR
In open-ended caption generation with LVLMs, fine-grained object grounding objectives have little to no effect on object hallucination.
Abstract
Large vision-language models (LVLMs) have recently dramatically pushed the state of the art in image captioning and many image understanding tasks (e.g., visual question answering). LVLMs, however, often \textit{