TL;DR通过 Fine-Grained 人工智能反馈以及基于强化学习将多模态对齐,解决了 Large Vision-Language Models 中的幻觉问题,提高了模型的性能。
Abstract
large vision-language models (LVLMs) have demonstrated proficiency in
tackling a variety of visual-language tasks. However, current LVLMs suffer from
misalignment between text and image modalities which causes th