Chart question answering (CQA) is a crucial area of visual language understanding. However, the robustness and consistency of current Visual Language Models (VLMs) on this task remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, that span diverse question categories and chart formats. We investigate two key aspects: 1) the models' ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations across question and chart types, highlighting both the strengths and the weaknesses of current models. We also identify areas for improvement and propose future research directions toward more robust and reliable CQA systems. Overall, this study exposes the limitations of current models and lays the groundwork for future advances in the field.

Unveiling the Truth: Do LLMs Really Understand Charts? A Deep Dive into Consistency and Robustness