To advance the automation of medical vision-language modeling, we propose a novel Chest X-ray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, the task is to answer a set of questions about the diseases present and, more importantly, about the differences between the two images. This mirrors radiologists' diagnostic practice of comparing the current image with a reference image before concluding a report. We collect a new dataset, MIMIC-Diff-VQA, containing 700,703 QA pairs built from 164,324 pairs of main and reference images. Unlike existing medical VQA datasets, its questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation treatment procedure followed by clinical professionals. We also propose a novel expert knowledge-aware graph representation learning model as a baseline for this task. The model leverages expert knowledge, such as anatomical structure priors and semantic and spatial knowledge, to construct a multi-relationship graph that represents the differences between the two images for the image difference VQA task. The dataset and code are available at https://github.com/Holipori/MIMIC-Diff-VQA. We believe this work will further advance medical vision-language models.
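The abstract names only the ingredients of the baseline (anatomical priors, semantic and spatial relations, a multi-relationship graph, and a difference between two images). The sketch below is one hypothetical way to wire those ingredients together and is not the authors' implementation. It assumes that K anatomical regions are detected in both images and share a fixed index order (the "anatomical prior" that makes region i in the main image comparable to region i in the reference image). It uses box overlap as the spatial relation, a caller-supplied 0/1 matrix as the semantic relation, and a fully connected implicit relation. It then runs one round of relation-specific mean aggregation with illustrative random weights and subtracts the reference graph features from the main ones. All function names, parameters, and shapes are assumptions made for illustration.

```python
import numpy as np


def box_iou(boxes: np.ndarray) -> np.ndarray:
    """Pairwise IoU for (K, 4) boxes in (x1, y1, x2, y2) format with positive area."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)


def row_normalize(adj: np.ndarray) -> np.ndarray:
    """Turn an adjacency matrix into mean-aggregation weights (rows sum to 1)."""
    return adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-8)


def build_relations(boxes, semantic_prior, iou_thresh=0.0):
    """Hypothetical multi-relationship graph over K anatomical regions."""
    k = boxes.shape[0]
    return {
        "implicit": np.ones((k, k)),  # every region can attend to every other
        "spatial": (box_iou(boxes) > iou_thresh).astype(float),  # overlapping boxes
        "semantic": np.maximum(semantic_prior, np.eye(k)),  # prior links plus self-loops
    }


def graph_layer(feats, relations, weights):
    """One round of message passing: ReLU(sum_r Norm(A_r) @ X @ W_r)."""
    out_dim = next(iter(weights.values())).shape[1]
    out = np.zeros((feats.shape[0], out_dim))
    for name, adj in relations.items():
        out += row_normalize(adj) @ feats @ weights[name]
    return np.maximum(out, 0.0)


def difference_representation(main_feats, ref_feats, main_boxes, ref_boxes,
                              semantic_prior, weights):
    """Region-wise difference between the main and reference graph features."""
    h_main = graph_layer(main_feats, build_relations(main_boxes, semantic_prior), weights)
    h_ref = graph_layer(ref_feats, build_relations(ref_boxes, semantic_prior), weights)
    return h_main - h_ref  # row i compares the same anatomical region in both images


# Toy usage with random features (illustrative only).
rng = np.random.default_rng(0)
K, D, H = 4, 8, 8
main = rng.normal(size=(K, D))
boxes = np.array([[0, 0, 2, 2], [1, 1, 3, 3], [5, 5, 6, 6], [0, 4, 2, 6]], dtype=float)
prior = np.zeros((K, K))
prior[0, 3] = prior[3, 0] = 1.0  # hypothetical semantic link between regions 0 and 3
weights = {r: rng.normal(size=(D, H)) * 0.1 for r in ("implicit", "spatial", "semantic")}
diff_same = difference_representation(main, main, boxes, boxes, prior, weights)
```

Feeding the same image as both main and reference yields an all-zero difference, which is a quick sanity check that the subtraction is aligned region by region. In a real model, the features would come from a region detector and the weights would be learned. Subtracting per-region features only makes sense because of the assumed one-to-one anatomical correspondence.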

Expert Knowledge-Aware Image Difference Representation Learning for Difference-Aware Medical Visual Question Answering