Self-supervised vision-language pretraining for medical visual question answering

Nov, 2022

Self-supervised vision-language pretraining for Medical visual question answering

Pengfei Li, Gang Liu, Lin Tan, Jinying Liao, Shenjun Zhong

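TL;DR: This paper introduces M2I2, a self-supervised method that combines masked image modeling, masked language modeling, image-text matching, and image-text alignment via contrastive learning. The model is pretrained on a medical image caption dataset and then fine-tuned on downstream medical VQA tasks. The method achieves state-of-the-art performance on three public medical VQA datasets.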
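
To make the image-text alignment objective more concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of paired image and text embeddings. It is not taken from the paper: the function names, the plain-list embedding format, and the temperature value of 0.07 are all assumptions chosen for illustration, and the authors' actual formulation may differ.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_z for x in row]


def contrastive_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is assumed to be a matched pair.

    The temperature default is a hypothetical value, not one reported by the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # sim[i][j]: cosine similarity between image i and text j, divided by the temperature
    sim = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: each image should score its own caption highest.
    i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n

    # Text-to-image direction: transpose the similarity matrix and repeat.
    sim_t = [list(col) for col in zip(*sim)]
    t2i = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy batch of two pairs: correctly paired versus deliberately swapped captions.
aligned = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss for the correctly paired embeddings is lower than for the swapped ones, which is the behavior a contrastive alignment objective is meant to reward during pretraining.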

Abstract

Medical image visual question answering (VQA) is the task of answering clinical questions about a given radiographic image. It is a challenging problem that requires a model to integrate both vision and language informa