BriefGPT.xyz
Nov, 2022
Self-supervised vision-language pretraining for Medical visual question answering
Pengfei Li, Gang Liu, Lin Tan, Jinying Liao, Shenjun Zhong
TL;DR
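This paper introduces M2I2, a self-supervised method that applies masked image modeling, masked language modeling, image-text matching, and image-text alignment via contrastive learning. The model is pretrained on a medical image caption dataset and then fine-tuned on downstream medical VQA tasks. The method achieves state-of-the-art performance on three public medical VQA datasets.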
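Of the four objectives named above, image-text alignment via contrastive learning is the easiest to illustrate in isolation. The following is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss over a batch of paired image and text embeddings. It is not taken from the paper: the function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for illustration, and the authors' actual formulation may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def image_text_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i of images and texts is the positive.

    The temperature default is an illustrative assumption, not a value
    reported in the source.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    sim = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(sim) + cross_entropy_diagonal(sim_t))


# Toy usage: matched pairs should give a lower loss than mismatched pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_matched = [[1.0, 0.0], [0.0, 1.0]]
texts_swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = image_text_contrastive_loss(images, texts_matched)
loss_swapped = image_text_contrastive_loss(images, texts_swapped)
```

In this toy case the matched batch yields a loss near zero, while swapping the captions yields a much larger one, which is the signal that pulls paired image and text representations together during pretraining.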
Abstract
Medical image visual question answering (VQA) is a task to answer clinical questions, given a radiographic image, which is a challenging problem that requires a model to integrate both vision and language informa…