Medical vision-language pre-training (VLP) has emerged as a research frontier, enabling zero-shot pathology recognition by comparing a query image with textual descriptions of each disease. Because biomedical text has complex semantics, current methods struggle to align medical images with the key pathological findings in unstructured reports, which leads to misalignment with the target disease's textual representation. In this paper, we introduce a novel VLP framework that dissects disease descriptions into their fundamental aspects, leveraging prior knowledge about the visual manifestations of pathologies obtained by consulting a large language model and medical experts. Using a Transformer module, our approach aligns an input image with the diverse aspects of a disease, generating aspect-centric image representations. By consolidating the matches from each aspect, we improve the compatibility between an image and its associated disease. Additionally, capitalizing on the aspect-oriented representations, we present a dual-head Transformer tailored to process known and unknown diseases, improving overall detection performance. In experiments on seven downstream datasets, our method outperforms recent methods by up to 8.07% and 11.23% in AUC for seen and novel categories, respectively. Our code is released at https://github.com/HieuPhan33/MAVL.
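
The aspect-wise matching described above can be illustrated with a minimal sketch. It is not the released MAVL implementation: the aspect names, the toy two-dimensional embeddings, the function names, and the use of a plain average to consolidate per-aspect similarities are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def disease_score(image_aspects, text_aspects):
    """Consolidate per-aspect image-text similarities into one score.

    Both arguments map an aspect name to an embedding vector.
    Averaging is an assumption; the abstract only says that the
    matches from each aspect are consolidated.
    """
    sims = [cosine(image_aspects[a], text_aspects[a]) for a in text_aspects]
    return sum(sims) / len(sims)


# Hypothetical aspect-level text embeddings for two diseases.
text_embs = {
    "pneumothorax": {"texture": [1.0, 0.0], "location": [0.0, 1.0]},
    "effusion": {"texture": [0.0, 1.0], "location": [1.0, 0.0]},
}

# Hypothetical aspect-centric image embeddings for one query image,
# one set per candidate disease (toy values, not model outputs).
image_embs = {
    "pneumothorax": {"texture": [0.9, 0.1], "location": [0.2, 0.8]},
    "effusion": {"texture": [0.9, 0.1], "location": [0.2, 0.8]},
}

# Zero-shot style prediction: score every disease, keep the best match.
scores = {d: disease_score(image_embs[d], text_embs[d]) for d in text_embs}
prediction = max(scores, key=scores.get)
print(prediction, scores)
```

In this toy setup the query image matches both aspects of the first disease far better than those of the second, so the consolidated score favors it; a single whole-description embedding would not expose which aspect drove the match.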

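By consulting a large language model and medical experts, we propose a novel VLP framework that decomposes disease descriptions into their fundamental aspects, leveraging prior knowledge about the visual manifestations of pathologies. By integrating a Transformer module, our approach aligns an input image with the multiple aspects of a disease, generating aspect-centric image representations. By consolidating the matches from each aspect, we improve the compatibility between an image and its associated disease. In addition, we present an aspect-oriented dual-head Transformer that handles known and unknown diseases to optimize overall detection performance. Experiments on seven datasets show that our method outperforms recent methods by up to 8.07% and 11.23% in AUC for seen and novel categories, respectively.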

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework