Jun, 2024

照亮阴影:用概念引导的视觉语言模型增强长尾实体引地

TL;DRMulti-Modal Knowledge Graphs (MMKGs) have proven valuable for various downstream tasks. To address the challenge of building large-scale MMKGs with mismatched images, this paper introduces COG, a framework that enhances vision-language models with concept guidance, effectively identifying image-text pairs of long-tailed entities and offering flexibility and explainability.