Jun, 2024
照亮阴影:用概念引导的视觉语言模型增强长尾实体引地
Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models
Yikai Zhang, Qianyu He, Xintao Wang, Siyu Yuan, Jiaqing Liang...
TL;DRMulti-Modal Knowledge Graphs (MMKGs) have proven valuable for various downstream tasks. To address the challenge of building large-scale MMKGs with mismatched images, this paper introduces COG, a framework that enhances vision-language models with concept guidance, effectively identifying image-text pairs of long-tailed entities and offering flexibility and explainability.