Visual grounding is a ubiquitous building block in many vision-language tasks, yet it remains challenging due to large variations in the visual and linguistic features of grounding entities, strong context effects, and the resulting semantic ambiguities. Prior works typically focus on learning
3D visual grounding refers to automatically localizing the 3D region of a specified object given a corresponding textual description. Existing studies have difficulty distinguishing similar objects, especially when the description involves multiple related objects. This paper proposes SeCG, a semantic-enhanced relational learning model based on a graph network with a designed memory graph attention layer, to strengthen relation-oriented mapping between different modalities. Experiments show that, compared with existing state-of-the-art methods, our method improves localization performance on multi-relation challenges.
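To make the idea of a memory-augmented graph attention layer concrete, the following is a minimal NumPy sketch, not the paper's actual implementation: it assumes each object is a graph node, nodes attend over their graph neighbours plus a set of learnable external memory slots, and all dimensions (`node_feats`, `adj`, `memory`) are illustrative placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_graph_attention(node_feats, adj, memory):
    """One hypothetical memory-augmented graph attention step.

    node_feats: (N, D) per-object features
    adj:        (N, N) 0/1 adjacency encoding object relations
    memory:     (M, D) external memory slots visible to every node
    Returns updated (N, D) node features.
    """
    N, D = node_feats.shape
    # Keys/values are the graph nodes plus the shared memory slots.
    keys = np.concatenate([node_feats, memory], axis=0)       # (N+M, D)
    scores = node_feats @ keys.T / np.sqrt(D)                 # (N, N+M)
    # Mask out non-neighbour nodes; memory slots are always attendable.
    mask = np.concatenate([adj, np.ones((N, memory.shape[0]))], axis=1)
    scores = np.where(mask > 0, scores, -1e9)
    attn = softmax(scores, axis=-1)                           # rows sum to 1
    return attn @ keys                                        # (N, D)
```

In this sketch the memory slots act as a global context shared across all objects, which is one plausible way a model could disambiguate descriptions that mention multiple related objects.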