BriefGPT.xyz
Feb, 2024
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
Wentao Mo, Yang Liu
TL;DR
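The paper integrates 2D knowledge into a 3D-VQA system through a question-conditioned 2D view selection procedure. A twin-Transformer structure tightly couples the 2D and 3D modalities and captures fine-grained correlations between them, yielding a multi-modal Transformer-based architecture for 3D VQA.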
Abstract
In 3D visual question answering (3D VQA), the scarcity of fully annotated data and limited visual content diversity hamper generalization to novel scenes and 3D concepts (e.g., only around 800 scenes are uti