BriefGPT.xyz
Nov, 2023
Multi-Modal Gaze Following in Conversational Scenarios
Yuqi Hou, Zhongqun Zhang, Nora Horanyi, Jaewon Moon, Yihua Cheng...
TL;DR
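Using audio cues, this paper proposes a multi-modal gaze following framework for conversational scenarios. The framework exploits the correlation between audio and lip movement to enhance the scene image and estimate gaze candidates, then uses a multilayer perceptron (MLP) to match the subject with the candidates, treating the match as a classification task. The authors introduce a conversational dataset containing both images and audio for evaluation, report that their method has a significant advantage on the gaze following task, and aim to encourage further research on multi-modal gaze following estimation.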
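To make the "matching as classification" step concrete, the following is a minimal sketch of one way it could look, not the authors' implementation. Everything here is an assumption for illustration: the feature dimensions, the randomly initialized weights, the choice to concatenate subject and candidate features, and the function names `mlp_score` and `match_subject_to_candidates`.

```python
import numpy as np


def mlp_score(pairs, w1, b1, w2, b2):
    """Two-layer MLP (hypothetical sizes) giving one logit per subject-candidate pair."""
    hidden = np.maximum(0.0, pairs @ w1 + b1)  # ReLU hidden layer
    return hidden @ w2 + b2                    # shape: (n_candidates, 1)


def match_subject_to_candidates(subject_feat, candidate_feats, params):
    """Return a probability over candidates for the subject's gaze target."""
    n = candidate_feats.shape[0]
    # Assumption: each pair is the subject feature concatenated with one candidate feature.
    pairs = np.concatenate([np.tile(subject_feat, (n, 1)), candidate_feats], axis=1)
    logits = mlp_score(pairs, *params).ravel()
    # Numerically stable softmax over candidates turns the match into classification.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


# Toy inputs: random features stand in for real image/audio-derived embeddings.
rng = np.random.default_rng(0)
d, h = 8, 16  # illustrative feature and hidden sizes
params = (
    rng.normal(size=(2 * d, h)), np.zeros(h),
    rng.normal(size=(h, 1)), np.zeros(1),
)
subject = rng.normal(size=d)
candidates = rng.normal(size=(3, d))

probs = match_subject_to_candidates(subject, candidates, params)
predicted = int(np.argmax(probs))  # index of the most likely gaze candidate
```

In a trained system the weights would be learned with a cross-entropy loss against the annotated gaze target; here they are random, so only the shapes and data flow are meaningful.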
Abstract
Gaze following estimates the gaze targets of an in-scene person by understanding human behavior and scene information. Existing methods usually analyze scene images for gaze following. However, compared with visual imag