BriefGPT.xyz
May, 2025
R^3-VQA: "Read the Room" by Video Social Reasoning
Lixing Niu, Jiapeng Li, Xingping Yu, Shu Wang, Ruining Feng...
TL;DR
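This work addresses the insufficient complexity of existing social reasoning tasks and datasets by introducing R^3-VQA, a new video dataset with precise, fine-grained annotations of social events and mental states, together with the corresponding social causal chains. The key findings are that current large vision-language models still fall far short of humans when reasoning about complex social scenarios, and that theory of mind prompting can improve their social reasoning.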
Abstract
"Read the room" is a significant social reasoning capability in human daily life. Humans can infer others' mental states from subtle social cues. Previous