Nov 2023
Can Vision-Language Models Think from a First-Person Perspective?
Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, et al.
TL;DR: EgoThink is a benchmark constructed from egocentric videos with annotated question-answer pairs, designed to test whether vision-language models can think from a first-person perspective. An evaluation of eighteen popular models on EgoThink shows that they still have substantial room for improvement on first-person tasks, and that the number of trainable parameters has a significant impact on performance. These findings make EgoThink a valuable resource for advancing embodied artificial intelligence and robotics.