Nov, 2023

视觉 - 语言模型能否以第一人称视角思考?

TL;DRVision-language models have the potential to improve first-person perspective tasks, as demonstrated by the evaluation of eighteen popular models on the EgoThink benchmark, constructed with egocentric videos and annotated question-answer pairs. Increasing the number of trainable parameters has a significant impact on model performance, making EgoThink a valuable resource for advancing embodied artificial intelligence and robotics.