Nov, 2023

Just Add $\pi$! Pose Induced Video Transformers for Understanding Activities of Daily Living

TL;DR: PI-ViT is a Pose Induced Video Transformer that augments the RGB representations learned by video transformers with 2D and 3D pose information, achieving state-of-the-art performance for Activities of Daily Living (ADL) recognition on real-world and large-scale RGB-D datasets without additional computational overhead at inference.