Nov, 2023
Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living
Dominick Reilly, Srijan Das
TL;DR: PI-ViT is a Pose Induced Video Transformer that augments the RGB representations learned by video transformers with 2D and 3D pose information, achieving state-of-the-art performance for Activities of Daily Living (ADL) recognition on real-world and large-scale RGB-D datasets without additional computational overhead at inference.