Jinman Park, Kimathi Kaai, Saad Hossain, Norikatsu Sumi, Sirisha Rambhatla...
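TL;DR: This paper introduces Ego-STAN, a 3D human pose estimation method that applies self-attention to information from past frames. By introducing a spatio-temporal Transformer model and feature map tokens, it speeds up large-scale training and improves computational efficiency, and it shows excellent performance in experiments.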
Abstract
Egocentric 3D human pose estimation (HPE) from images is challenging due to severe self-occlusions and strong distortion introduced by the fish-eye view from the head-mounted camera. Although existing works use i