Learning a good 3D human pose representation is important for human pose related tasks, e.g. human 3D pose estimation and action recognition. Within all these problems, preserving the intrinsic pose information and adapting to view variations are two critical issues. In this work, we propose a novel Siamese denoising autoencoder to learn a 3D pose representation by disentangling the pose-dependent and view-dependent feature from the human skeleton data, in a fully unsupervised manner. These two disentangled features are utilized together as the representation of the 3D pose. To consider both the kinematic and geometric dependencies, a sequential bidirectional recursive network (SeBiReNet) is further proposed to model the human skeleton data. Extensive experiments demonstrate that the learned representation 1) preserves the intrinsic information of human pose, 2) shows good transferability across datasets and tasks. Notably, our approach achieves state-of-the-art performance on two inherently different tasks: pose denoising and unsupervised action recognition. Code and models are available at: \url{https://github.com/NIEQiang001/unsupervised-human-pose.git}

本文提出了一种全新的Siamese去噪自编码器来学习3D姿势表示，通过分离人类骨架数据的姿势和视图特征来进行全面无监督的学习，并借助序列双向递归网络（SeBiReNet）来考虑运动学和几何依赖性， 实现了在姿势去噪和无监督动作识别等问题上的最佳表现。

不监督3D人体姿态表示：景象和姿态分离