Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video. The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction. At the core of our method is a volumetric 3D human representation reconstructed with a deep network trained on input video, enabling novel pose/view synthesis. Our method is an advance over GAN-based image-to-image translation since it allows image synthesis for any pose and camera via the internal 3D representation, while at the same time it does not require a pre-rigged model or ground truth meshes for training, as in mesh-based learning. Experiments validate the design choices and yield results on synthetic data and on real videos of diverse people performing unconstrained activities (e.g. dancing or playing tennis). Finally, we demonstrate motion re-targeting and bullet-time rendering with the learned models.

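Summary: the method reconstructs an animatable model of a person from video. A deep network trained on the input video produces a volumetric 3D human representation, which enables novel pose/view synthesis and image synthesis without a pre-rigged model. Experiments validate the model and show results on videos of different people, along with two applications of the learned model: motion re-targeting and bullet-time rendering.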
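To make the idea of rendering from a volumetric representation without an explicit mesh more concrete, here is a minimal sketch of standard emission-absorption compositing along a single camera ray. It is an illustration under assumptions, not the paper's implementation: the function name `composite_ray`, the per-sample `colors` and `densities` inputs, and the fixed `step` size are all hypothetical, and the text above does not specify the exact rendering formulation used.

```python
import math


def composite_ray(colors, densities, step):
    """Front-to-back compositing of samples taken along one ray.

    colors    -- list of (r, g, b) tuples, one per sample (hypothetical input)
    densities -- list of non-negative volume densities, one per sample
    step      -- distance between consecutive samples (assumed constant)
    Returns the accumulated RGB color and the accumulated opacity.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    for color, sigma in zip(colors, densities):
        # Opacity contributed by this sample over one step.
        alpha = 1.0 - math.exp(-sigma * step)
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * color[c]
        # Light remaining after passing through this sample.
        transmittance *= 1.0 - alpha
    return rgb, 1.0 - transmittance


# Usage: an empty sample in front of a very dense green sample.
rgb, opacity = composite_ray(
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    densities=[0.0, 1e9],
    step=1.0,
)
print(rgb, opacity)
```

In a full pipeline, a network would supply the colors and densities at points sampled along each ray for the requested body pose and camera; that part is omitted here because the summary does not describe it.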

Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild