This paper addresses human pose estimation using Convolutional Neural Networks (CNNs). Our main contribution is a cascaded CNN architecture designed to learn part relationships and spatial context and to infer pose robustly even under severe part occlusion. To this end, we propose a detection-followed-by-regression CNN cascade: the first part of the cascade outputs part detection heatmaps, and the second part performs regression on these heatmaps. The benefits of the proposed architecture are manifold: it guides the network where to focus in the image and effectively encodes part constraints and context. More importantly, it copes effectively with occlusions, because part detection heatmaps for occluded parts yield low confidence scores, which in turn guide the regression part of the network to rely on contextual information to predict the locations of these parts. Additionally, we show that the proposed cascade is flexible enough to readily integrate various CNN architectures for both detection and regression, including recent ones based on residual learning. Finally, we show that our cascade achieves top performance on the MPII and LSP data sets. Code can be downloaded from http://www.cs.nott.ac.uk/~psxab5/
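
The data flow of the detection-followed-by-regression cascade can be sketched as follows. This is a minimal illustration, not the released implementation: `run_cascade`, `peak`, `toy_detector`, `toy_regressor`, and the `occlusion_threshold` parameter are hypothetical names, the two stages are hand-written stand-ins rather than trained CNNs, and passing the image to the regressor alongside the heatmaps is an assumption. The explicit threshold is only there to make the low-confidence case visible; in the approach described above, the regression part learns to fall back on context without any such rule.

```python
from typing import Callable, Dict, List, Tuple

Heatmap = List[List[float]]  # H x W grid of scores, one grid per body part


def peak(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (row, col, score) of the highest-scoring cell."""
    best = (0, 0, heatmap[0][0])
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best[2]:
                best = (r, c, score)
    return best


def run_cascade(
    image: object,
    detector: Callable[[object], List[Heatmap]],
    regressor: Callable[[object, List[Heatmap]], List[Heatmap]],
    occlusion_threshold: float = 0.5,  # hypothetical, for inspection only
) -> List[Dict[str, object]]:
    # Stage 1: one detection heatmap per body part.
    detection_maps = detector(image)
    # Stage 2: regression on the detection heatmaps (image passed too, by assumption).
    regression_maps = regressor(image, detection_maps)

    results = []
    for det, reg in zip(detection_maps, regression_maps):
        _, _, det_score = peak(det)   # low peak score suggests an occluded part
        row, col, _ = peak(reg)       # final location comes from the regression map
        results.append({
            "location": (row, col),
            "detection_confidence": det_score,
            "likely_occluded": det_score < occlusion_threshold,
        })
    return results


def toy_detector(image: object) -> List[Heatmap]:
    # Part 0 is visible (sharp peak); part 1 is occluded (flat, low scores).
    return [
        [[0.0, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.0]],
        [[0.1, 0.1, 0.1], [0.1, 0.2, 0.1], [0.1, 0.1, 0.1]],
    ]


def toy_regressor(image: object, detection_maps: List[Heatmap]) -> List[Heatmap]:
    # Stand-in for the regression network: keeps the confident part where it was
    # detected and places the occluded part below it, as if inferred from context.
    return [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.8, 0.0]],
    ]


poses = run_cascade(image=None, detector=toy_detector, regressor=toy_regressor)
for part_index, pose in enumerate(poses):
    print(part_index, pose)
```

In this toy run the second part has a flat detection heatmap, so it is flagged as likely occluded, yet it still receives a location from the regression stage.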

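This work addresses human pose estimation with convolutional neural networks. By exploiting part relationships and spatial context, it proposes a dedicated CNN cascade architecture that robustly infers pose even under part occlusion. The cascade guides the network where to focus in the image, explicitly encodes part constraints and context, and copes with occlusions. It achieves top performance on the MPII and LSP data sets.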

Human Pose Estimation via Convolutional Part Heatmap Regression