In recent years, many methods have been proposed for 3D pose estimation. Among these, self-attention mechanisms and graph convolutions have both proven effective and practical. Recognizing the strengths of these two techniques, we develop a novel Semantic Graph Attention Network that benefits from the ability of self-attention to capture global context while using graph convolutions to handle the local connectivity and structural constraints of the skeleton. We also design a Body Part Decoder that assists in extracting and refining information related to specific segments of the body. Furthermore, our approach incorporates Distance Information, enhancing the model's ability to understand and accurately predict spatial relationships. Finally, we introduce a Geometry Loss that imposes a critical constraint on the structural skeleton of the body, ensuring that the model's predictions adhere to the natural limits of human posture. Experimental results validate the effectiveness of our approach and demonstrate that each component of the system is essential for improving pose estimation. Compared with state-of-the-art methods, the proposed approach not only matches but exceeds existing benchmark results.
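To make the idea of a structural constraint concrete, the following is a minimal sketch of one common form such a loss can take: a penalty on the difference between predicted and reference bone lengths. The abstract does not specify the actual formulation of the Geometry Loss, so the bone-length form, the five-joint chain skeleton `BONES`, and the function names `bone_lengths` and `geometry_loss` are all assumptions made for illustration.

```python
import math

# Hypothetical 5-joint chain skeleton: (parent, child) joint index pairs.
# A real whole-body skeleton would have many more joints and a tree layout.
BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def bone_lengths(pose, bones=BONES):
    """Euclidean length of each bone; pose is a list of (x, y, z) joints."""
    return [math.dist(pose[a], pose[b]) for a, b in bones]


def geometry_loss(pred, target, bones=BONES):
    """Mean absolute difference between predicted and reference bone lengths.

    Assumed form of a skeleton constraint, not the paper's definition.
    """
    pred_len = bone_lengths(pred, bones)
    target_len = bone_lengths(target, bones)
    return sum(abs(p - t) for p, t in zip(pred_len, target_len)) / len(bones)


# Reference pose: every bone has length 1.
target = [
    (0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 2.0, 0.0),
    (1.0, 2.0, 0.0),
    (2.0, 2.0, 0.0),
]
# Same pose with the last joint pushed outward, stretching the final bone to 2.
pred = target[:-1] + [(3.0, 2.0, 0.0)]

loss = geometry_loss(pred, target)
```

A term of this kind depends only on relative joint positions, so it penalizes anatomically implausible stretching or shrinking of limbs regardless of where the skeleton sits in space. In training it would typically be added to a per-joint position loss with a weighting factor, which is again an assumption here.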

3D Whole-Body Pose Estimation Based on a Semantic Graph Attention Network and Distance Information