Modern multi-layer perceptron (MLP) models have shown competitive results in learning visual representations without self-attention. However, existing MLP models struggle to capture local details and lack prior knowledge of human body configurations, which limits their modeling power for skeletal representation learning. To address these issues, we propose a simple yet effective graph-reinforced MLP-like architecture, named GraphMLP, that combines MLPs and graph convolutional networks (GCNs) in a global-local-graphical unified architecture for 3D human pose estimation. GraphMLP incorporates the graph structure of the human body into an MLP model to meet this domain-specific demand while also allowing for both local and global spatial interactions. Extensive experiments show that the proposed GraphMLP achieves state-of-the-art performance on two datasets, Human3.6M and MPI-INF-3DHP. Our source code and pretrained models will be publicly available.

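Summary: The paper proposes a simple yet effective graph-reinforced MLP-like architecture, named GraphMLP, that combines MLPs and graph convolutional networks (GCNs) for 3D human pose estimation. It incorporates the graph structure of the human body into an MLP model, allows both local and global spatial interactions, and achieves state-of-the-art performance for 3D human pose estimation from both videos and single frames.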
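To make the global-local idea concrete, the following is a minimal sketch of one way an MLP branch and a GCN branch could be combined over skeleton joints. It is not the authors' implementation: the 5-joint chain skeleton, the function names (`normalized_adjacency`, `graph_mlp_block`), the ReLU activation, the layer sizes, and the simple residual sum are all assumptions chosen for illustration, and normalization layers are omitted. Plain Python lists are used to keep the example dependency-free.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(*mats):
    """Element-wise sum of same-shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

def relu(a):
    return [[max(0.0, v) for v in row] for row in a]

def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def graph_mlp_block(x, a_hat, w_tok1, w_tok2, w_gcn):
    """Hypothetical block: x has shape (joints, channels).

    Global branch: a two-layer MLP that mixes information across all joints.
    Local branch: one graph convolution over the skeleton neighbours.
    Both branches are added back to the input as a residual.
    """
    # Global: operate on (channels, joints) so the MLP mixes the joint axis.
    mixed = transpose(matmul(relu(matmul(transpose(x), w_tok1)), w_tok2))
    # Local: aggregate features from adjacent joints, then project channels.
    local = matmul(matmul(a_hat, x), w_gcn)
    return add(x, mixed, local)

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

# Toy setup (assumed): 5 joints connected in a chain, 4 feature channels.
NUM_JOINTS, CHANNELS, HIDDEN = 5, 4, 8
TOY_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]

rng = random.Random(0)
a_hat = normalized_adjacency(NUM_JOINTS, TOY_EDGES)
x = random_matrix(NUM_JOINTS, CHANNELS, rng, scale=1.0)
out = graph_mlp_block(
    x,
    a_hat,
    random_matrix(NUM_JOINTS, HIDDEN, rng),
    random_matrix(HIDDEN, NUM_JOINTS, rng),
    random_matrix(CHANNELS, CHANNELS, rng),
)
print(len(out), len(out[0]))  # prints "5 4": the block preserves the input shape
```

In this sketch the MLP branch can relate any pair of joints, while the graph branch only passes information along skeleton edges, which is one way to read the "global" and "local, graph-aware" interactions the abstract describes.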

GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation