Top-down methods for monocular human mesh recovery have two stages: (1)
detect human bounding boxes; (2) treat each bounding box as an independent
single-human mesh recovery task. Unfortunately, the single-human assumption
does not hold in images with multi-human occlusion and crowding