Abstract
Visual scenes are naturally organized in a hierarchy, where a coarse semantic concept is recursively composed of several fine details. Exploring such a
visual hierarchy is crucial for recognizing the complex relations among visual elements, leading to a comprehensive