In this work, we present a novel approach to multi-view action recognition
where we guide learned action representations to be separated from
view-relevant information in a video. When trying to classify action instances
captured from multiple viewpoints, there is a higher degree of di
Dichotomous Image Segmentation (DIS) poses the challenge of balancing semantic dispersion and high-precision details in object segmentation. The paper proposes a parsimonious multi-view aggregation network (MVANet) that surpasses state-of-the-art methods in accuracy and speed.