Abstract. This paper addresses the problem of automatically localizing dominant objects as
spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision. We formulate the problem as a combination of two complementary processes: