Weakly-supervised temporal action localization (WTAL) aims to recognize and
localize action instances using only video-level labels. Despite significant
progress, existing methods suffer from severe performance degradation when
transferred to different distributions and thus may ha