The remarkable success of deep learning in various domains relies on the
availability of large-scale annotated datasets. However, obtaining annotations
is expensive and requires great effort, which is especially challenging for
videos. Moreover, the use of human-generated annotations leads to models with
biased learning and poor domain generalization and rob