Abstract

Scale is the primary factor for building a powerful foundation model that can generalize well to a variety of downstream tasks. However, it is still challenging to train
video foundation models with billions of parameters. This paper shows that