video question answering aims at answering a question about the video content
by reasoning the alignment semantics within them. However, since relying
heavily on human instructions, i.e., annotations or priors, current contrastive
learning-based VideoQA methods remains challenging to p