video question answering (VideoQA) is a task that requires a model to analyze
and understand both the visual content given by the input video and the textual
part given by the question, and the interaction between them in order to
produce a meaningful answer. In our work we focus on th