video summarization plays an important role in video understanding by selecting key frames/shots. Traditionally, it aims to find the most representative and diverse contents in a video as short summaries. Recently, a more generalized task, query-conditioned →