Sound source localization is a typical and challenging task that aims to predict the
location of sound sources in a video. Previous single-source methods mainly
used audio-visual association as a cue to localize sounding objects in each
image. Due to the mixed property of multiple sound s