We introduce the active audio-visual source separation problem, where an
agent must move intelligently in order to better isolate the sounds coming from
an object of interest in its environment. The agent hears multiple audio
sources simultaneously (e.g., a person speaking down the hal