We propose a unified model for three inter-related tasks: 1) to
\textit{separate} individual sound sources from a music audio mixture, 2) to
\textit{transcribe} each sound source to MIDI notes, and 3) to
\textit{synthesize} new pieces based on the timbre of separated sources. The model is
inspired by the fact that when humans listen to music, our minds can not