Most previous work in music emotion recognition assumes one or a few
song-level labels per song. While it is known that different emotions
can vary in intensity within a song, annotated data for this setup is scarce
and difficult to obtain. In this work, we propose a met