Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba
TL;DR: Building on the Music Gesture model, this work proposes a keypoint-based structured representation to model the body and finger movements of musicians during performance, achieving strong audio separation performance on joint audio-visual sound separation tasks.
Abstract
Recent deep learning approaches have achieved impressive performance on
visual sound separation tasks. However, these approaches are mostly built on
appearance and optical-flow-like motion feature representations