Aug 2021
Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders
Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang...
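TL;DR: The method splits the cross-modal latent code into a shared part and a motion-specific part, and trains a conditional variational autoencoder with a mapping network, a relaxed motion loss, a bicycle constraint, and a diversity loss, so that the speech-to-motion mapping it learns produces more realistic and more diverse gestures.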
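The split latent code can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the function names (`split_latent`, `sample_motion_specific`, `compose_latent`, `mean_abs_difference`), and the stand-in audio code are all hypothetical, and the real encoder, decoder, and mapping network are left out. It assumes only that the shared part is derived from the audio while the motion-specific part is sampled, so that one audio clip can yield several different latent codes.

```python
import random

# Hypothetical sizes; the summary above does not state the real dimensions.
SHARED_DIM = 4
SPECIFIC_DIM = 4


def split_latent(z, shared_dim=SHARED_DIM):
    """Split a latent vector into (shared, motion_specific) parts."""
    return z[:shared_dim], z[shared_dim:]


def sample_motion_specific(rng, dim=SPECIFIC_DIM):
    """Draw a motion-specific code from a standard normal prior (assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def compose_latent(shared, specific):
    """Concatenate the two parts into the code a decoder would consume."""
    return list(shared) + list(specific)


def mean_abs_difference(a, b):
    """Simple distance between two codes, used here only to show they differ."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Stand-in for the output of an audio encoder (made-up values).
shared_from_audio = [0.1, -0.3, 0.7, 0.2]

# Same audio, two different motion-specific samples.
z1 = compose_latent(shared_from_audio, sample_motion_specific(rng))
z2 = compose_latent(shared_from_audio, sample_motion_specific(rng))
```

In this sketch `z1` and `z2` agree on the shared half and differ on the sampled half, which is the property that lets a single speech clip map to multiple plausible gesture sequences.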