Hugo Garrido-Lestache Belinchon, Helina Mulugeta, Adam Haile
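TL;DR: Uses a sequence-to-sequence model and a 3D vector-quantized variational autoencoder (VQ-VAE) to generate audio from video, with the aim of improving how we interact with audio-visual media, including CCTV footage analysis, historical video restoration, and video generation models.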
Abstract
Generating audio from a video's visual context has multiple practical
applications in improving how we interact with audio-visual media - for
example, enhancing CCTV footage analysis, restoring historical videos