BriefGPT.xyz
Sep, 2023
Fusing Pixel and Latent Diffusion Models for Text-to-Video Generation
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran...
TL;DR
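This paper proposes a hybrid model, named Show-1, that combines pixel-based and latent-based text-to-video diffusion models to achieve precise text-video alignment and high-quality video generation.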
Abstract
Significant advancements have been achieved in the realm of large-scale pre-trained text-to-video diffusion models (VDMs). However, previous methods either rely solely on pixel-based VDMs, which come with high co