Large text-to-image diffusion models have exhibited impressive proficiency in
generating high-quality images. However, when applying these models to the video
domain, ensuring temporal consistency across video frames remains a significant challenge.
FlowZero, a framework that combines Large Language Models (LLMs) with image diffusion models, addresses this by improving zero-shot text-to-video synthesis, generating temporally coherent videos with vivid motion.