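TL;DR: This paper introduces an autonomous data generation technique and a dataset that provides high-resolution 3D physical simulations of materials together with their textual descriptions, aiming to push text-driven video/simulation generation toward a high level of physical realism.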
Abstract
Recent breakthroughs in joint Vision-Language (V&L) research have achieved
remarkable results in various text-driven tasks. High-quality text-to-video
(T2V) generation, a task long considered impossible, has been shown to be
feasible, with reasonably good results, in recent work. Howev