This paper introduces a novel approach for generating GIFs called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW). Sync-DRAW combines a Recurrent Variational Autoencoder (R-VAE) with an attention mechanism in a hierarchical manner to create a temporally dependent sequence of frames that are formed gradually over time. The attention mechanism attends to each individual frame of the GIF in synchronization, while the R-VAE learns a latent distribution for the entire GIF at the global level. We study the performance of Sync-DRAW on the Bouncing MNIST GIFs dataset and on the newly available TGIF dataset. Experiments suggest that Sync-DRAW efficiently learns the spatial and temporal information of the GIFs and generates frames in which objects have high structural integrity. We also demonstrate that Sync-DRAW can be extended to generate GIFs automatically from text captions alone.
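The following is a rough illustration of frame-synchronized attention. It applies a Gaussian read filterbank, taken from the original DRAW formulation rather than from this paper, to every frame using one shared set of attention parameters (`gx`, `gy`, `delta`, `sigma2`, `gamma`). We assume that sharing these parameters across frames is what "in synchronization" means here. The function names are hypothetical. Plain Python lists keep the sketch self-contained, so it is not an efficient implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def gaussian_filterbank(g: float, delta: float, sigma2: float,
                        n: int, size: int, eps: float = 1e-8) -> Matrix:
    """Return an n x size DRAW-style filterbank along one image axis.

    Row i (0-based) is a 1-D Gaussian centred at
    mu_i = g + (i - n/2 + 0.5) * delta, normalised so the row sums to ~1.
    """
    bank = []
    for i in range(n):
        mu = g + (i - n / 2 + 0.5) * delta
        row = [math.exp(-((a - mu) ** 2) / (2.0 * sigma2)) for a in range(size)]
        z = sum(row) + eps  # eps guards against division by zero
        bank.append([v / z for v in row])
    return bank


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-list matrix product of a (p x q) and (q x r) matrix."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def sync_read(frames: List[Matrix], gx: float, gy: float, delta: float,
              sigma2: float, gamma: float, n: int) -> List[Matrix]:
    """Extract an n x n glimpse from every frame with ONE shared set of
    attention parameters: gamma * F_Y @ frame @ F_X^T for each frame."""
    height, width = len(frames[0]), len(frames[0][0])
    fx = gaussian_filterbank(gx, delta, sigma2, n, width)   # n x width
    fy = gaussian_filterbank(gy, delta, sigma2, n, height)  # n x height
    fxt = transpose(fx)                                     # width x n
    glimpses = []
    for frame in frames:
        g = matmul(matmul(fy, frame), fxt)                  # n x n
        glimpses.append([[gamma * v for v in row] for row in g])
    return glimpses


if __name__ == "__main__":
    # Toy GIF: 4 frames of 8x8 pixels with a bright dot moving right.
    frames = []
    for t in range(4):
        f = [[0.0] * 8 for _ in range(8)]
        f[3][2 + t] = 1.0
        frames.append(f)
    glimpses = sync_read(frames, gx=3.5, gy=3.5, delta=1.0,
                         sigma2=1.0, gamma=1.0, n=3)
    print(len(glimpses), len(glimpses[0]), len(glimpses[0][0]))  # 4 3 3
```

All frames are read through the same window, so the glimpses are aligned in space. Any change between consecutive glimpses therefore comes from motion inside the window and not from the window itself moving.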

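This paper introduces a new method called "Sync-DRAW" that generates videos and can also perform text-to-video generation. Sync-DRAW combines a variational autoencoder (VAE) with a recurrent attention mechanism to produce a temporally progressive sequence of video frames, processing every frame in synchronization. Our experiments show that Sync-DRAW efficiently learns the spatial and temporal information of videos and can generate videos for the datasets studied from simple captions.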
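The next sketch shows the latent side of a recurrent VAE as it is usually formulated. It assumes the DRAW convention: at each recurrent step the posterior is a diagonal Gaussian with mean `mu` and log-variance `log_var`, the prior is a standard normal, and the latent loss is the sum of the per-step KL terms. We also assume that a single latent vector per step covers all frames, which is our reading of a latent distribution "for the entire GIF at the global level". The function names are hypothetical, and the code does not reproduce the paper's exact loss.

```python
import math
import random
from typing import List, Tuple


def sample_latent(mu: List[float], log_var: List[float],
                  rng: random.Random) -> List[float]:
    """Reparameterised sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a single time step:
    0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def latent_loss(steps: List[Tuple[List[float], List[float]]]) -> float:
    """Sum the per-step KL terms over all T recurrent steps."""
    return sum(kl_to_standard_normal(mu, lv) for mu, lv in steps)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two steps with 2-D latents: the first matches the prior exactly.
    steps = [([0.0, 0.0], [0.0, 0.0]), ([1.0, -1.0], [0.0, 0.0])]
    z = sample_latent(*steps[1], rng)  # one global latent for all frames
    print(latent_loss(steps))  # 1.0
```

This latent loss is added to a reconstruction loss over all frames. The decoder that consumes `z` is omitted here.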

Sync-DRAW: Automatic Video Generation Using Deep Recurrent Attentive Architectures