Generating video frames that accurately predict future world states is
challenging. Existing approaches either fail to capture the full distribution
of outcomes, or yield blurry generations, or both. In this paper we introduce
an unsupervised video generation model that learns a prior