Temporal Difference Variational Auto-Encoder

Jun, 2018

Karol Gregor, Frederic Besse

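TL;DR: This paper proposes TD-VAE, a generative sequence model that learns explicit beliefs about states several steps into the future. These beliefs can be rolled out directly, without single-step transitions, so the model can be used for planning and for simulating behavior in complex environments.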
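The efficiency argument behind "rolling out without single-step transitions" can be illustrated with a toy sketch. This is not the paper's model: it assumes hypothetical deterministic linear dynamics `x_{t+1} = A x_t` as a stand-in for a learned single-step model, and the "jumpy" transition is computed exactly as `A^k` rather than learned. The helper names (`single_step_rollout`, `make_jumpy_model`, `jumpy_rollout`) are illustrative only. The sketch counts transition-model calls needed to reach the same horizon.

```python
import numpy as np

# Hypothetical toy dynamics (an assumption, not from the paper):
# a fixed, noise-free linear transition x_{t+1} = A @ x_t.
A = np.array([[0.9, 0.1],
              [-0.1, 0.9]])

def single_step_rollout(x0, horizon):
    """Apply the one-step transition `horizon` times; return (state, calls)."""
    x, calls = x0, 0
    for _ in range(horizon):
        x = A @ x
        calls += 1
    return x, calls

def make_jumpy_model(k):
    """Hypothetical 'jumpy' transition mapping x_t directly to x_{t+k}.
    Here it is computed exactly as A^k; TD-VAE instead learns a transition
    between latent states separated by a time skip."""
    A_k = np.linalg.matrix_power(A, k)
    def jump(x):
        return A_k @ x
    return jump

def jumpy_rollout(x0, k, horizon):
    """Reach `horizon` using jumps of size k; return (state, calls)."""
    assert horizon % k == 0, "this sketch assumes horizon is a multiple of k"
    jump = make_jumpy_model(k)
    x, calls = x0, 0
    for _ in range(horizon // k):
        x = jump(x)
        calls += 1
    return x, calls

x0 = np.array([1.0, 0.0])
x_single, n_single = single_step_rollout(x0, 100)
x_jumpy, n_jumpy = jumpy_rollout(x0, k=10, horizon=100)
print(n_single, n_jumpy)  # 100 10
```

Both rollouts reach the same state at step 100, but the jumpy version needs a tenth of the model calls. In a learned, stochastic model the saving is larger still, since errors can also compound over each single-step call.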

Abstract

One motivation for learning generative models of environments is to use them as simulators for model-based reinforcement learning. Yet, it is intuitively clear that when time horizons are long, rolling out single step transitions is inefficient and often prohibitive. In this paper, we propose a generative model that learns state representations containing ex