To operate effectively in the real world, agents should be able to act from
high-dimensional raw sensory input such as images and achieve diverse goals
across long time-horizons. Current deep reinforcement and imitation learning
methods can learn directly from high-dimensional inputs b