visual model-based reinforcement learning (RL) has the potential to enable
sample-efficient robot learning from visual observations. Yet the current
approaches typically train a single model end-to-end for learning both visual
representations and dynamics, making it difficult to accura