Developing agents that can quickly adapt their behavior to new tasks remains a challenge. meta-learning has been applied to this problem, but previous methods require either specifying a reward function which can be tedious or providing demonstrations which can be inefficient. In this