We present a method for learning intrinsic reward functions that drive an
agent's learning during periods of practice in which extrinsic task rewards
are not available. During practice, the environment may differ from the one
available for training and evaluation with extrinsic rewa