reinforcement learning algorithms rely on carefully engineering environment
rewards that are extrinsic to the agent. However, annotating each environment
with hand-designed, dense rewards is not scalable, motivating the need for
developing reward functions that are intrinsic to the age