Specifying reward functions for complex tasks like object manipulation or driving is challenging to do by hand. reward learning seeks to address this by learning a reward model using human feedback on selected query policies. This shifts the burden of reward specification to the optima