For many reinforcement learning (RL) applications, specifying a reward is
difficult. This paper considers an RL setting where the agent obtains
information about the reward only by querying an expert that can, for example,
evaluate individual states or provide binary preferences over t