Specifying rewards for reinforcement learning (RL) agents is challenging.
Preference-based RL (PbRL) mitigates this challenge by inferring a reward
from feedback over sets of trajectories. However, the effectiveness of PbRL is
limited by the amount of feedback needed to reliably recover the structure of
the target reward. We present the PRIor Over Rewards (PRIOR)