Reward functions are difficult to design and often hard to align with human intent. Preference-based reinforcement learning (RL) algorithms address these problems by learning reward functions from human feedback. However, the majority of preference-based RL methods naïvely combine supervised reward models with off-the-shelf RL algorithms.