Generating complex behaviors from goals specified by non-expert users is a crucial aspect of intelligent agents. Interactive reward learning from trajectory comparisons is one way to allow non-expert users to convey complex objectives by expressing preferences over short clips of agent