ICML, Feb 2020


TL;DR: Bayesian Reward Extrapolation (Bayesian REX) is an efficient algorithm for high-dimensional imitation learning. It pre-trains a low-dimensional feature encoding and then uses preferences over demonstrations to perform fast Bayesian inference over reward functions. The algorithm performs competitively with state-of-the-art methods and enables efficient high-confidence policy evaluation without access to samples of the reward function.
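To make the second stage concrete, here is a minimal sketch of preference-based Bayesian inference over a linear reward on fixed features. It is an illustration under stated assumptions, not the authors' implementation: the pre-trained encoding is replaced by random toy feature vectors, the likelihood is a Bradley-Terry (pairwise softmax) model, the sampler is a plain Metropolis random walk, and the names `feats`, `prefs`, `beta`, `step`, and the unit-norm constraint on the weights are all hypothetical choices made for this example.

```python
import numpy as np

def log_likelihood(w, feats, prefs, beta=10.0):
    """Bradley-Terry log-likelihood of pairwise preferences.

    feats: (n_traj, d) array, one summed feature vector per demonstration
           (assumed to come from a frozen, pre-trained encoder).
    prefs: list of (i, j) pairs meaning demonstration j is preferred to i.
    beta:  assumed confidence (inverse temperature) in the preferences.
    """
    returns = beta * feats @ w  # linear reward => return is a dot product
    total = 0.0
    for i, j in prefs:
        # log of exp(R_j) / (exp(R_i) + exp(R_j)), computed stably
        total += returns[j] - np.logaddexp(returns[i], returns[j])
    return total

def mcmc_rewards(feats, prefs, n_samples=2000, step=0.1, seed=0):
    """Metropolis sampling of unit-norm reward weights (assumed uniform prior)."""
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    ll = log_likelihood(w, feats, prefs)
    samples = []
    for _ in range(n_samples):
        # Gaussian perturbation projected back onto the unit sphere
        prop = w + step * rng.normal(size=d)
        prop /= np.linalg.norm(prop)
        ll_prop = log_likelihood(prop, feats, prefs)
        if np.log(rng.uniform()) < ll_prop - ll:
            w, ll = prop, ll_prop
        samples.append(w.copy())
    return np.array(samples)

# Toy data: 8 demonstrations with 3-dimensional features, ranked by a
# hidden "true" weight vector that the sampler never sees directly.
rng = np.random.default_rng(1)
n_traj, dim = 8, 3
feats = rng.normal(size=(n_traj, dim))
w_true = rng.normal(size=dim)
w_true /= np.linalg.norm(w_true)
true_returns = feats @ w_true
prefs = [(i, j) for i in range(n_traj) for j in range(n_traj)
         if true_returns[i] < true_returns[j]]

samples = mcmc_rewards(feats, prefs)
posterior_mean = samples[500:].mean(axis=0)  # discard burn-in
```

Because the features are computed once and each likelihood evaluation is only a few dot products, drawing thousands of posterior samples is cheap. That is the property the summary refers to as fast Bayesian inference. The resulting `samples` could then be used to estimate a distribution over a policy's return, for example by taking a low quantile of `samples @ policy_feats` for a hypothetical policy feature vector `policy_feats`.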