ICMLFeb, 2020

通过快速贝叶斯奖励推断从喜好中进行安全的模仿学习

TL;DRBayesian Reward Extrapolation (Bayesian REX) is an efficient algorithm for high-dimensional imitation learning, which pre-trains a low-dimensional feature encoding and then leverages preferences over demonstrations to perform fast Bayesian inference. The algorithm achieves competitive performance with state-of-the-art methods and enables efficient high-confidence policy evaluation without having access to samples of the reward function.