ICML, Feb 2020
Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum
TL;DR: Bayesian Reward Extrapolation (Bayesian REX) is an efficient algorithm for high-dimensional imitation learning. It pre-trains a low-dimensional feature encoding and then uses preferences over demonstrations to perform fast Bayesian inference over reward functions. The algorithm performs competitively with state-of-the-art methods and enables efficient high-confidence policy evaluation without access to samples of the reward function.
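To make the inference step concrete, here is a minimal sketch of sampling a posterior over reward weights from pairwise preferences. It is not the authors' implementation. It assumes the reward is linear in the pre-trained features, that each demonstration is summarized by the sum of its encoded features, that preferences follow a Bradley-Terry (softmax) likelihood, and that the prior is uniform over unit-norm weights. The names `log_likelihood`, `sample_posterior`, `beta`, and `step`, as well as the synthetic data, are hypothetical.

```python
import numpy as np


def log_likelihood(w, feature_sums, prefs, beta=1.0):
    """Bradley-Terry log-likelihood of preferences under linear reward w.

    feature_sums: (n_demos, d) array, one summed feature vector per demonstration.
    prefs: list of (worse, better) index pairs.
    beta: assumed confidence (inverse temperature) in the preference labels.
    """
    returns = beta * (feature_sums @ w)
    worse = np.array([i for i, _ in prefs])
    better = np.array([j for _, j in prefs])
    # log P(better > worse) = r_better - log(exp(r_worse) + exp(r_better))
    return float(np.sum(returns[better] - np.logaddexp(returns[worse], returns[better])))


def sample_posterior(feature_sums, prefs, n_samples=3000, step=0.2, seed=0):
    """Metropolis sampling over unit-norm weights (uniform prior on the sphere)."""
    rng = np.random.default_rng(seed)
    d = feature_sums.shape[1]
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    ll = log_likelihood(w, feature_sums, prefs)
    samples = np.empty((n_samples, d))
    for t in range(n_samples):
        # Gaussian perturbation projected back onto the unit sphere.
        proposal = w + step * rng.normal(size=d)
        proposal /= np.linalg.norm(proposal)
        ll_prop = log_likelihood(proposal, feature_sums, prefs)
        # Accept with probability min(1, likelihood ratio).
        if np.log(rng.random()) < ll_prop - ll:
            w, ll = proposal, ll_prop
        samples[t] = w
    return samples


# Synthetic example: 12 demonstrations in a 3-dimensional feature space,
# ranked by a hidden "true" reward that the sampler never sees directly.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -0.5, 0.2])
true_w /= np.linalg.norm(true_w)
feature_sums = rng.normal(size=(12, 3))
true_returns = feature_sums @ true_w
prefs = [(i, j) for i in range(12) for j in range(12)
         if true_returns[i] < true_returns[j]]

samples = sample_posterior(feature_sums, prefs)
posterior_mean = samples[500:].mean(axis=0)  # discard burn-in
```

Because each likelihood evaluation only needs a matrix-vector product over precomputed feature sums, every sampling step is cheap, which is the intuition behind the "fast" inference described above. Under the same assumptions, the retained samples could be used to score a candidate policy's expected feature counts under each sampled reward and read off a lower quantile as a high-confidence performance estimate.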