ICML, Jul 2019
Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits
Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar
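TL;DR: RBMLE is a learning algorithm for stochastic multi-armed bandits, built on reward-biased maximum likelihood estimation. It yields an index policy, adaptively estimates the unknown parameters, and performs well in experiments.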
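To make the idea of a reward-biased index policy concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes unit-variance Gaussian rewards and a bias term `alpha(t) = c * log(t)`, neither of which is given in the summary above. Under those assumptions, maximizing an arm's log-likelihood plus `alpha(t)` times its mean gives the closed-form index `mean + alpha(t) / (2 * pulls)`. The names `rbmle_gaussian_index`, `run_rbmle`, and the constant `c` are hypothetical.

```python
import math
import random


def rbmle_gaussian_index(mean, pulls, alpha):
    """Reward-biased index for one arm (assumes unit-variance Gaussian rewards).

    Obtained by maximizing -pulls/2 * (theta - mean)^2 + alpha * theta over theta
    and rescaling; the bias shrinks as the arm is pulled more often.
    """
    return mean + alpha / (2.0 * pulls)


def run_rbmle(true_means, horizon, c=2.0, seed=0):
    """Simulate the index policy; returns the number of pulls per arm.

    alpha(t) = c * log(t) is an assumed bias schedule, not taken from the source.
    """
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            alpha = c * math.log(t)
            arm = max(
                range(k),
                key=lambda i: rbmle_gaussian_index(sums[i] / pulls[i], pulls[i], alpha),
            )
        reward = rng.gauss(true_means[arm], 1.0)
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


if __name__ == "__main__":
    # Hypothetical two-armed instance, for illustration only.
    print(run_rbmle([0.0, 1.0], horizon=2000))
```

The policy needs only each arm's empirical mean and pull count, which is what makes it an index policy: every arm is scored independently and the highest score is played.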