BriefGPT.xyz
Mar, 2018
What Doubling Tricks Can and Can't Do for Multi-Armed Bandits
Lilian Besson, Emilie Kaufmann
TL;DR
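Studies anytime algorithms and doubling tricks for online reinforcement learning. In a broad setting, it proves that geometric doubling tricks can conserve certain regret bounds but cannot conserve distribution-dependent bounds, whereas exponential doubling tricks may be better: they conserve bounds in R_T = O(log T) and come close to conserving bounds in R_T = O(sqrt(T)).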
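To make the difference between the two families concrete, here is a minimal Python sketch of how often each one restarts the wrapped algorithm. It is an illustration, not the authors' code. The function names, the parameters `t0` and `b`, and the specific forms `floor(t0 * b**i)` (geometric) and `floor(t0 ** (b**i))` (exponential) are all assumptions made for this example, as is the convention that the algorithm is restarted at the step right after each horizon is reached.

```python
import math
from typing import Iterator, List


def geometric_horizons(t0: int = 2, b: int = 2) -> Iterator[int]:
    """Yield horizons T_i = floor(t0 * b**i) (assumed geometric form)."""
    i = 0
    while True:
        yield math.floor(t0 * b ** i)
        i += 1


def exponential_horizons(t0: int = 2, b: int = 2) -> Iterator[int]:
    """Yield horizons T_i = floor(t0 ** (b**i)) (assumed exponential form)."""
    i = 0
    while True:
        yield math.floor(t0 ** (b ** i))
        i += 1


def restart_times(horizons: Iterator[int], total_steps: int) -> List[int]:
    """Return the 1-indexed time steps at which a doubling trick would
    restart the wrapped (horizon-dependent) algorithm.

    Assumed convention: the algorithm starts at t = 1, runs until the
    current horizon T_i is reached, and is restarted at t = T_i + 1.
    """
    restarts = []
    t = 1
    for horizon in horizons:
        if t > total_steps:
            break
        restarts.append(t)
        t = horizon + 1
    return restarts


if __name__ == "__main__":
    print(restart_times(geometric_horizons(), 100))    # [1, 3, 5, 9, 17, 33, 65]
    print(restart_times(exponential_horizons(), 100))  # [1, 3, 5, 17]
```

Under these assumed parameters, the geometric sequence restarts seven times within 100 steps and the exponential sequence only four times. The gap widens as the number of steps grows.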
Abstract
An online reinforcement learning algorithm is anytime if it does not need to know in advance the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from any non-anytime algorithm is the "