Hamish Flynn, David Reeb, Melih Kandemir, Jan Peters
TL;DR: We present improved algorithms with worst-case regret guarantees for the stochastic linear bandit problem.
Abstract
We present improved algorithms with worst-case regret guarantees for the
stochastic linear bandit problem. The widely used "optimism in the face of
uncertainty" principle reduces a stochastic bandit problem to th