BriefGPT.xyz
Oct, 2019
Smoothness-Adaptive Contextual Reinforcement Learning
Smoothness-Adaptive Stochastic Bandits
Yonatan Gur, Ahmadreza Momeni, Stefan Wager
TL;DR
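This paper studies the non-parametric multi-armed bandit problem with stochastic covariates and asks how a policy can adapt when the smoothness of the payoff functions is unknown. It proposes an algorithm that achieves smoothness-adaptive performance by inferring the smoothness of the payoffs during the decision process and by leveraging the structure of existing policies. The algorithm attains acceptable regret rates whether the smoothness is known or unknown.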
Abstract
We consider the problem of non-parametric multi-armed bandits with stochastic covariates, where a key factor in determining the complexity of the problem and in the design of effective policies is the smoothness…