BriefGPT.xyz
Jun, 2015
Explore no more: improved high-probability regret bounds for non-stochastic bandits
Gergely Neu
TL;DR
This paper proposes a loss-estimation strategy based on Implicit eXploration (IX) that achieves high-probability regret bounds without requiring a separate explicit exploration component, improving on existing results for non-stochastic multi-armed bandit problems.
Abstract
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability.
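The core idea summarized above, the Implicit eXploration (IX) loss estimator, replaces the standard importance-weighted estimate `loss / p` with the biased estimate `loss / (p + gamma)`, which keeps estimates bounded without adding an explicit exploration term to the sampling distribution. A minimal sketch of an Exp3-style algorithm using this estimator is below; the function name, the tuning `eta = 2*gamma`, and the toy loss function are illustrative assumptions, not the paper's exact presentation:

```python
import math
import random

def exp3_ix(K, T, losses, eta=None, gamma=None, rng=None):
    """Sketch of Exp3 with the Implicit eXploration (IX) estimator.

    losses: function (round t, arm) -> loss in [0, 1].
    eta, gamma: learning rate and IX bias; defaults follow the
    common tuning eta = 2*gamma (an assumption for this sketch).
    Returns the total loss incurred over T rounds.
    """
    rng = rng or random.Random(0)
    if eta is None:
        eta = math.sqrt(2.0 * math.log(K) / (K * T))
    if gamma is None:
        gamma = eta / 2.0
    cum_est = [0.0] * K  # cumulative IX loss estimates per arm
    total_loss = 0.0
    for t in range(T):
        # Exponential weights over estimated cumulative losses
        # (subtract the minimum for numerical stability).
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [wi / s for wi in w]
        # Sample an arm directly from p: no explicit exploration mix-in.
        arm = rng.choices(range(K), weights=p)[0]
        loss = losses(t, arm)
        total_loss += loss
        # IX estimator: divide by (p + gamma) instead of p,
        # keeping the estimate bounded by loss / gamma.
        cum_est[arm] += loss / (p[arm] + gamma)
    return total_loss
```

On a toy instance where arm 0 always incurs loss 0.1 and the others 0.6, the algorithm concentrates its probability mass on arm 0, so its total loss ends up well below that of uniform play.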