BriefGPT.xyz
Feb, 2014
Online Stochastic Optimization under Correlated Reward Feedback
Stochastic Optimization of a Locally Smooth Function under Correlated Bandit Feedback
Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill
TL;DR
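This paper introduces the high confidence tree (HCT) algorithm for online stochastic optimization of locally smooth functions. The algorithm is of practical value: it can be applied to policy search in reinforcement learning, and it can handle the harder setting in which rewards are correlated.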
Abstract
In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high confidence tree (HCT) algorithm, a novel any-time $\mathcal X$-arm