BriefGPT.xyz
May, 2019
Gradient Ascent for Active Exploration in Bandit Problems
Pierre Ménard
TL;DR
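A new gradient-ascent-based algorithm for the active exploration bandit problem in the fixed confidence setting. It uses a new sampling rule based on online lazy mirror ascent, and the algorithm is proven to be asymptotically optimal and computationally efficient.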
Abstract
We present a new algorithm based on gradient ascent for a general active exploration bandit problem in the fixed confidence setting. Th