Gradient Ascent for Active Exploration in Bandit Problems

May 2019

Pierre Ménard

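TL;DR: A new gradient-ascent-based algorithm for active exploration bandit problems in the fixed-confidence setting. It uses a new sampling rule based on online lazy mirror ascent, and is shown to be asymptotically optimal and computationally efficient.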
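
To make the term "lazy mirror ascent" concrete, here is a minimal generic sketch on the probability simplex with the entropic mirror map. It is not the paper's sampling rule: the linear toy objective, the step size `eta`, and the names `lazy_mirror_ascent_step` and `run_toy` are assumptions made for illustration only. The "lazy" variant accumulates gradients in the dual space and maps the running sum back to the simplex at each step.

```python
import math


def lazy_mirror_ascent_step(cum_grad, grad, eta):
    """One lazy mirror ascent step on the simplex (entropic mirror map).

    The gradient is added to a running sum, and the new weights are the
    softmax of eta times that sum.
    """
    cum_grad = [c + g for c, g in zip(cum_grad, grad)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(eta * c for c in cum_grad)
    unnormalized = [math.exp(eta * c - m) for c in cum_grad]
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    return cum_grad, weights


def run_toy(rewards, eta=0.5, steps=50):
    """Maximize the toy linear objective F(w) = <rewards, w> over the simplex.

    Hypothetical example: the gradient of F is the constant vector `rewards`,
    so the weights should concentrate on the largest entry.
    """
    k = len(rewards)
    cum_grad = [0.0] * k
    weights = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(steps):
        grad = list(rewards)  # gradient of the linear toy objective
        cum_grad, weights = lazy_mirror_ascent_step(cum_grad, grad, eta)
    return weights


if __name__ == "__main__":
    print(run_toy([0.1, 0.5, 0.2]))
```

In this toy run the weights stay on the simplex at every step and shift toward the coordinate with the largest gradient. A sampling rule built on the same idea would replace the constant gradient with a gradient estimated from the observations collected so far.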

Abstract

We present a new algorithm based on gradient ascent for a general active exploration bandit problem in the fixed confidence setting. Th