BriefGPT.xyz
Jan, 2020
Regime Switching Bandits
Xiang Zhou, Ningyuan Chen, Xuefeng Gao, Yi Xiong
TL;DR
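This paper introduces a multi-armed bandit problem in which the rewards exhibit regime switching, proposes an online learning algorithm, and tests and analyzes the algorithm's performance.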
Abstract
We study a multi-armed bandit problem where the rewards exhibit regime-switching. Specifically, the distributions of the random rewards generated from all arms depend on a common underlying state modeled as a finite-state …
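The setting described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm or model: the abstract is cut off after "finite-state", so the state dynamics here are assumed to follow a Markov chain with a hypothetical transition matrix, rewards are assumed to be Bernoulli, and the names `RegimeSwitchingBandit` and `run_uniform_policy` are invented for illustration.

```python
import random


class RegimeSwitchingBandit:
    """Toy environment: all arms share one hidden state that selects their reward distributions."""

    def __init__(self, transition, means, seed=0):
        # transition[s][t]: ASSUMED probability of moving from state s to state t
        # (Markov dynamics are an assumption; the source text is truncated here).
        # means[s][a]: ASSUMED Bernoulli success probability of arm a in state s.
        self.transition = transition
        self.means = means
        self.rng = random.Random(seed)
        self.state = 0  # the learner never observes this directly

    def pull(self, arm):
        # The reward is drawn from the distribution picked by the current hidden state.
        reward = 1 if self.rng.random() < self.means[self.state][arm] else 0
        # The shared state then evolves, regardless of which arm was pulled.
        weights = self.transition[self.state]
        self.state = self.rng.choices(range(len(weights)), weights=weights)[0]
        return reward


def run_uniform_policy(env, n_arms, horizon, seed=1):
    """Baseline that ignores the regime: pull a uniformly random arm each round."""
    rng = random.Random(seed)
    total = 0
    for _ in range(horizon):
        total += env.pull(rng.randrange(n_arms))
    return total


if __name__ == "__main__":
    # Two regimes, two arms: arm 0 is better in state 0, arm 1 in state 1.
    env = RegimeSwitchingBandit(
        transition=[[0.9, 0.1], [0.2, 0.8]],
        means=[[0.8, 0.3], [0.2, 0.7]],
    )
    print(run_uniform_policy(env, n_arms=2, horizon=1000))
```

Because the best arm changes with the hidden state, a policy that ignores the regime, such as the uniform baseline here, cannot track the better arm; this is the difficulty an online learning algorithm for this setting has to address.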