Regime Switching Bandits

Jan, 2020

Xiang Zhou, Ningyuan Chen, Xuefeng Gao, Yi Xiong

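TL;DR: This paper introduces a multi-armed bandit problem in which the rewards exhibit regime switching, proposes an online learning algorithm, and evaluates and analyzes the algorithm's performance.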

Abstract

We study a multi-armed bandit problem where the rewards exhibit regime-switching. Specifically, the distributions of the random rewards generated from all arms depend on a common underlying state modeled as a finite-state →