Thompson Sampling is Asymptotically Optimal in General Environments

Feb 2016

Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

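TL;DR: This work proposes a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. The agent learns the environment class, and under a recoverability assumption its regret is sublinear.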

Abstract

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.