高斯赌博机的 Thompson 抽样策略的最优性取决于先验知识

Nov, 2013

高斯赌博机的 Thompson 抽样策略的最优性取决于先验知识

Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors

Junya Honda, Akimichi Takemura

TL;DR探讨多参数模型中 normal distribution 模型下 Thompson sampling 算法的优化问题及其 prior 选择的影响

Abstract

In stochastic bandit problems, a Bayesian policy called thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is di

stochastic bandit problems thompson sampling normal distributions asymptotic optimality priors

发现论文，激发创造

具有正态伽马先验的线性赌博机问题的汤普森取样算法

本文介绍了在正态分布奖励模型下使用贝叶斯推断方法的 Thompson 抽样算法在多臂赌博问题中的应用，通过使用多元正态分布 - 伽玛先验来表示所有相关参数的环境不确定性，并得出了关于 Thompson 抽样算法的贝叶斯遗憾界，其前提条件为方差分布的 5/2 阶矩存在。

Mar, 2023

一维指数族赌博机的汤普森抽样

使用 Jeffreys 先验推广 Thompson 抽样算法，证明其在指数族多臂赌博机中的渐近最优性，并提供后验分布的有限时间指数集中不等式，适用于一些重尾分布的情况。

Jul, 2013

Thompson 抽样的无先验和有先验依赖的遗憾界

研究具有奖励分布先验分布的随机多臂赌博问题，证明 Thompson Sampling 算法在没有先验分布时具有最优的贝叶斯遗憾上界，并在 Bubeck 等人的先验设置下证明了算法的一致界限，并与 Audibert 和 Bubeck [2009] 和 Russo 和 Roy [2013] 的技术方法有关。

Apr, 2013

智能和适应后验采样算法用于二元选择

我们研究了基于 Thompson Sampling 的有界奖励随机赌博算法。为了解决现有的与高斯先验的 Thompson Sampling 相关的问题相关后悔界限在 T≤288e^64 时是虚无的问题，我们导出了一个更实用的界限，将主要项的系数从 288e^64 缩小到 1270。此外，我们提出了两种参数化的 Thompson Sampling 算法：带有模型聚合的 TS-MA-α 和带有时间战斗的 TS-TD-α，其中 α∈[0,1] 控制效用与计算之间的权衡。这两种算法都可以实现 O (Kln^(α+1)(T)/Δ) 的后悔界限，其中 K 是臂数量，T 是有限学习时段，Δ 表示拉动次优臂时的单轮性能损失。

May, 2024

非静态贪婪问题的折扣汤普森采样

本文提出了一种新的算法 Discounted Thompson Sampling (DS-TS) with Gaussian priors，用于解决非平稳多臂赌博机问题，并分析了算法在不同情况下的表现和 upper bound of regret。

May, 2023

Thompson 采样的先验敏感性

本文深入分析了 Thompson Sampling 算法中对先验分布选择的鲁棒性，发现在选择优先概率质量时，其遗憾上限与先验正判度呈 O (√T/p), 先验负判度呈 O (√(1-p) T), 并利用这些性质提出了一种基于鞅理论的新证明方法。

Jun, 2015

使用分数后验信息的汤普森抽样泛化遗憾分析

Thompson sampling (TS) is a popular algorithm for solving multi-armed bandit problems; this paper introduces a variant called $\alpha$-TS with tempered likelihoods in the posterior distribution, and provides regret bounds for both instance-dependent and instance-independent scenarios.

Sep, 2023

组合半臂老虎机的汤普森抽样

本文研究了 Thompson 采样方法在随机组合多臂赌博机框架中的应用，分析了多种算法的累积遗憾，并给出了上限界以及其他算法之间的比较结果。

Mar, 2018

多组玩法随机多臂赌博问题中 Thompson 抽样的最优遗憾分析

本文提出了多次试验下的 Thompson sampling 方法（MP-TS）并对其进行了后效分析，证明了其具有与 Anantharam 等人提供的最佳后悔下界相匹配的最优后悔上界，并通过计算机模拟进行了验证。我们还提出了 MP-TS 的改进版本，并表明其具有更好的实际效果。

Jun, 2015

进一步优化 Thompson Sampling 算法的后悔上界

使用贝叶斯方法的随机算法 Thompson Sampling 在多臂赌博问题中表现显著，本文提供了一种新的悔恨分析方法，同时证明了该算法在期望后悔上的问题特定界限和问题独立界限，方法简单且可适用于更广泛的 contestual bandits 设置。

Sep, 2012