Safe Linear Thompson Sampling with Side Information

November 2019

Safe Linear Thompson Sampling

Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

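TL;DR: This paper proposes a new safe algorithm for the linear stochastic bandit problem, based on linear Thompson sampling. It introduces a linear safety constraint, reports the algorithm's performance relative to the setting without safety constraints, and compares it with safe algorithms based on UCB.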

Abstract

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear