BriefGPT.xyz
May, 2018
PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits
Bianca Dumitrascu, Karen Feng, Barbara E Engelhardt
TL;DR
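This paper proposes PG-TS, an improved Thompson Sampling algorithm based on Polya-Gamma augmentation. Using a fast inference procedure, it addresses regret minimization in logistic contextual bandits. By explicitly estimating the posterior distribution of the context feature covariance, PG-TS converges faster than traditional algorithms in comparable settings.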
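To make the idea in the summary concrete, here is a minimal sketch of Thompson sampling for a logistic contextual bandit with Polya-Gamma augmentation. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sample_pg`, `gibbs_step`, `run_bandit`), the standard normal prior, the single Gibbs sweep per round, the truncated-series approximation of the Polya-Gamma draw, and the synthetic Gaussian contexts are all choices made here for brevity.

```python
import numpy as np


def sample_pg(c, rng, n_terms=200):
    """Approximate draws from PG(1, c) using a truncated sum-of-gammas series.

    Assumption: truncating the infinite series at n_terms is accurate enough
    for illustration; an exact sampler would be preferable in practice.
    """
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = np.arange(1, n_terms + 1)
    g = rng.exponential(size=(c.size, n_terms))  # Gamma(1, 1) variates
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)


def gibbs_step(X, y, theta, prior_mean, prior_prec, rng):
    """One Gibbs sweep: draw omega given theta, then theta given omega."""
    omega = sample_pg(X @ theta, rng)
    kappa = y - 0.5
    # Conditional Gaussian posterior: precision and mean.
    prec = X.T @ (omega[:, None] * X) + prior_prec
    mean = np.linalg.solve(prec, X.T @ kappa + prior_prec @ prior_mean)
    # Draw from N(mean, prec^{-1}) via the Cholesky factor of the precision.
    chol = np.linalg.cholesky(prec)
    z = rng.standard_normal(mean.size)
    return mean + np.linalg.solve(chol.T, z)


def run_bandit(n_rounds=200, n_arms=5, dim=3, seed=0):
    """Simulate the bandit loop on synthetic data (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=dim)  # unknown to the learner
    prior_mean, prior_prec = np.zeros(dim), np.eye(dim)
    theta = rng.standard_normal(dim)  # initial draw from the N(0, I) prior
    X_hist, y_hist, regret = [], [], []
    for _ in range(n_rounds):
        contexts = rng.normal(size=(n_arms, dim))  # one context per arm
        if X_hist:
            theta = gibbs_step(np.array(X_hist), np.array(y_hist),
                               theta, prior_mean, prior_prec, rng)
        # Thompson step: act greedily with respect to the sampled parameter.
        arm = int(np.argmax(contexts @ theta))
        probs = 1.0 / (1.0 + np.exp(-(contexts @ true_theta)))
        reward = float(rng.random() < probs[arm])  # binary reward
        X_hist.append(contexts[arm])
        y_hist.append(reward)
        regret.append(probs.max() - probs[arm])
    return np.cumsum(regret), np.array(y_hist)
```

Calling `run_bandit()` returns the cumulative expected regret per round and the observed binary rewards. A single Gibbs sweep per round keeps the sketch short; running more sweeps per round would give draws closer to the true posterior at higher cost.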
Abstract
We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast infe