Exploiting Adjacent Similarity in Multi-Armed Bandit Tasks via Transfer of Reward Samples

Sep, 2024

Exploiting Adjacent Similarity in Multi-Armed Bandit Tasks via Transfer of Reward Samples

NR Rahul, Vaibhav Katewa

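TL;DR: This paper addresses a sequential multi-task problem, focusing on stochastic multi-armed bandits with adjacent similarity. The two proposed UCB-based algorithms transfer reward samples from preceding tasks to reduce the overall regret. Experimental results show that transferring samples significantly improves performance compared with running without sample transfer.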
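To make the idea of reward-sample transfer concrete, the following is a minimal sketch, not the authors' algorithms. It runs standard UCB1 on two consecutive Bernoulli bandit tasks and, for the second task, naively warm-starts the arm statistics with all samples collected in the first. The function names (`ucb1`, `run_two_tasks`), the Bernoulli reward model, and the specific mean values are assumptions made for illustration. The paper's algorithms presumably control how much of the earlier data is reused, which this sketch does not attempt.

```python
import math
import random


def ucb1(means, horizon, rng, prior_counts=None, prior_sums=None):
    """Run UCB1 on Bernoulli arms with the given true means.

    prior_counts / prior_sums optionally warm-start the per-arm statistics
    with samples transferred from an earlier task (naive transfer, an
    illustrative assumption). Returns (pseudo_regret, new_counts, new_sums),
    where the new_* lists cover only samples drawn in this task.
    """
    k = len(means)
    counts = list(prior_counts) if prior_counts else [0] * k
    sums = list(prior_sums) if prior_sums else [0.0] * k
    new_counts = [0] * k
    new_sums = [0.0] * k
    best = max(means)
    regret = 0.0

    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            # Pull every arm once before using confidence bounds.
            arm = untried[0]
        else:
            total = sum(counts)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        new_counts[arm] += 1
        new_sums[arm] += reward
        regret += best - means[arm]

    return regret, new_counts, new_sums


def run_two_tasks(seed=0, horizon=500):
    """Run task 1 from scratch, then task 2 with and without transfer."""
    task1 = [0.50, 0.60, 0.70]   # hypothetical arm means
    task2 = [0.52, 0.58, 0.72]   # each mean shifted slightly from task 1

    rng = random.Random(seed)
    _, c1, s1 = ucb1(task1, horizon, rng)

    regret_plain, _, _ = ucb1(task2, horizon, random.Random(seed + 1))
    regret_transfer, c2, _ = ucb1(
        task2, horizon, random.Random(seed + 1), prior_counts=c1, prior_sums=s1
    )
    return regret_plain, regret_transfer, c2


if __name__ == "__main__":
    plain, transfer, _ = run_two_tasks()
    print(f"task 2 regret without transfer: {plain:.2f}")
    print(f"task 2 regret with naive transfer: {transfer:.2f}")
```

Because the transferred samples come from slightly different means, the warm-started estimates are biased. Whether naive reuse helps in a given run depends on how close the consecutive tasks are and on the horizon, so the sketch makes no claim about which variant wins.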

Abstract

We consider a sequential multi-task problem, where each task is modeled as the stochastic Multi-Armed Bandit with K arms. We assume the bandit tasks are adjacently similar in the sense that the difference between the mean rewards of the arms for any two consecutive tasks is bounded by