Optimal Fair Multi-Agent Bandits

June 2023

Amir Leshem

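TL;DR: This paper studies multi-agent multi-armed bandit learning under no communication and bounded rewards. It proposes a distributed auction algorithm that learns a sample-optimal matching, together with a new auction-based decision strategy. A novel regret analysis based on order statistics yields the improved performance, and simulations show that the regret grows logarithmically with time.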

Abstract

In this paper, we study the problem of fair multi-agent multi-arm bandit learning when agents do not communicate with each other, except collision information, provided to agents accessing the same arm simultaneously. We provide an algorithm with regret $O\left(N^3 \log N \log T \right)$ (assuming bounded rewards, with unknown bound). This significantly impr