Distributed Cooperative Decision-Making in Multiarmed Bandits: Frequentist and Bayesian Algorithms

Jun, 2016

Distributed Cooperative Decision-Making in Multiarmed Bandits: Frequentist and Bayesian Algorithms

Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

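TL;DR: This work addresses distributed cooperative decision-making under the explore-exploit tradeoff in the multi-agent multiarmed bandit problem. It combines frequentist and Bayesian algorithms with a running consensus algorithm, establishes performance guarantees for the resulting algorithms, and characterizes how the structure of the communication graph affects decision-making performance.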
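To make the running consensus idea concrete, here is a minimal Python sketch, assuming Gaussian rewards and a consensus matrix of the form P = I - (κ/d_max)L, where L is the Laplacian of an undirected communication graph. Each agent keeps running-consensus estimates of per-arm pull counts and reward sums. It then picks an arm with a simplified UCB1-style index. The function names `consensus_matrix` and `coop_ucb_sketch`, the parameters `kappa` and `sigma`, and the exact index are hypothetical illustrations. This is not the exact algorithm or index analyzed in the paper.

```python
import numpy as np


def consensus_matrix(adjacency, kappa=0.5):
    """Assumed form P = I - (kappa / d_max) * L for an undirected graph.

    With 0 < kappa < 1, P is symmetric, nonnegative and doubly stochastic.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A
    return np.eye(len(A)) - (kappa / degrees.max()) * laplacian


def coop_ucb_sketch(adjacency, arm_means, horizon, sigma=1.0, seed=0):
    """Hypothetical cooperative UCB with running consensus (simplified index)."""
    rng = np.random.default_rng(seed)
    P = consensus_matrix(adjacency)
    means = np.asarray(arm_means, dtype=float)
    n_agents, n_arms = P.shape[0], len(means)
    agents = np.arange(n_agents)
    n_hat = np.zeros((n_agents, n_arms))  # consensus estimate of pull counts
    s_hat = np.zeros((n_agents, n_arms))  # consensus estimate of reward sums
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: every agent samples arm t-1 once.
            choices = np.full(n_agents, t - 1)
        else:
            mean_est = s_hat / n_hat
            # M * n_hat approximates the network-wide pull count of each arm.
            bonus = sigma * np.sqrt(2.0 * np.log(t) / (n_agents * n_hat))
            choices = np.argmax(mean_est + bonus, axis=1)
        xi = np.zeros((n_agents, n_arms))  # indicator of the arm each agent pulled
        r = np.zeros((n_agents, n_arms))   # reward each agent received
        rewards = rng.normal(means[choices], sigma)
        xi[agents, choices] = 1.0
        r[agents, choices] = rewards
        total_reward += rewards.sum()
        # Running consensus: add local information, then average with neighbors.
        n_hat = P @ (n_hat + xi)
        s_hat = P @ (s_hat + r)
    return n_hat, s_hat, total_reward
```

Because P is doubly stochastic, each consensus step preserves the network-wide sums of counts and rewards. Each agent's `n_hat` therefore tracks roughly 1/M of the total pulls of each arm, and `s_hat / n_hat` tracks the network-wide sample mean. On a four-agent ring, for example, `coop_ucb_sketch([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], [0.2, 0.8, 0.5], horizon=500, sigma=0.5)` should concentrate most pulls on the arm with mean 0.8.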

Abstract

We study distributed cooperative decision-making under the explore-exploit tradeoff in the multiarmed bandit (MAB) problem. We extend the state-of-the-art frequentist and Bayesian algorithms for single-agent MAB