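TL;DR: This paper studies the problem of sequential learning with feedback graphs, introduces a new notion called problem complexity, and develops an algorithm that achieves optimal regret minimization in this setting.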
Abstract
Sequential learning with feedback graphs is a natural extension of the
multi-armed bandit problem where the problem is equipped with an underlying
graph structure that provides additional information - playing an