October 2023

$α$-Fair Contextual Bandits

Siddhant Chaudhary, Abhishek Sinha

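TL;DR: An efficient algorithm is designed for the $\alpha$-fair contextual bandits problem that guarantees approximately sublinear regret in both the full-information and bandit feedback settings.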

Abstract

Contextual bandit algorithms are at the core of many applications, including recommender systems, clinical trials, and optimal portfolio selection. One of the most popular problems studied in the contextual bandit literature is to maximize the sum of the rewards in each round by ensuri