BriefGPT.xyz
Oct, 2022
Contextual bandits with concave rewards, and an application to fair ranking
Virginie Do, Elvis Dohmatob, Matteo Pirotta, Alessandro Lazaric, Nicolas Usunier
TL;DR
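This paper studies contextual bandits with concave rewards (CBCR) and proposes the first algorithm that keeps regret under control without restricting the policy space. By interpreting CBCR algorithms geometrically as optimization algorithms over the convex set of expected rewards, it obtains a new reduction from CBCR regret to the regret of a scalar-reward bandit problem, and it gives a solution for CBCR with rankings under fairness constraints.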
Abstract
We consider contextual bandits with Concave Rewards (CBCR), a multi-objective bandit problem where the desired trade-off between the rewards is defined by a known …