BriefGPT.xyz
Dec 2023
Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources
Rohan Deb, Aadirupa Saha
TL;DR
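For the reward-maximization problem, we consider the dueling bandit setting under constraints on resource consumption. We propose an EXP3-based dueling algorithm and demonstrate the effectiveness of the proposed method through numerical simulations.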
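The TL;DR mentions an EXP3-based dueling algorithm operating under a resource budget. The sketch below only illustrates that setting and is not the authors' algorithm. It substitutes the well-known "Sparring" construction, in which two independent EXP3 learners play against each other: the left learner is rewarded when its arm wins a duel, and the right learner is rewarded when its arm wins. Resources are handled naively: each arm has a per-duel cost, and play stops once the next duel would exceed a total budget. The preference matrix `pref`, the costs `costs`, the budget, and the names `Exp3` and `sparring_exp3_with_budget` are all hypothetical.

```python
import math
import random


class Exp3:
    """Standard EXP3 learner over k arms with rewards in [0, 1]."""

    def __init__(self, k, gamma, rng):
        self.k = k
        self.gamma = gamma  # exploration rate in (0, 1]
        self.rng = rng
        self.weights = [1.0] * k

    def probs(self):
        # Mix the exponential-weights distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def choose(self):
        p = self.probs()
        arm = self.rng.choices(range(self.k), weights=p)[0]
        return arm, p[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted estimate: unbiased for the chosen arm's reward.
        x_hat = reward / prob
        self.weights[arm] *= math.exp(self.gamma * x_hat / self.k)
        # Rescale so the largest weight is 1; probabilities are unchanged,
        # but this avoids floating-point overflow on long runs.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def sparring_exp3_with_budget(pref, costs, budget, gamma=0.1, seed=0):
    """Hypothetical sketch: two EXP3 learners duel until the budget is spent.

    pref[i][j] : assumed probability that arm i beats arm j.
    costs[i]   : assumed (positive) resource cost of using arm i in a duel.
    Returns (number of duels played, times each arm was selected).
    """
    if min(costs) <= 0:
        raise ValueError("costs must be positive so the loop terminates")
    rng = random.Random(seed)
    k = len(pref)
    left, right = Exp3(k, gamma, rng), Exp3(k, gamma, rng)
    counts = [0] * k
    spent, rounds = 0.0, 0
    while True:
        i, p_i = left.choose()
        j, p_j = right.choose()
        cost = costs[i] + costs[j]
        if spent + cost > budget:
            break  # naive rule: stop once the next duel would exceed the budget
        spent += cost
        rounds += 1
        counts[i] += 1
        counts[j] += 1
        o = 1.0 if rng.random() < pref[i][j] else 0.0  # 1.0 if arm i wins
        left.update(i, p_i, o)
        right.update(j, p_j, 1.0 - o)
    return rounds, counts
```

For example, with three arms of unit cost and a budget of 4000, every duel costs 2, so exactly 2000 duels are played:

```python
pref = [[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]]
rounds, counts = sparring_exp3_with_budget(pref, [1.0, 1.0, 1.0], 4000.0)
```

A resource-aware method would instead weigh each arm's reward against its cost when choosing, rather than only checking the budget; the stopping rule here is kept deliberately simple.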
Abstract
We consider the problem of reward maximization in the dueling bandit setup along with constraints on resource consumption. As in the class …