Feb, 2024
Efficient Contextual Bandits with Uninformed Feedback Graphs
Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro
TL;DR
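By using online regression to combine learning of the feedback graph with decision-making when the graph is uninformed, this work develops the first contextual algorithm for the uninformed setting and shows that using log loss yields favorable regret guarantees.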
Abstract
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (20