BriefGPT.xyz
Sep, 2024
Second Order Bounds for Contextual Bandits with Function Approximation
Aldo Pacchiano
TL;DR
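This work studies the regret bounds of optimistic least squares algorithms for contextual bandits with function approximation when the reward measurement noise varies over time. We present the first algorithms whose regret depends not on the square root of the time horizon but on the sum of the measurement variances, which significantly broadens the applicability and improves the efficiency of these algorithms. The results offer a new perspective on techniques for deriving second-order bounds in contextual linear problems.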
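To make the contrast in the TL;DR concrete, the sketch below compares the two growth rates, sqrt(T) and a quantity driven by the summed variances. It assumes, for illustration, that the summed variances enter through their square root (mirroring the sqrt(T) term they replace), and it ignores the function-class complexity term and logarithmic factors that actual bounds carry. The helpers `horizon_scale` and `variance_scale` and the variance schedule are hypothetical, not taken from the paper.

```python
import math


def horizon_scale(T: int) -> float:
    # Hypothetical helper: worst-case style growth, sqrt(T), independent of the noise level.
    return math.sqrt(T)


def variance_scale(variances: list) -> float:
    # Hypothetical helper: second-order style growth, sqrt of the summed noise variances.
    return math.sqrt(sum(variances))


T = 10_000
# Hypothetical schedule: variance 1.0 for the first 100 rounds, then 1e-4 afterwards.
variances = [1.0 if t < 100 else 1e-4 for t in range(T)]

print(horizon_scale(T))           # 100.0
print(variance_scale(variances))  # about 10.05
```

When every variance equals 1, the two quantities coincide. If the variances are bounded by 1, their sum is at most T, so under that assumption the second-order quantity never exceeds sqrt(T) and can be much smaller when most rounds are nearly noiseless.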
Abstract
Many works have developed no-regret algorithms for contextual bandits with function approximation, where the mean rewards over context-action pairs belong to a function class. Although there are many
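The function-approximation setting can be illustrated with a small sketch of a generic optimistic least squares rule of the kind the TL;DR refers to. It keeps every candidate whose cumulative squared loss is within `beta` of the least-squares minimum and plays the action with the highest optimistic value over that set. This is not the variance-aware algorithm from the paper. The sketch assumes a finite, hypothetical class of linear-in-context candidates (`make_fn`), two actions, Gaussian noise, and a hand-picked confidence width `beta`. All function names here are illustrative.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical model type: maps (context, action) to a predicted mean reward.
RewardFn = Callable[[float, int], float]


def optimistic_action(F: Sequence[RewardFn], losses: Sequence[float],
                      x: float, actions: Sequence[int], beta: float) -> int:
    """Pick the action with the highest optimistic value over the version space."""
    best = min(losses)
    # Version space: candidates whose cumulative squared loss is within beta of the minimum.
    confident = [f for f, loss in zip(F, losses) if loss <= best + beta]
    # Optimism: score each action by its largest predicted mean over the version space.
    return max(actions, key=lambda a: max(f(x, a) for f in confident))


def run_optimistic_ls(F: Sequence[RewardFn], true_f: RewardFn, T: int,
                      beta: float, noise_sd: float, seed: int = 0) -> float:
    """Simulate T rounds and return the cumulative pseudo-regret."""
    rng = random.Random(seed)
    actions = [0, 1]
    losses: List[float] = [0.0] * len(F)  # cumulative squared loss per candidate
    regret = 0.0
    for _ in range(T):
        x = rng.random()  # context drawn uniformly from [0, 1)
        a = optimistic_action(F, losses, x, actions, beta)
        r = true_f(x, a) + rng.gauss(0.0, noise_sd)  # noisy reward observation
        # Incrementally update each candidate's least-squares loss.
        for i, f in enumerate(F):
            losses[i] += (f(x, a) - r) ** 2
        # Pseudo-regret: gap between the best and the chosen action's true mean.
        regret += max(true_f(x, b) for b in actions) - true_f(x, a)
    return regret


def make_fn(w0: float, w1: float) -> RewardFn:
    # Hypothetical candidate: the mean reward of action a is w_a * x.
    return lambda x, a: (w0 if a == 0 else w1) * x


grid = [i / 4 for i in range(5)]  # weights in {0, 0.25, 0.5, 0.75, 1}
F = [make_fn(w0, w1) for w0 in grid for w1 in grid]
true_f = make_fn(0.25, 0.75)  # realizable: its weights lie on the grid
```

For example, `run_optimistic_ls(F, true_f, T=500, beta=1.0, noise_sd=0.1)` returns the sketch's cumulative pseudo-regret. The value of `beta` is illustrative; a principled width for a finite class would typically grow with the noise level and log|F|. This plain rule uses a fixed `beta` and weights every observation equally, so it does not adapt to rounds with small noise variance. Per the TL;DR, variance-dependent regret is what the paper develops.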