BriefGPT.xyz
Feb, 2024
Improved Regret for Bandit Convex Optimization with Delayed Feedback
Yuanyu Wan, Chang Yao, Mingli Song, Lijun Zhang
TL;DR
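We study bandit convex optimization with delayed feedback. By carefully exploiting the delayed bandit feedback through a blocking update mechanism, our algorithm improves the regret bound and matches that of the classical bandit gradient descent (BGD) algorithm in the delayed setting.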
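For readers unfamiliar with bandit feedback, the following is a minimal sketch of the standard one-point gradient estimator on which bandit gradient descent is commonly built. It is not the paper's algorithm: delays and the blocking update are not modeled, and the names `one_point_gradient`, `linear_loss`, and `average_estimate`, the loss coefficients, and the choice `delta=0.1` are all assumptions made for illustration.

```python
import math
import random


def sample_unit_sphere(n, rng):
    """Draw a point uniformly from the unit sphere in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def one_point_gradient(loss, x, delta, rng):
    """Estimate a gradient from a single loss value queried near x."""
    n = len(x)
    u = sample_unit_sphere(n, rng)
    y = [xi + delta * ui for xi, ui in zip(x, u)]  # the action actually played
    value = loss(y)  # the only feedback the learner observes
    return [(n / delta) * value * ui for ui in u]


def linear_loss(y, a=(1.0, -2.0, 0.5)):
    """Hypothetical loss; for a linear loss the estimator's mean is exactly a."""
    return sum(ai * yi for ai, yi in zip(a, y))


def average_estimate(num_samples=20000, seed=0):
    """Average many one-point estimates at the origin."""
    rng = random.Random(seed)
    x = [0.0, 0.0, 0.0]
    total = [0.0, 0.0, 0.0]
    for _ in range(num_samples):
        g = one_point_gradient(linear_loss, x, delta=0.1, rng=rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]
```

Averaging the estimates for the hypothetical linear loss should recover its coefficient vector up to sampling noise, which illustrates why a single loss value per round can stand in for a gradient. In a delayed setting the loss value for a round would arrive some rounds later, which is the difficulty the blocking update is meant to address.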
Abstract
We investigate bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under an arbitrary delay. Previous studies have established a