BriefGPT.xyz
Jan, 2024
Contextual Bandits with Stage-wise Constraints
Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett
TL;DR
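We study contextual bandits with stage-wise constraints and propose an upper confidence bound (UCB) algorithm that balances exploration against constraint satisfaction, and we prove a regret bound for it.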
Abstract
We study contextual bandits in the presence of a stage-wise constraint (a constraint at each round), when the constraint must be satisfied both with high probability and in expectation. Obviously the setting where…
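The TL;DR describes a UCB-style algorithm that trades off exploration against satisfying a constraint at every round. The following is a minimal Python sketch of one common way to do this in a linear contextual bandit. It is not the authors' algorithm. It assumes linear reward and cost models, a fixed confidence radius `beta`, a per-round cost threshold `tau`, and a known safe action (here the zero feature vector). All names (`select_action`, `ridge_update`, `simulate`) are hypothetical. Each round, actions whose pessimistic (upper-bound) cost estimate exceeds `tau` are excluded, and the optimistic reward estimate picks among the remaining ones.

```python
import numpy as np


def select_action(actions, A_inv, theta_hat, mu_hat, beta, tau, safe_index):
    """Optimistic in reward, pessimistic in cost (hypothetical helper).

    actions: (K, d) feature vectors offered in this round's context
    A_inv: (d, d) inverse of the regularized Gram matrix
    theta_hat, mu_hat: ridge estimates of the reward and cost parameters
    beta: confidence radius (assumed fixed, not a tuned theoretical value)
    tau: per-round cost threshold
    safe_index: index of an action assumed to be known-safe
    """
    # Elliptical width ||a||_{A^{-1}} for every candidate action.
    widths = np.sqrt(np.einsum("kd,de,ke->k", actions, A_inv, actions))
    reward_ucb = actions @ theta_hat + beta * widths  # optimistic reward
    cost_ucb = actions @ mu_hat + beta * widths       # pessimistic cost
    feasible = cost_ucb <= tau
    feasible[safe_index] = True  # the known-safe action is always allowed
    candidates = np.flatnonzero(feasible)
    return int(candidates[np.argmax(reward_ucb[candidates])])


def ridge_update(A_inv, b_r, b_c, a, r, c):
    """Sherman-Morrison update of A^{-1}, then refresh both ridge estimates."""
    Aa = A_inv @ a
    A_inv = A_inv - np.outer(Aa, Aa) / (1.0 + a @ Aa)
    b_r = b_r + r * a
    b_c = b_c + c * a
    return A_inv, b_r, b_c, A_inv @ b_r, A_inv @ b_c


def simulate(T=500, d=3, K=6, tau=0.5, beta=0.5, lam=1.0, noise=0.1, seed=0):
    """Run the sketch on a synthetic linear problem; return the violation count."""
    rng = np.random.default_rng(seed)
    theta_star = rng.uniform(0.0, 1.0, d)  # hypothetical true reward parameter
    mu_star = rng.uniform(0.0, 1.0, d)     # hypothetical true cost parameter
    A_inv = np.eye(d) / lam                # inverse of the initial Gram matrix lam * I
    b_r, b_c = np.zeros(d), np.zeros(d)
    theta_hat, mu_hat = np.zeros(d), np.zeros(d)
    violations = 0
    for _ in range(T):
        actions = rng.uniform(0.0, 1.0, (K, d))  # this round's context
        actions[0] = 0.0                          # zero vector: cost 0, known safe
        k = select_action(actions, A_inv, theta_hat, mu_hat, beta, tau, safe_index=0)
        a = actions[k]
        violations += int(a @ mu_star > tau)      # check the true per-round constraint
        r = a @ theta_star + noise * rng.standard_normal()
        c = a @ mu_star + noise * rng.standard_normal()
        A_inv, b_r, b_c, theta_hat, mu_hat = ridge_update(A_inv, b_r, b_c, a, r, c)
    return violations
```

Calling `simulate()` returns how many rounds chose an action whose true cost exceeded `tau`. Filtering with the upper bound on cost, rather than the point estimate, is what keeps each round's constraint satisfied with high probability when `beta` is a valid confidence radius. With the fixed, untuned `beta` used here, that guarantee is not claimed, which is why the sketch counts violations.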