BriefGPT.xyz
Nov, 2020
Improving Offline Contextual Bandits with Distributional Robustness
Otmane Sakhi, Louis Faury, Flavian Vasile
TL;DR
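This paper extends the distributionally robust optimization (DRO) approach and proposes a convex reformulation of the Counterfactual Risk Minimization principle. It shows how the DRO framework can be used to build asymptotic confidence intervals for offline contextual bandits, uses known asymptotic results for robust estimators to calibrate those intervals automatically, and presents preliminary experimental results supporting the effectiveness of the proposed method.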
Abstract
This paper extends the distributionally robust optimization (DRO) approach for offline contextual bandits. Specifically, we leverage this framework to introduce a convex reformulation of the