BriefGPT.xyz
Sep, 2023
Importance-Weighted Offline Learning Methods
Importance-Weighted Offline Learning Done Right
Germano Gabbianelli, Gergely Neu, Matteo Papini
TL;DR
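Offline policy optimization, stochastic contextual bandit problems, importance-weighted estimation, implicit exploration estimation, PAC-Bayesian improvements for policy classes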
Abstract
We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavio